Hiring
Responsibilities
- Ensure system reliability, stability and performance by maintaining service-level objectives (SLOs) and minimizing downtime and incidents.
- Collaborate with internal teams to assess system health, stability and resilience, providing architectural and design recommendations for reliability.
- Lead incident management and post-incident reviews, diagnosing issues, deploying fixes and implementing preventive measures.
- Drive automation of operational tasks, including deployments, monitoring, scaling and system recovery, to improve efficiency and reduce manual intervention.
- Define and track key performance indicators (KPIs) such as availability, latency and error rates to optimize system performance and inform decision-making.
- Plan and execute chaos engineering experiments to test system resilience and coordinate performance testing for reliability improvements.
- Ensure alignment between service-level indicators (SLIs) and service-level objectives (SLOs) across the product family.
- Develop and maintain product-level runbooks for incident response, collaborating with SRE teams to ensure effective recovery processes.
- Provide leadership in tool selection and best practices for site reliability engineering (SRE), making final decisions on tools, libraries and standards.
- Work closely with development teams to improve software reliability, scalability and resilience by offering feedback on design and architecture.
- Lead troubleshooting and triage efforts during user-impacting incidents, ensuring swift resolution and minimal disruption.
- Participate in special projects and continuous improvement initiatives, supporting long-term reliability and scalability goals.
Qualifications
- Minimum 8 years of related experience, with at least 5 years in software development.
- Bachelor’s degree (B.E./B.Tech) in Computer Science or IT, or Bachelor’s in Computer Applications (BCA) from a recognized institution.
- Expertise in Site Reliability Engineering (SRE), DevOps, and system reliability, ensuring high availability and performance.
- Strong programming and scripting skills in Python, Go, Bash, or Java, with experience in automating operational tasks.
- Proficiency in observability and resiliency tools such as Splunk, Honeycomb, Datadog, Prometheus, or Grafana.
- Hands-on experience with cloud platforms (AWS, Azure, GCP) and containerization/orchestration tools like Kubernetes, Docker, ECS, or Fargate.
- Solid understanding of automation, Infrastructure-as-Code (IaC), and configuration management using Terraform, Ansible, or CloudFormation.
- Experience with CI/CD pipelines, deployment automation, and version control tools like GitHub, Bitbucket, Jenkins, or Bamboo.
- Deep knowledge of incident management, root cause analysis, and post-incident reviews, focusing on continuous improvement.
- Experience in mobile platform reliability (Android, iOS), including performance monitoring and optimization, is desired.
Special Factors
Sponsorship
Vanguard is not offering visa sponsorship for this position.
About Vanguard
At Vanguard, we don't just have a mission—we're on a mission.
To work for the long-term financial wellbeing of our clients. To lead through product and services that transform our clients' lives. To learn and develop our skills as individuals and as a team. From Malvern to Melbourne, our mission drives us forward and inspires us to be our best.
How We Work
Vanguard has implemented a hybrid working model for the majority of our crew members, designed to capture the benefits of enhanced flexibility while enabling in-person learning, collaboration, and connection. We believe our mission-driven and highly collaborative culture is a critical enabler to support long-term client outcomes and enrich the employee experience.
About Vanguard
Reviews
Overall: 3.4 (3 reviews)
Work Life Balance: 2.5
Compensation: 3.2
Culture: 2.8
Career: 3.5
Management: 3.0
Recommend to a Friend: 45%
Pros
- Competitive compensation package with bonuses
- Good foundation for career development
- Interesting programs aligned with education
Cons
- Long commute requirements (2.5 hours)
- Mandatory on-site presence multiple days
- Pay below industry standards
Salary Ranges
1,532 data points
Junior/L3 · Client Relationship Associate (529 reports)
Total: $60,018 / year
Base: $55,076
Stock: -
Bonus: $4,942
Range: $46,375 to $78,763
Interview Experience
3 interviews
Difficulty: 3.0 / 5
Duration: 14-28 weeks
Interview Process
1. Application Review
2. Recruiter/HR Phone Screen
3. Technical/Case Study Round
4. Final Round Interview
5. Offer
Common Questions
- Behavioral/STAR
- Technical Knowledge
- Case Study
- Past Experience
- Culture Fit
