Amazon

Software Development Engineer, AWS Resilience, Health Guardian

Role: Backend
Level: Mid Level
Location: Seattle, WA, United States
Work: On-site
Type: Full-time
Posted: 5 days ago

About the role

AWS Infrastructure Services owns the design, planning, delivery, and operation of all AWS global infrastructure. In other words, we're the people who keep the cloud running. We support all AWS data centers and all of the servers, storage, networking, power, and cooling equipment that ensure our customers have continual access to the innovation they rely on. We work on the most challenging problems, with thousands of variables impacting the supply chain, and we're looking for talented people who want to help.

You'll join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers, and other vital roles. You'll collaborate with people across AWS to help us deliver the highest standards for safety and security while providing seemingly infinite capacity at the lowest possible cost for our customers. And you'll experience an inclusive culture that welcomes bold ideas and empowers you to own them to completion.

The Health Guardian team is looking for a software engineer who is excited about building automated detection and mitigation systems that protect AWS infrastructure at scale. We detect subtle failures that evade traditional health checks and automatically remove affected resources from service before customers are impacted. Our systems run across every AWS region, and we're scaling coverage from hundreds of services to thousands. This is a hands-on position where you will design and deliver significant software components, drive cross-team technical alignment, and mentor other engineers. You need to be a strong software developer with a track record of delivery, and you must also excel in communication, technical leadership, and customer focus. You'll leverage generative AI tools as part of your daily workflow to accelerate design, development, and validation. This is an opportunity to join a small, high-impact team solving hard reliability problems and to help shape both the technology and the direction of automated failure protection across AWS.

Key job responsibilities
Our engineers collaborate across diverse teams, projects, and environments to have a firsthand impact on AWS reliability. You'll bring a passion for distributed systems, safety engineering, and data-driven detection. You'll also:

  • Design and deliver systems that span multiple AWS teams and organizational boundaries.
  • Build detection algorithms and experimentation frameworks that validate changes at scale.
  • Architect safety mechanisms (circuit breakers, throttling, validation) that let automation scale without unintended customer impact.
  • Own ambiguous problems end-to-end from design through operations.
  • Mentor other engineers and lead technical design reviews.
  • Use AI-assisted development tools to prototype, test, and validate faster.

About the team
We are a small team with outsized impact on AWS reliability. We operate what we build, and every engineer has direct visibility into how their code performs during real infrastructure events. We solve complex distributed systems challenges to ensure automated protection works reliably even during the failures it's designed to detect. We value operational rigor, building systems that are safe by default, and solving hard problems with simple designs.

Basic Qualifications

  • 3+ years of non-internship professional software development experience
  • 2+ years of non-internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems
  • 2+ years of experience programming in at least one programming language

Preferred Qualifications

  • 2+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
  • Bachelor's degree in computer science or equivalent
  • Experience in mentoring, leading, or managing more junior engineers

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at https://amazon.jobs/en/benefits.

USA, WA, Seattle - 143,700.00 - 194,400.00 USD annually

Benefits and perks

  • Healthcare
  • 401(k)
  • Equity
  • Paid Time Off
  • Parental Leave
  • Learning Budget
  • Home Office Setup
  • Mental Health Support

Required skills

  • Software engineering
  • Distributed systems
  • Automation
  • Reliability engineering
  • Systems design
  • Debugging

About Amazon

Headquarters: Seattle