Hiring
Benefits & Perks
•Healthcare
•401(k)
•Equity
•Paid Time Off
•Parental Leave
•Mental Health
Required Skills
C++
Java
Python
Linux/Unix
Systems design
Software development
Infrastructure automation
Join the EC2 Machine Learning Systems team at Amazon Web Services (AWS) as a System Development Engineer III and lead the development of operational visibility and tooling for EC2 supercomputer instance families. In this role, you'll leverage your specialized knowledge of distributed systems to improve system automation and operational tooling between infrastructure hosting EC2 instances and back-end control plane infrastructure.
This position offers a unique opportunity to work at the intersection of high-performance computing and machine learning infrastructure. You'll apply operations best practices at scale while developing tools and systems that enhance visibility, maintenance, and operations of customer-facing supercomputer instance types. Your work will directly impact how AWS customers leverage compute resources for their most demanding machine learning workloads.
Key job responsibilities
- Design and implement robust operational visibility solutions and tooling for EC2 supercomputer instance families, focusing on system reliability, performance optimization, and scalability across complex infrastructure
- Lead projects that require collaboration across multiple engineering teams to improve maintenance practices and operational efficiency for customer-facing supercomputer instance types
- Develop technical solutions for complex problems involving Nitro systems, considering multiple risks and roadblocks while keeping solutions as simple as possible
- Build and maintain high-quality systems by adopting best practices, owning operational metrics, and understanding the long-term impact on customer experience
- Balance speed of delivery with foundation for the future, identifying critical technical decisions and advocating for the right solutions that prioritize long-term software quality and maintainability
About the team
The EC2 Nitro Machine Learning Systems team is responsible for development, operations, and maintenance of scale-out machine learning platforms used for training and inference workloads. We build and optimize the infrastructure that powers some of the most computationally intensive AI/ML workloads in the cloud. Our team is passionate about creating reliable, high-performance systems that enable customers to push the boundaries of what's possible with machine learning.
Working with us means having the opportunity to influence the future of supercomputing in the cloud while solving complex technical challenges at massive scale. We collaborate closely with customers and internal teams to continuously improve our platforms and deliver innovations that accelerate machine learning workflows.
Basic Qualifications
- 3+ years of programming experience with at least one modern language such as C++, C#, Java, Python, Golang, PowerShell, or Ruby
- 4+ years of non-internship professional software development experience
- 2+ years of experience designing or architecting (design patterns, reliability, and scaling) new and existing systems
- 4+ years of systems development experience in an IT or data center environment
- 4+ years of experience deploying and operating in a Linux/Unix environment
- 2+ years of systems design, software development, operations, automation, and process improvement experience
- Experience leading the design, build and deployment of complex and performant (reliable and scalable) software solutions in production
Preferred Qualifications
- 1+ years of experience with a development, programming, or scripting language (Python/Java/Bash/Perl)
- Knowledge of engineering practices and patterns for the full software/hardware/networks development life cycle, including coding standards, code reviews, source control management, build processes, testing, certification, and livesite operations
- Experience taking a leading role in building complex software or computing infrastructure that has been successfully delivered to customers
Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.
The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at https://amazon.jobs/en/benefits.
USA, WA, Seattle - 151,200.00 - 204,600.00 USD annually
About Amazon

Amazon
Public
Amazon.com, Inc. is an American multinational technology company engaged in e-commerce, cloud computing, online advertising, digital streaming, and artificial intelligence.
Employees: 10,001+
Headquarters: Seattle
Reviews
Overall: 2.9 (10 reviews)
Work Life Balance: 2.8
Compensation: 3.7
Culture: 2.5
Career: 2.3
Management: 2.1
Recommend to a Friend: 35%
Pros
Good pay and compensation
Strong benefits package
Flexible scheduling options
Cons
Poor management and leadership
Limited growth and promotion opportunities
High stress and demanding work environment
Salary Ranges
2 data points
L2 · Data Analyst L2 (0 reports)
Total: $108,330 / year
Base: $43,332
Stock: $54,165
Bonus: $10,833
$75,831 - $140,829
Interview Experience
10 interviews
Difficulty: 3.7 / 5
Duration: 21-35 weeks
Offer Rate: 20%
Experience: Positive 10%, Neutral 10%, Negative 80%
Interview Process
1. Application Review
2. Recruiter Screen
3. Online Assessment
4. Technical Phone Screen
5. Onsite/Virtual Loop
6. Team Matching
7. Offer
Common Questions
Coding/Algorithm
System Design
Behavioral/STAR
Leadership Principles
Technical Knowledge