Join the EC2 Nitro Machine Learning Systems team to revolutionize accelerated computing in the cloud. We're seeking an exceptional Software Development Engineer to build and optimize the performance measurement infrastructure for some of the most computationally intensive AI/ML workloads on AWS. In this role, you'll establish EC2 as the definitive source for best-known-configurations across diverse ML applications including LLMs, multimodal models, and video generation workloads. Your expertise will directly influence future platform designs by translating performance insights from state-of-the-art research and customer workloads into technical requirements for upcoming accelerated platform launches.
Your impact will extend from low-level systems (CUDA, EFA, firmware) through ML frameworks to serving layers, requiring deep technical knowledge and the ability to communicate complex performance data as actionable business insights. This position offers the unique opportunity to shape the future of machine learning infrastructure at cloud scale while working at the intersection of high-performance computing, distributed systems, and machine learning technologies.
Key job responsibilities
- Design and build foundational infrastructure for ML performance measurement that scales with business demand and operates as reliable CI/CD systems, ensuring high-quality implementations that balance customer requirements with operational excellence
- Develop comprehensive regression test coverage across all major component releases including frameworks, firmware, drivers, and networking technologies to maintain optimal platform performance
- Collaborate with cross-functional teams to establish EC2 as the definitive source for best-known-configurations across diverse ML applications including LLMs, multimodal models, and MoE architectures
- Document and communicate performance insights to influence future platform designs by translating technical findings from research and customer workloads into actionable recommendations
- Identify and resolve complex performance challenges through systematic analysis of training and inference performance KPIs across accelerated platforms, working directly with customers to improve their ML system efficiency
A day in the life
Your typical day begins with reviewing performance data from overnight benchmark runs across various ML frameworks and hardware configurations. You'll investigate anomalies, collaborate with the team on optimization opportunities, and join design reviews to influence future platform capabilities. You'll balance your time between building measurement infrastructure, analyzing performance trends, and documenting best practices to help customers optimize their workloads.
About the team
EC2 Nitro Machine Learning Systems is responsible for development, operations, and maintenance of ML platforms for training and inference. We build and optimize infrastructure that powers some of the most computationally intensive AI/ML workloads. Our team creates reliable, high-performance systems that enable customers to push the boundaries of what's possible with ML.
Working with us means having the opportunity to influence the future of supercomputing in the cloud while solving complex technical challenges at massive scale. We collaborate closely with customers and internal teams to continuously improve our platforms and deliver innovations that accelerate machine learning workflows.
Basic Qualifications
- 3+ years of non-internship professional software development experience
- 2+ years of non-internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems
- Experience programming with at least one software programming language
- Knowledge of Machine Learning and LLM fundamentals, including transformer architecture, training/inference lifecycles, and optimization techniques
Preferred Qualifications
- 3+ years of experience across the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
- Bachelor's degree in computer science or equivalent
- Knowledge of ML frameworks including JAX, PyTorch, vLLM, SGLang, Dynamo, TorchXLA, and TensorRT
- Knowledge of machine learning model architecture and inference
Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.
The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at https://amazon.jobs/en/benefits.
USA, WA, Seattle - 143,700.00 - 194,400.00 USD annually
About Amazon

Amazon (Public)
Amazon.com, Inc. is an American multinational technology company engaged in e-commerce, cloud computing, online advertising, digital streaming, and artificial intelligence.
Employees: 10,001+
Headquarters: Seattle
Company valuation: $1.5T
Reviews
3.4 (10 reviews)
Work-life balance: 2.3
Compensation: 4.2
Company culture: 3.1
Career: 3.8
Management: 2.7
Recommend to a friend: 65%
Pros
- Great benefits and competitive compensation
- Learning opportunities and career advancement
- Good teamwork and colleagues
Cons
- High pressure and long hours
- Poor work-life balance
- Toxic work culture and high turnover
Salary range
4 data points
Levels: L2, L3, L4, L5, L6
L2 · Data Analyst L2
0 reports
Total annual compensation: $108,330
Base salary: $43,332
Stock: $54,165
Bonus: $10,833
Range: $75,831 - $140,829
Interview experience
6 interviews
Difficulty: 4.0 / 5
Duration: 21-35 weeks
Experience: Positive 0%, Neutral 17%, Negative 83%
Interview process
1. Application Review
2. Recruiter Screen
3. Online Assessment
4. Technical Phone Screen
5. Technical Interview
6. Onsite/Virtual Interviews
Common questions
- Coding/Algorithm
- System Design
- Behavioral/STAR
- Technical Knowledge