SoC Platform Software Engineering Manager, Annapurna Labs Machine Learning Acceleration, AWS
Cupertino, CA, USA · On-site · Full-time · 2w ago
One C++ codebase. Three radically different execution environments. We're looking for an engineering manager who thinks in terms of platforms, abstractions, and portable software architecture — and can lead a team that ships all three.
Our SoC HAL (Hardware Abstraction Layer) team builds the platform software layer for AWS's custom Trainium and Inferentia ML accelerator chips. The HAL is a shared library that boots, configures, and manages every hardware block on the SoC (270+ instances per chip), and the same source tree compiles and runs on SystemVerilog DPI for chip verification, QEMU for system emulation, and Carbon OS in microcontrollers within the AWS production fleet. Your platform abstractions are what make this possible, and your APIs are the interface that hundreds of engineers across verification, emulation, and production use to interact with the chip.
Tech stack: C++17, CMake, Google Test, Python, SystemVerilog DPI, SPI, APB/AXI bus protocols, PCIe, UCIe, HBM, PLL, custom IPs
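As a rough illustration of how one source tree can target several execution environments, here is a minimal C++17 sketch, assuming block drivers are templated on a register-access backend that the build selects per target. Everything named here (`FakeBus`, `PllDriver`, the `HAL_TARGET_*` macros, and the `DpiBus`/`QemuBus`/`SpiBus` backends) is hypothetical and not the team's actual API; only `FakeBus` is implemented, as an in-memory register file so the sketch compiles and runs anywhere.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Stand-in backend: an in-memory register file. Hypothetical DpiBus,
// QemuBus, and SpiBus classes would expose the same read32/write32 pair
// over their own transports.
class FakeBus {
 public:
  std::uint32_t read32(std::uint64_t addr) const {
    auto it = regs_.find(addr);
    return it == regs_.end() ? 0u : it->second;  // unwritten registers read as 0
  }
  void write32(std::uint64_t addr, std::uint32_t value) { regs_[addr] = value; }

 private:
  std::unordered_map<std::uint64_t, std::uint32_t> regs_;
};

// The build selects exactly one backend per target. The HAL_TARGET_* macros
// and backend class names are hypothetical; only FakeBus is defined here.
#if defined(HAL_TARGET_DPI)
using PlatformBus = DpiBus;
#elif defined(HAL_TARGET_QEMU)
using PlatformBus = QemuBus;
#elif defined(HAL_TARGET_SPI)
using PlatformBus = SpiBus;
#else
using PlatformBus = FakeBus;
#endif

// A block driver written once against "anything with read32/write32".
// Templating on Bus resolves every access at compile time (no virtual calls).
template <typename Bus>
class PllDriver {
 public:
  PllDriver(Bus& bus, std::uint64_t base) : bus_(bus), base_(base) {}

  // Read-modify-write so the other bits of the control register are preserved.
  void set_multiplier(std::uint32_t m) {
    assert(m <= 0xFFu && "hypothetical 8-bit multiplier field");
    const std::uint64_t addr = base_ + kCtrlOffset;
    bus_.write32(addr, (bus_.read32(addr) & ~0xFFu) | m);
  }

  std::uint32_t multiplier() const { return bus_.read32(base_ + kCtrlOffset) & 0xFFu; }

 private:
  static constexpr std::uint64_t kCtrlOffset = 0x0;  // hypothetical register offset
  Bus& bus_;
  std::uint64_t base_;
};
```

Usage: construct a `PllDriver<PlatformBus>` over a `PlatformBus` instance and call `set_multiplier(42)`; the driver source stays the same whichever backend the build selects. Static polymorphism is only one option; a virtual backend interface would also work, at the cost of an indirect call per register access.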
As the SoC Platform Software Manager, you will:
- Manage, coach, and grow a team of 6 engineers — set technical direction, own hiring, and create an environment where strong engineers want to stay
- Own the platform abstraction layer that enables one C++ codebase to compile and run correctly across three target environments with fundamentally different runtime characteristics
- Shape the external API contracts that verification, emulation, and production teams build on — balancing stability for consumers against the need to evolve as new chip generations arrive
- Drive the architecture of our C++ template metaprogramming framework that generates type-safe register interfaces for every hardware block, and our BUTR (Built-in Unit Test for Registers) and HITL (Hardware-in-the-Loop) test infrastructure
- Build and maintain the CI/CD and validation strategy that catches integration issues across all three platforms before they reach customers
- Coordinate across chip architects, RTL designers, verification engineers, validation engineers, and platform software teams — you're the single point of accountability for HAL readiness on every new chip program
- Get into the weeds alongside your team — debug register-level HW/SW interactions, review code, and write code yourself when it matters
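The template-generated register interfaces and BUTR tests mentioned above might look roughly like the following C++17 sketch. It assumes a simple design in which each field carries its register type, bit offset, and width as template parameters; the register layout (`PllCtrl`, `pll_ctrl::Enable`, and so on), `RegValue`, and `walking_ones_ok` are hypothetical. The walking-ones write/readback check is one common way to test registers and is an assumption here, not a description of how BUTR actually works.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// A bit field described entirely at compile time and tied to one register
// type, so applying it to the wrong register fails to compile.
template <typename Reg, unsigned Lsb, unsigned Width>
struct Field {
  static_assert(Width >= 1 && Lsb + Width <= 32, "field must fit in a 32-bit register");
  using Register = Reg;
  static constexpr unsigned kLsb = Lsb;
  static constexpr std::uint32_t kMask =
      static_cast<std::uint32_t>((std::uint64_t{1} << Width) - 1u) << Lsb;
};

// A register value that only accepts fields declared for the same register.
template <typename Reg>
class RegValue {
 public:
  constexpr explicit RegValue(std::uint32_t raw = Reg::kResetValue) : raw_(raw) {}

  template <typename F>
  constexpr std::uint32_t get() const {
    static_assert(std::is_same<typename F::Register, Reg>::value,
                  "field belongs to a different register");
    return (raw_ & F::kMask) >> F::kLsb;
  }

  template <typename F>
  constexpr RegValue with(std::uint32_t v) const {
    static_assert(std::is_same<typename F::Register, Reg>::value,
                  "field belongs to a different register");
    assert((v & ~(F::kMask >> F::kLsb)) == 0 && "value wider than field");
    return RegValue((raw_ & ~F::kMask) | ((v << F::kLsb) & F::kMask));
  }

  constexpr std::uint32_t raw() const { return raw_; }

 private:
  std::uint32_t raw_;
};

// Hypothetical PLL control register (illustrative layout, not a real chip).
struct PllCtrl {
  static constexpr std::uint32_t kOffset = 0x0;
  static constexpr std::uint32_t kResetValue = 0x00000100;    // multiplier = 1
  static constexpr std::uint32_t kWritableMask = 0x000FFF01;  // union of the fields
};

namespace pll_ctrl {
using Enable = Field<PllCtrl, 0, 1>;
using Multiplier = Field<PllCtrl, 8, 8>;
using Divider = Field<PllCtrl, 16, 4>;
static_assert((Enable::kMask | Multiplier::kMask | Divider::kMask) == PllCtrl::kWritableMask,
              "writable mask must match the declared fields");
}  // namespace pll_ctrl

// Register self-test (assumed walking-ones approach): write a single 1 to
// each writable bit, read it back, then restore the reset value.
template <typename Reg, typename ReadFn, typename WriteFn>
bool walking_ones_ok(std::uint64_t base, ReadFn read32, WriteFn write32) {
  const std::uint64_t addr = base + Reg::kOffset;
  bool ok = true;
  for (unsigned bit = 0; bit < 32; ++bit) {
    const std::uint32_t pattern = 1u << bit;
    if ((Reg::kWritableMask & pattern) == 0) continue;  // skip reserved/read-only bits
    write32(addr, pattern);
    if ((read32(addr) & Reg::kWritableMask) != pattern) ok = false;
  }
  write32(addr, Reg::kResetValue);
  return ok;
}
```

Passing a field from another register (say, a hypothetical status-register field) to `RegValue<PllCtrl>::with` is rejected by the `static_assert` at compile time, which is the main payoff of encoding the register map in types rather than in raw masks.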
Most platform software teams target one OS or one hardware family. We target three execution environments from a single source tree — and our software must be stateless, survive live-updates on running production servers without reboots, and be correct down to individual register bits. A single abstraction leak can break chip verification, stall emulation, or misconfigure millions of servers in AWS's global fleet.
The HAL runs on an external microcontroller with embedded Linux, reaching into the chip over SPI and PCIe. It's stateless by design: the microcontroller can reboot at any time, including during customer workloads, and the HAL must resume managing the SoC by querying hardware state on demand. Your platform layer is what makes this resilience possible while keeping the complexity invisible to consumers.
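A minimal C++17 sketch of the stateless pattern described above, assuming each operation is written as an idempotent "read hardware, then act only if needed" step. `FakeChip`, `BlockManager`, `kBlockCtrl`, and `kEnableBit` are hypothetical; the fake chip is an in-memory map that persists while manager objects are destroyed and recreated, mimicking a microcontroller reboot.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// In-memory stand-in for the SoC's registers (reached over SPI/PCIe in the
// real system). It outlives any manager object, just as the chip keeps
// running while the microcontroller reboots.
struct FakeChip {
  std::uint32_t read32(std::uint64_t addr) const {
    auto it = regs.find(addr);
    return it == regs.end() ? 0u : it->second;
  }
  void write32(std::uint64_t addr, std::uint32_t value) {
    regs[addr] = value;
    ++writes;  // counted so the sketch can show that re-runs are no-ops
  }
  std::unordered_map<std::uint64_t, std::uint32_t> regs;
  unsigned writes = 0;
};

// Hypothetical control register and enable bit for one hardware block.
constexpr std::uint64_t kBlockCtrl = 0x2000;
constexpr std::uint32_t kEnableBit = 1u << 0;

// Stateless manager: it holds only a handle to the hardware and caches no
// device state, so every decision starts from a fresh register read.
template <typename Chip>
class BlockManager {
 public:
  explicit BlockManager(Chip& chip) : chip_(chip) {}

  bool enabled() const { return (chip_.read32(kBlockCtrl) & kEnableBit) != 0; }

  // Idempotent: writes only if the block is not already enabled, so
  // repeating it after a restart leaves a configured block untouched.
  void ensure_enabled() {
    const std::uint32_t ctrl = chip_.read32(kBlockCtrl);
    if ((ctrl & kEnableBit) == 0) chip_.write32(kBlockCtrl, ctrl | kEnableBit);
    assert(enabled() && "enable bit did not stick");
  }

 private:
  Chip& chip_;
};
```

Because `BlockManager` stores nothing but a reference to the hardware, an instance constructed after a reboot reaches the same conclusions as the one that existed before it, and calling `ensure_enabled()` again issues no redundant write.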
The same codebase that runs in pre-silicon simulation months before tape-out is the codebase that runs in the production fleet. When the chip comes back from the fab, your team validates that pre-silicon models match real hardware behavior. For Trainium3, our HAL enabled a full ML training workload within 12 hours of first power-on: https://www.aboutamazon.com/news/aws/trainium-3-ultraserver-faster-ai-training-lower-cost
No ML background needed. Your platform software is the foundation that enables ML training across clusters of thousands of interconnected accelerators — you'll work on components like PCIe and HBM, but won't need to understand ML itself.
This role can be based in Cupertino, CA or Austin, TX. The team is split between the two sites.
Basic Qualifications
- 3+ years of engineering team management experience
- 7+ years of professional software development in C or C++, including systems, platform, or infrastructure software
- 4+ years of designing or architecting software systems (platform abstractions, API design, multi-target build systems)
- Experience developing software that interfaces with hardware or runs across multiple execution environments
- Experience designing APIs or abstraction layers consumed by other engineering teams
Preferred Qualifications
- Experience in recruiting, hiring, mentoring/coaching, and managing teams of software engineers to improve their skills and make them more effective product software engineers
- Experience building or maintaining hardware abstraction layers, board support packages, or platform software for SoC, ASIC, or embedded systems
- Experience with multi-platform or cross-compilation build systems (targeting simulation, emulation, and production from a single source tree)
- Familiarity with bus protocols (APB, AXI, PCIe) or memory subsystems (HBM, DDR)
- Experience with C++ template metaprogramming or code generation frameworks
- Experience with pre-silicon software development (simulation, emulation, or virtual platforms)
Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.
Los Angeles County applicants: Job duties for this position include: work safely and cooperatively with other employees, supervisors, and staff; adhere to standards of excellence despite stressful conditions; communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws and Company policies. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position. These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness and professionalism, and safeguard business operations and the Company’s reputation. Pursuant to the Los Angeles County Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.
The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at https://amazon.jobs/en/benefits.
USA, CA, Cupertino - 212,700.00 - 287,700.00 USD annually
USA, TX, Austin - 184,900.00 - 250,200.00 USD annually
Similar jobs

AI/ML Lead
Wipro · Tampa, United States

Lead AI Engineer (AI Foundations, LLM Core and Agentic AI)
Capital One · 5 Locations

Lead Machine Learning Engineer
ESPN (Disney) · Glendale

Machine Learning Engineer - Vice President
JPMorgan Chase · Plano, TX, United States, US

Senior Manager AI Engineer (GenAI Platform Services)
Capital One · 2 Locations
About Amazon

Amazon
Public company
Amazon.com, Inc. is an American multinational technology company engaged in e-commerce, cloud computing, online advertising, digital streaming, and artificial intelligence.
Employees: 10,001+
Headquarters: Seattle
Valuation: $1.5T
Reviews
Overall rating: 2.9 (10 reviews)
Work-life balance: 2.8
Compensation: 3.7
Culture: 2.5
Career growth: 2.3
Management: 2.1
Would recommend to a friend: 35%
Pros
Good pay and compensation
Strong benefits package
Flexible scheduling options
Cons
Poor management and leadership
Limited growth and promotion opportunities
High stress and demanding work environment
Salary range (4 data points)
Junior/L3 · Data Scientist L4 (0 reports)
Total annual compensation: $181,968
Base salary: -
Stock: -
Bonus: -
Range: $154,672 to $209,264
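Interview experience (10 interviews)
Difficulty: 3.7 / 5
Duration: 21-35 weeks
Offer rate: 20%
Experience: Positive 10%, Neutral 10%, Negative 80%
Interview process:
1. Application Review
2. Recruiter Screen
3. Online Assessment
4. Technical Phone Screen
5. Onsite/Virtual Loop
6. Team Matching
7. Offer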
Common question types
Coding/Algorithm
System Design
Behavioral/STAR
Leadership Principles
Technical Knowledge
News
Amazon vs. Walmart: This Isn't Even Close - The Motley Fool
The Motley Fool · News · 3d ago
'Kevin' Review: Jason Schwartzman, Aubrey Plaza in Amazon Cat Cartoon - The Hollywood Reporter
The Hollywood Reporter · News · 3d ago
Amazon's best weekend deals: Apple, Clinique, Yeti and more — save up to 70% - Yahoo
Yahoo · News · 3d ago
Amazon Delivery Drones Involve a Perilous 10-Foot Drop. Users Are Posting the Apparent Results - Gizmodo
Gizmodo · News · 3d ago