Jobs

System Development Manager, Cloud Compute/GPU/Storage Server Team

Amazon

Cupertino, CA, USA

On-site

Full-time

5d ago

We have two distinct System Development Manager positions open: one leading the storage server team and one leading the AI/ML (GPU-based) accelerator server team. Because the core responsibilities, technical depth, and leadership expectations overlap significantly across both roles, we are accepting applications through this single posting. During the interview process, we will assess fit for both positions and align candidates to the team where their experience and interests are the strongest match.

We are looking for a forward-thinking technical leader to manage a diverse, cross-functional team of Hardware Design Engineers, Systems Development Engineers, and Technical Program Managers responsible for developing storage or accelerated (AI/ML/GPU) server platforms for AWS.

This is not a role for someone who manages from a distance. You will set the technical vision and architectural direction for next-generation server platforms — making bold bets on where storage or accelerated compute infrastructure needs to go — and then build and lead the team that delivers it. Your success is measured not just by launching hardware, but by driving fast instance adoption because you built the right thing for the customer.

You will own the full lifecycle: design, build, test, deploy to the data center, launch, and fleet health beyond launch. You will lead a team of architects defining what we build next, an NPI team delivering it through build and test to the data center, and an operations-focused engineering team ensuring it runs reliably at scale long after launch. You will connect these functions into a single, cohesive organization that moves fast and delivers high-quality server platforms that customers want to adopt.

You will work across organizational boundaries, partnering with other AWS service teams to deeply understand customer workloads and translate that understanding into hardware architecture decisions. You will lead relationships with ODMs and design partners to develop and manufacture your products at scale. When complex technical problems arise across hardware, firmware, software, thermal, power, or signal integrity, you will have the technical depth to engage meaningfully and the judgment to drive the right trade-offs.

Key job responsibilities

Vision & Architecture
Set the technical vision and multi-generational roadmap for storage or accelerator (AI/ML/GPU-based) server platforms
Make architectural bets that differentiate AWS — anticipating customer needs and industry shifts before they become obvious
Manage a team of hardware architects in defining server platform architectures that optimize for performance, reliability, cost, and speed of customer adoption
Translate deep understanding of customer workloads (storage, AI/ML training, inference) into hardware design decisions
Influence the broader AWS hardware strategy through data, conviction, and results

Design, Build & Test
Own server platform development from architecture through detailed design, prototype, build, and qualification
Manage a team of engineers responsible for design, build and launch of systems
Lead ODM/JDM and design partner relationships, ensuring our requirements for performance, quality, testability, and diagnostics are met
Drive design verification, system validation, and qualification — ensuring platforms meet reliability, performance, and cost targets before deployment
Ensure systems are designed for operational excellence from day one: testability, diagnosability, and serviceability are built in, not bolted on

Deploy, Launch & Fleet Health
Own deployment to the data center, launch readiness, and successful ramp into production
Drive qualification and readiness milestones, removing technical and organizational blockers to get servers into the fleet
Own fleet health beyond launch — your responsibility never ends. Monitor quality, reliability, and customer experience for the life of the platform
Drive toward zero-touch operations — building automation infrastructure that detects, diagnoses, and remediates faults before customer impact
Build predictive failure detection capabilities using telemetry, error trending, and log correlation
Establish and track fleet health metrics (failure rates, MTTD, MTTR, first-time fix rate, predictive accuracy)
Close the loop between field failures and design improvements in next-generation platforms

Team Leadership & Development
Manage and grow a diverse team spanning hardware engineering, systems development, and technical program management
Hire, develop, and retain top talent across multiple engineering disciplines
Create an environment where engineers with fundamentally different expertise (hardware, firmware, software, program management) collaborate effectively and challenge each other
Set clear goals, remove obstacles, and hold the team to high standards on delivery and quality
Coach and develop senior technical leaders: help architects think bigger and help execution-focused engineers see the strategic picture

Cross-Organization Collaboration
Partner with AWS service teams to ensure server platforms meet data path and control path requirements and drive fast adoption
Work with supply chain, manufacturing, and datacenter operations teams to deliver at scale
Influence peer teams and senior leadership on technical direction, investment priorities, and trade-offs
Represent your team's work and roadmap to VP-level and above

About the team
This organization is responsible for designing, building, testing, launching, and maintaining a fleet of AI/ML (GPU-based) servers and storage servers for Amazon Web Services (AWS). Our engineers work with leading-edge technologies, solve challenging problems, influence the industry's roadmaps, and develop unique solutions that are ahead of the pack. We work in an environment that fosters innovation and creativity: we encourage and invest in new directions and ideas that serve our customers better.

The organization comprises Hardware Design Engineers, Systems Development Engineers, and Technical Program Managers, all with the common goal of delivering the best storage and accelerator server fleet possible to our customers. We are located in Seattle and Cupertino, and we work with ODMs and Design Partners globally.

We own the full lifecycle of our server platforms: design, build, test, deploy to the data center, launch, and fleet health beyond launch. There is no hand-off — we are accountable from first architecture decision through every day the server runs in production.

Basic Qualifications

7+ years of relevant hands-on systems engineering and administration experience in networking, storage systems, or operating systems
Bachelor's degree in electrical engineering, mechanical engineering or a relevant engineering discipline
3+ years of direct management experience
Experience in server development (e.g., compute, AI/ML, storage, or edge servers)
Hands-on experience designing, developing, and operationally supporting high-volume enterprise servers

Preferred Qualifications

Knowledge of data center infrastructure design, operations, or delivery
Experience leading new product introduction (NPI) teams
Experience writing business strategy documents and plans
Experience engaging and influencing senior executives
Strong analytical skills, attention to detail, and effective communication abilities, or experience working with customers and a passion for delivering exceptional service
5+ years of direct management experience
Experience working with CM/OEM/ODM vendors on design, development, and manufacturing

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Los Angeles County applicants: Job duties for this position include: work safely and cooperatively with other employees, supervisors, and staff; adhere to standards of excellence despite stressful conditions; communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws and Company policies. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position. These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness and professionalism, and safeguard business operations and the Company’s reputation. Pursuant to the Los Angeles County Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at https://amazon.jobs/en/benefits.

USA, CA, Cupertino - 191,300.00 - 258,800.00 USD annually
USA, WA, Seattle - 166,300.00 - 225,000.00 USD annually

About Amazon

Amazon (Public)

Amazon.com, Inc. is an American multinational technology company engaged in e-commerce, cloud computing, online advertising, digital streaming, and artificial intelligence.

Employees: 10,001+
Headquarters: Seattle
Valuation: $1.5T

Reviews

Overall rating: 2.9 (10 reviews)
Work-life balance: 2.8
Compensation: 3.7
Culture: 2.5
Career: 2.3
Management: 2.1
Would recommend to a friend: 35%

Pros
Good pay and compensation
Strong benefits package
Flexible scheduling options

Cons
Poor management and leadership
Limited growth and promotion opportunities
High stress and demanding work environment

Salary information

4 data points

L2 · Cybersecurity Analyst L2 (0 reports)
Total compensation: $234,132
Base salary: $93,653
Stock: $117,066
Bonus: $23,413
$163,892 - $304,372

Interview experience

10 interviews

Difficulty: 3.7 / 5
Duration: 21-35 weeks
Pass rate: 20%

Experience
Positive: 10%
Neutral: 10%
Negative: 80%

Interview process
Application Review
Recruiter Screen
Online Assessment
Technical Phone Screen
Onsite/Virtual Loop
Team Matching
Offer

Frequently asked questions
Coding/Algorithm
System Design
Behavioral/STAR
Leadership Principles
Technical Knowledge
