Hiring

Cloud Hardware Development Engineer, Cloud AI/ML/storage server teams

Amazon

Cupertino, CA, USA

On-site

Full-time

4d ago

As a Cloud Hardware Development Engineer, you will be an end-to-end owner of storage and/or accelerator (AI/ML/GPU) server platforms — from New Product Introduction (NPI) through fleet health in production. You own the full lifecycle: design, development, qualification, launch, and ongoing operational excellence of servers running at scale in the AWS fleet.

You will work closely with internal customers to understand their technical needs and business goals, leveraging your experience with server design and the knowledge of various teams to architect solutions we deploy at scale. To deliver your products, you will work with an interdisciplinary team of component, firmware, power, mechanical, electrical, test, qualification, and manufacturing engineers, and lead our ODMs (design and manufacturing partners) to bring these servers to the data center. After launch, you own the fleet: monitoring quality, driving reliability improvements, and ensuring servers continue to meet customer requirements throughout their operational life.

This role demands deep technical curiosity and the willingness to jump in and personally solve the hardest problems. When a complex system failure occurs — whether during NPI qualification or in a production fleet of hundreds of thousands of servers — you roll up your sleeves, dive into the details across hardware, firmware, software, and physical layers, and drive to root cause. You don't wait for someone else to figure it out.

You will own end-to-end system reliability — proactively identifying deficiencies and driving toward zero-touch operations where automation detects, diagnoses, and resolves issues before customer impact. You will decompose complex server system problems (testability, reliability, diagnostics) into deliverable tasks and features, leading delivery yourself and through others in parallel.

This is a fast-paced, intellectually challenging position. You'll work with thought leaders in multiple technology areas, hold high standards for yourself and everyone you work with, and constantly look for ways to improve your products' performance, quality, and cost. We're changing an industry, and we want individuals who are ready for this challenge and want to reach beyond what is possible today.

Key job responsibilities

NPI — New Product Introduction
Own the end-to-end NPI lifecycle for storage and/or accelerator (AI/ML/GPU) server platforms — from architecture definition through design, qualification, manufacturing ramp, and launch
Lead technical solutions for complex server and rack system architectural challenges
Work with ODM/manufacturing partners to develop, validate, and manufacture server products at scale
Develop functional specifications, design verification plans, and test procedures
Drive qualification and readiness milestones, ensuring new platforms meet performance, reliability, and cost targets before fleet deployment
Identify and resolve technical risks early in the development cycle; don't let problems reach production

Fleet Health, Diagnostics & Automation
Own fleet health for the server platforms you launch — reliability doesn't end at ship
Design and implement predictive failure detection systems using telemetry, sensor data, error trending, and log correlation to identify hardware issues before they cause customer impact
Drive toward zero-touch operations: help build detection, diagnosis, and remediation of faults without human intervention
Debug complex system failures in time-sensitive settings — personally diving deep when the problem demands it
Perform root cause analysis correlating across firmware, kernel, driver, thermal, power, and physical layers

Systems Design & Technical Depth
Apply expertise across hardware, software, system design, x86 architecture, processes, and operations (compute, storage, network, GPU)
Design and implement solutions to address system-level issues at large scale
Decompose complex server system problems (testability, reliability, diagnostics) into deliverable tasks and features
Collaborate with hardware, software, manufacturing, supply chain, and product management teams

Cross-Team Collaboration
Work closely with internal customers to ensure new server hardware meets data path and control path requirements
Identify potential problems early when onboarding new servers into customer ecosystems
Collaborate across Hardware Engineering, component, firmware, test, qualification, and integration teams
Partner with datacenter operations to close the loop between field failures and design improvements

A day in the life
Your day-to-day responsibilities include interfacing with internal and external customers to understand product requirements and facilitate system development on top of your server designs. You will learn the operational challenges facing our existing fleet, with the goal of improving the current customer experience and developing better systems for future designs. You will work directly with vendors and ODMs (manufacturing partners) to scale your product. Some days you're reviewing a new platform design with your ODM; other days you're deep in logs and telemetry data chasing a failure mode across the fleet. You thrive on that range.

Basic Qualifications

Experience in developing functional specifications, design verification plans and functional test procedures
Bachelor's degree or above in electrical engineering, computer engineering, or equivalent
Strong English-language communication skills, both written and verbal
Experience with design & innovation and research & development
Knowledge of operating systems, hardware, storage, network, security, database administration and cloud infrastructure
Experience in server technologies such as thermal, mechanical, power, and signal integrity
5+ years of professional work (non-internship) experience

Preferred Qualifications

5+ years of experience in hardware design and validation of components, subsystems, and systems
Experience in server technologies: board design, high-speed bus design and signal integrity, failure analysis, server components (CPU, GPU, SSDs, memory), BIOS, BMC, and networking
Experience developing and executing test procedures for mechanical or electrical systems/components
Experience working with ODMs/manufacturers through the product development and manufacturing lifecycle
Experience building predictive failure detection or proactive remediation systems at fleet scale
Experience with storage/compute/GPU/accelerator platforms including integration, diagnostics, or performance validation
Familiarity with PCIe topology, NVLink, NVMe, and accelerator interconnects
Experience with large-scale datacenter or cloud environments

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Los Angeles County applicants: Job duties for this position include: work safely and cooperatively with other employees, supervisors, and staff; adhere to standards of excellence despite stressful conditions; communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws and Company policies. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position. These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness and professionalism, and safeguard business operations and the Company’s reputation. Pursuant to the Los Angeles County Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at https://amazon.jobs/en/benefits.

USA, CA, Cupertino - 157,300.00 - 212,800.00 USD annually
USA, WA, Seattle - 136,000.00 - 184,000.00 USD annually

About Amazon

Amazon

Public

Amazon.com, Inc. is an American multinational technology company engaged in e-commerce, cloud computing, online advertising, digital streaming, and artificial intelligence.

Employees: 10,001+
Headquarters: Seattle
Valuation: $1.5T

Reviews

Overall rating: 2.9 (10 reviews)

Work-life balance: 2.8
Compensation: 3.7
Company culture: 2.5
Career: 2.3
Management: 2.1

Recommend to a friend: 35%

Pros

Good pay and compensation
Strong benefits package
Flexible scheduling options

Areas for improvement

Poor management and leadership
Limited growth and promotion opportunities
High stress and demanding work environment

Salary range

4 data points

Levels: Junior/L3, Mid/L4, Principal/L7, Senior/L5, Staff/L6, Director

Junior/L3 · Data Scientist L4 (0 reports)
Total annual compensation: $181,968
Range: $154,672 - $209,264

Interview experience

10 interviews

Difficulty: 3.7 / 5
Duration: 21-35 weeks
Offer rate: 20%

Experience: positive 10%, neutral 10%, negative 80%

Interview process

Application Review
Recruiter Screen
Online Assessment
Technical Phone Screen
Onsite/Virtual Loop
Team Matching
Offer

Common questions

Coding/Algorithm
System Design
Behavioral/STAR
Leadership Principles
Technical Knowledge
