Job posting
Required skills: Python, Linux
NVIDIA pioneered the GPU in 1999 and now leads the world in AI infrastructure, with its GPUs powering modern graphics, accelerated computing, and the most advanced AI systems that drive breakthroughs in generative AI, scientific discovery, autonomous machines, and massive AI‑powered data centers.
We are looking for a highly motivated and creative Staff System Software Engineer who is experienced in and passionate about diagnostics for NVIDIA's next-generation GPU products. You will lead and contribute to the design, implementation, and integration of software-based validations into the sophisticated manufacturing flow for NVIDIA GPU products.
What you’ll be doing:
- Implement and enhance GPU diagnostics covering power, thermal, memory, PCIe, NVLink, and system‑level checks on boards, servers, and racks.
- Develop stress and validation tests that exercise GPU subsystems and platform components; add clear pass/fail criteria, telemetry, and error codes suitable for automation and failure attribution.
- Contribute to the integration of diagnostics into L6/L10/L11 factory flows and datacenter workflows, working with senior engineers to define coverage, runtime, and sequencing.
- Execute and monitor automated regression tests and pipelines on top of orchestration systems.
- Debug issues in cooperation with hardware, firmware, and other teams (Ops/TE/AE, etc.); root‑cause problems that span HW/FW/SW boundaries.
- Improve logging and reporting for diagnostics (e.g., structured logs, JSON, exit codes) to make triage simpler.
- Analyze factory and field data (yields, error trends, intermittency) to identify gaps in diagnostics or weak debug signals and propose targeted test or logging improvements.
- Help write and maintain technical documentation: test specs, user guides, SOPs, and troubleshooting guides for internal teams and external partners (ODMs/OEMs/CSPs).
What we need to see:
- BS or MS in Computer Science, Electrical/Computer Engineering, or a related field, with 8+ years of relevant experience in system software, diagnostics, or platform validation.
- Strong programming skills in C/C++ and Python, plus familiarity with shell scripting for lab tools and automation.
- Experience developing low‑level system software on Linux (e.g., board bring‑up tools, diagnostics, or drivers), working close to firmware and hardware registers.
- Solid understanding of server platforms: x86 and/or ARM server architecture (CPU, memory, PCIe topology, firmware/boot flow).
- Basic GPU architecture concepts and how GPU accelerators are integrated into servers.
- Ability to read and interpret HW and system logs and understand system/block diagrams and schematics in order to connect failures back to the underlying architecture.
- Demonstrated debugging and problem‑solving skills, including use of tools like gdb, perf, tracing frameworks, and vendor debug utilities.
- Demonstrated technical leadership with a strong commitment to driving projects, investigations, and critical bugs to closure across cross-functional teams.
- Strong communication skills and a collaborative mindset; comfortable working with globally distributed teams across HW, FW, SW, operations, and manufacturing.
Ways to stand out from the crowd:
- Hands‑on experience with diagnostic tools in factory or datacenter environments.
- Direct involvement in defining or improving diagnostic flows or RMA qualification flows for complex hardware products.
- Proven track record writing clear, high‑coverage test plans and test cases for complex HW/SW systems.
- Experience using AI‑assisted tools (for example, for log triage, code assistance, data analysis, or test generation) to accelerate diagnostics development and debugging.
NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward‑thinking and hardworking people on the planet working for us. If you’re passionate about building and scaling world‑class diagnostics for cutting‑edge GPU servers, we want to hear from you!
About NVIDIA

NVIDIA
Public
A computing platform company operating at the intersection of graphics, HPC, and AI.
Employees: 10,001+
Headquarters: Santa Clara
Company valuation: $4.57T
Reviews
Rating: 4.1 (10 reviews)
Work-life balance: 3.5
Compensation: 4.2
Culture: 4.3
Career: 4.5
Management: 4.0
Would recommend to a friend: 75%
Pros
Great culture and supportive environment
Smart colleagues and excellent people
Cutting-edge technology and learning opportunities
Cons
Team-dependent experience and outcomes
Work-life balance issues with long hours
Politics and influence over competence
Salary information
47 data points
Junior/L3
Mid/L4
Junior/L3 · Analyst (7 reports)
Total compensation: $170,275
Base salary: $130,981
Stock: -
Bonus: -
$155,480
$234,166
Interview experience
7 interviews
Difficulty: 3.1 / 5
Experience: 0% positive, 86% neutral, 14% negative
Interview process
1. Application Review
2. Recruiter Screen
3. Online Assessment
4. Technical Interview
5. System Design Interview
6. Team Review
Common question types
Coding/Algorithm
System Design
Technical Knowledge
Behavioral/STAR
News & Buzz
Negotiating NVIDIA's Offer
Base, stock, and sign-on negotiable. Recruiters invested in closing candidates. CEO reviews all 42K employee salaries monthly. Stock growth has made many employees millionaires.
NVIDIA Company Reviews
WLB rated 3.9/5 (lowest category). 64% satisfied with WLB but 53% feel burnt out. Compensation rated 4.4-4.5/5. Experience highly team-dependent.
NVIDIA Interview Discussions
Technical bar is high with 4-6 rounds. Process takes 4-8 weeks. Expect C++ questions, LeetCode medium, and system design. Difficulty rated 3.16/5.
NVIDIA Culture Discussions
Team-dependent experience; sink-or-swim culture that rewards high performers but can be overwhelming. No politics, flat structure, but demanding workload with some teams requiring evening/weekend work.