Hiring
Compensation
$152,000 - $287,500
Benefits & Perks
•Parental leave
•Team events and activities
•Competitive salary and equity package
•Professional development budget
•Comprehensive health, dental, and vision insurance
Required Skills
Python
JavaScript
TypeScript
About the Role
NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions, from artificial intelligence to autonomous cars.
We are looking for a motivated Deep Learning engineer to bring advanced communication technologies into AI stacks, including PyTorch, TRT-LLM, vLLM, SGLang, and JAX. You will work with the team that created communication libraries like NCCL and NVSHMEM and technologies like GPUDirect for scaling Deep Learning and HPC applications. Your customers will have diverse multi-GPU demands, ranging from training at scales of up to 100K GPUs to inference at microsecond latency. Communication performance between GPUs has a direct impact on AI applications, and your work in AI toolkits will make all of this easier for the community.
This is an outstanding opportunity for someone with an AI background to advance the state of the art in this space. Are you ready to contribute to the development of innovative technologies and help realize NVIDIA's vision?
Responsibilities
- Integrate new communication library features into AI frameworks, from PoC to performance analysis to production
- Perform deep analysis of AI workloads and frameworks to identify multi-GPU communication requirements and opportunities
- Collaborate hands-on with teams working on the latest AI models
- Improve AI compilers to hide communications or perform automatic fusion
- Conduct in-depth AI workload performance characterization on multi-GPU clusters
- Design fault-tolerant and elastic solutions for large-scale or dynamic AI workloads
- Author custom communication or fused compute-communication kernels to showcase ultimate performance on NV platforms
- Influence the roadmap of communication libraries (NCCL & NVSHMEM)
- Collaborate with a very dynamic team across multiple time zones
Qualifications
- B.S., M.S., or Ph.D. in Computer Science or a related field (or equivalent experience), with 5+ years of software engineering and HPC/AI experience
- Development or integration experience with Deep Learning frameworks such as PyTorch and JAX, and inference engines such as TRT-LLM, vLLM, and SGLang
- Rapid prototyping and development with Python, C++, CUDA or related DSLs (Triton, CuTe)
- Solid grasp of AI models, parallelisms, and/or compiler technologies (e.g. torch.compile)
- Experience conducting performance benchmarking on AI clusters
- Familiarity with at least one performance profiler toolchain (PyTorch Profiler, NVIDIA Nsight Systems)
- Understanding of HPC/AI communication concepts (one-sided vs. two-sided communication, elasticity, resiliency, topology discovery, etc.)
- Adaptability and passion to learn new areas and tools
- Flexibility to work and communicate effectively across different teams and timezones
Ways to Stand Out from the Crowd
- Experience with parallel programming on at least one communication runtime (NCCL, NVSHMEM, MPI)
- Good understanding of computer system architecture, HW-SW interactions and operating systems principles (aka systems software fundamentals)
- Expertise in one or more of these areas: Training, Distributed inference, MoE, Reinforcement Learning, kernel authoring (on CUDA, Triton, CuTe, etc.)
- Experience with programming for compute & communication overlap in distributed runtimes
- Experience with AI compiler pattern matching and lowering
- Solid understanding of memory hierarchy, consistency model, and tensor layout
Compensation and Benefits
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD - 241,500 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4. You will also be eligible for equity and benefits.
Application Information
Applications for this job will be accepted at least until February 15, 2026. This posting is for an existing vacancy. NVIDIA uses AI tools in its recruiting processes.
Equal Opportunity
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
NVIDIA is the world leader in accelerated computing. NVIDIA pioneered accelerated computing to tackle challenges no one else can solve. Our work in AI and digital twins is transforming the world's largest industries and profoundly impacting society.
About NVIDIA

NVIDIA
Public
A computing platform company operating at the intersection of graphics, HPC, and AI.
Employees: 10,001+
Headquarters: Santa Clara
Valuation: $4.57T
Reviews
Overall: 4.1 (10 reviews)
Work Life Balance: 3.5
Compensation: 4.2
Culture: 4.3
Career: 4.5
Management: 4.0
Recommend to a Friend: 75%
Pros
Great culture and supportive environment
Smart colleagues and excellent people
Cutting-edge technology and learning opportunities
Cons
Team-dependent experience and outcomes
Work-life balance issues with long hours
Politics and influence over competence
Salary Ranges
47 data points
L3 · Data Scientist IC2 (0 reports)
$177,542 total / year
Base: -
Stock: -
Bonus: -
$150,910 - $204,174
Interview Experience
7 interviews
Difficulty: 3.1 / 5
Experience: Positive 0%, Neutral 86%, Negative 14%
Interview Process
1. Application Review
2. Recruiter Screen
3. Online Assessment
4. Technical Interview
5. System Design Interview
6. Team Review
Common Questions
Coding/Algorithm
System Design
Technical Knowledge
Behavioral/STAR
News & Buzz
Negotiating NVIDIA's Offer
Base, stock, and sign-on negotiable. Recruiters invested in closing candidates. CEO reviews all 42K employee salaries monthly. Stock growth has made many employees millionaires.
NVIDIA Company Reviews
WLB rated 3.9/5 (lowest category). 64% satisfied with WLB but 53% feel burnt out. Compensation rated 4.4-4.5/5. Experience highly team-dependent.
NVIDIA Culture Discussions
Team-dependent experience; sink-or-swim culture that rewards high performers but can be overwhelming. No politics, flat structure, but demanding workload with some teams requiring evening/weekend work.
NVIDIA Interview Discussions
Technical bar is high with 4-6 rounds. Process takes 4-8 weeks. Expect C++ questions, LeetCode medium, and system design. Difficulty rated 3.16/5.