Benefits & Perks
• Annual team offsites
• Top-tier compensation with equity
• Parental leave program
• Health, dental, and vision coverage
• Flexible PTO policy
Required Skills
PyTorch
TensorFlow
Apache Spark
We are looking for an AI Test Architect to join the E2E Verification group and profile innovative large-scale distributed training on NVIDIA AI end-to-end solutions in large-scale supercomputing clusters. You will provide insights on at-scale system design and tuning mechanisms for large-scale compute runs. You will work with the latest accelerated computing and deep learning software and hardware platforms, and with researchers, developers, and customers, to craft improved workflows and develop new, leading, differentiated solutions. You will interact with HPC, OS, switch, HCA, CPU and GPU compute, and systems specialists to architect, develop, and bring up large-scale performance platforms.
What you’ll be doing:
- Profiling, benchmarking, and analyzing deep learning models to identify areas for optimization and improvement in terms of performance, efficiency, and accuracy, with a strong emphasis on networking aspects
- Collaborating closely with data scientists, researchers, and development and automation teams to design and implement scalable training pipelines and frameworks that demonstrate large-scale, high-performance networking capabilities
- Staying up to date with the latest advancements in deep learning algorithms, architectures, NVIDIA GPU technologies, and high-performance networking solutions
- Optimizing deep learning models for performance, memory usage, and power efficiency while maximizing high-performance networking features on NVIDIA supercomputers
- Providing insights and recommendations based on the analysis of large-scale training results, specifically focusing on networking bottlenecks and optimizations, to improve model outcomes and achieve business objectives
- Collaborating with hardware engineers to guide the development and integration of efficient networking solutions for deep learning, including exploring network architecture optimizations and bringing to bear technologies such as RDMA or InfiniBand
What we need to see:
- B.Sc. in Computer Science, Software Engineering, or equivalent experience
- Strong understanding of and practical experience with machine learning algorithms and techniques, with a specialization in deep learning and expertise in high-performance networking
- 8 years of overall experience, with CUDA programming for deep learning frameworks such as TensorFlow and PyTorch, combined with expertise in networking libraries and protocols
- Ability to profile and optimize deep learning workflows, focusing on networking-related bottlenecks and optimizations, to improve overall performance and efficiency
- Exceptional analytical and problem-solving skills, with keen attention to detail, particularly in identifying and resolving networking performance issues
- Excellent communication and collaboration skills, enabling effective teamwork and cooperation
- Familiarity with supercomputers, parallel computing, distributed systems, and high-performance networking technologies such as RDMA or InfiniBand
Ways to stand out from the crowd:
- Demonstrated experience in successfully profiling and optimizing large-scale deep learning training on NVIDIA supercomputers, with a significant focus on high-performance networking enhancements
- Experience with distributed deep learning, distributed training frameworks, or large-scale data pipelines enhanced by high-performance networking solutions
- Expertise in optimizing networking parameters, such as bandwidth, latency, or congestion control, for deep learning workloads
- Familiarity with NVIDIA's networking technologies, such as Mellanox InfiniBand, and their integration with deep learning workflows
- Strong understanding of high-performance networking protocols and standards and their application to deep learning
NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you! NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
About NVIDIA

NVIDIA
Public
A computing platform company operating at the intersection of graphics, HPC, and AI.
Employees: 10,001+
Headquarters: Santa Clara
Valuation: $4.57T
Reviews
4.1 (10 reviews)
Work Life Balance: 3.5
Compensation: 4.2
Culture: 4.3
Career: 4.5
Management: 4.0
Recommend to a Friend: 75%
Pros
Great culture and supportive environment
Smart colleagues and excellent people
Cutting-edge technology and learning opportunities
Cons
Team-dependent experience and outcomes
Work-life balance issues with long hours
Politics and influence over competence
Salary Ranges
47 data points
Junior/L3 · Analyst (7 reports)
Total: $170,275 / year
Base: $130,981
Stock: -
Bonus: -
$155,480
$234,166
Interview Experience
7 interviews
Difficulty: 3.1 / 5
Experience
Positive 0%
Neutral 86%
Negative 14%
Interview Process
1. Application Review
2. Recruiter Screen
3. Online Assessment
4. Technical Interview
5. System Design Interview
6. Team Review
Common Questions
Coding/Algorithm
System Design
Technical Knowledge
Behavioral/STAR
News & Buzz
Negotiating NVIDIA's Offer
Base, stock, and sign-on negotiable. Recruiters invested in closing candidates. CEO reviews all 42K employee salaries monthly. Stock growth has made many employees millionaires.
NVIDIA Company Reviews
WLB rated 3.9/5 (lowest category). 64% satisfied with WLB but 53% feel burnt out. Compensation rated 4.4-4.5/5. Experience highly team-dependent.
NVIDIA Culture Discussions
Team-dependent experience; sink-or-swim culture that rewards high performers but can be overwhelming. No politics, flat structure, but demanding workload with some teams requiring evening/weekend work.
NVIDIA Interview Discussions
Technical bar is high with 4-6 rounds. Process takes 4-8 weeks. Expect C++ questions, LeetCode medium, and system design. Difficulty rated 3.16/5.