Distinguished Engineer, Cloud Site Reliability Engineering
US, CA, Santa Clara · On-site · Full-time · 2w ago
NVIDIA is looking for a Cloud SRE Architect to work in the Cloud Infrastructure Team of IPP (Infrastructure, Planning and Process), a global organization within NVIDIA. The group works with other NVIDIA groups, such as Graphics Processors, Mobile Processors, Deep Learning, Artificial Intelligence, and Driverless Cars, to meet their infrastructure needs. Its cloud services run almost half a million automated jobs per day on thousands of servers, improving the efficiency of thousands of NVIDIA's software engineers worldwide. The cloud hosts machines and devices running operating systems such as Windows, Linux, and Android, and supports hardware platforms including NVIDIA GPUs and Tegra processors. It delivers unified CI/CD solutions and cloud-based software development. Are you passionate about distributed infrastructure and looking for sophisticated, critical problems to solve? Are you ready to build the next generation of cloud services, design creative solutions, and mine through data to uncover real problems and fix them?
What you'll be doing:
- Serve as an SRE Architect on the GPU Private Cloud team, whose platform is used by thousands of NVIDIANs globally for interactive development, centralized CI/CD, and QA testing.
- Evaluate, identify, and develop software solutions to optimize critical software development workflows across organizations within NVIDIA.
- Architect, implement, and support an end-to-end CI/CD system using open-source and NVIDIA proprietary software.
- Onboard customers (NVIDIA internal development teams) to the private cloud infrastructure, with thorough discovery of each use case and the solutions available within the cloud.
- Identify performance bottlenecks and optimize the speed and cost efficiency of AI development and testing systems.
- Lead software development projects, technically direct a team of brilliant engineers, and guide them toward efficient and impactful solutions.
- Look for problems within software systems and resolve them.
- Craft and implement critical metrics using various analytics methods and dashboards.
What we need to see:
- BS in EE/CS or equivalent experience, with 18+ years of systems software development, including at least 1 year dedicated to developing/exploring AI.
- Experience maintaining cloud infrastructure and highly available production environments.
- Strong programming and software development skills in Java, Python, and shell scripting, along with a good understanding of distributed systems and REST APIs.
- Experience working with SQL/NoSQL database systems such as MySQL, Cassandra, MongoDB, or Elasticsearch.
- Excellent knowledge of and working experience with Docker containers and virtual machines.
- Good background in cloud technologies such as OpenStack, Docker, Kubernetes, Chef/Puppet, Hadoop/Ceph/SwiftStack, LXC, Git, Perforce, JFrog, and Kafka.
- Ability to work effectively across organizational boundaries to improve alignment and productivity between teams in a multi-national, multi-time-zone corporate environment.
Ways to stand out from the crowd:
- Depth in AI, Machine Learning and Deep Learning algorithms and techniques.
- Strong collaborative and interpersonal skills, with a consistent record of guiding and influencing others in dynamic environments.
- Experience developing large-scale software systems using modular architecture under real-time performance requirements.
- Background in designing high-performance, scalable software systems with a strong focus on hardware cost optimization.
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 320,000 USD - 488,750 USD.
You will also be eligible for equity and benefits.
Applications for this job will be accepted at least until April 5, 2026.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
About NVIDIA

NVIDIA (Public)
A computing platform company operating at the intersection of graphics, HPC, and AI.
Employees: 10,001+
Headquarters: Santa Clara
Valuation: $4.57T
Reviews
Rating: 4.1 (10 reviews)
Work-life balance: 3.5
Compensation: 4.2
Culture: 4.3
Career growth: 4.5
Management: 4.0
Would recommend to a friend: 75%
Pros:
- Great culture and supportive environment
- Smart colleagues and excellent people
- Cutting-edge technology and learning opportunities
Cons:
- Team-dependent experience and outcomes
- Work-life balance issues with long hours
- Politics and influence over competence
Salary range (73 data points)
Levels: Junior/L3, Mid/L4
Junior/L3 · Analyst (7 reports)
Total annual compensation: $170,275
Base salary: $130,981
Stock: -
Bonus: -
$155,480 - $234,166
Interview experience (7 interviews)
Difficulty: 3.1 / 5
Experience: Positive 0%, Neutral 86%, Negative 14%
Interview process:
1. Application Review
2. Recruiter Screen
3. Online Assessment
4. Technical Interview
5. System Design Interview
6. Team Review
Common questions:
- Coding/Algorithm
- System Design
- Technical Knowledge
- Behavioral/STAR
News

Negotiating NVIDIA's Offer
Base, stock, and sign-on negotiable. Recruiters invested in closing candidates. CEO reviews all 42K employee salaries monthly. Stock growth has made many employees millionaires.

NVIDIA Company Reviews
WLB rated 3.9/5 (lowest category). 64% satisfied with WLB but 53% feel burnt out. Compensation rated 4.4-4.5/5. Experience highly team-dependent.

NVIDIA Interview Discussions
Technical bar is high with 4-6 rounds. Process takes 4-8 weeks. Expect C++ questions, LeetCode medium, and system design. Difficulty rated 3.16/5.

NVIDIA Culture Discussions
Team-dependent experience; sink-or-swim culture that rewards high performers but can be overwhelming. No politics, flat structure, but demanding workload with some teams requiring evening/weekend work.