Senior DevOps Engineer

NVIDIA

India, Pune

On-site

Full-time

2w ago

Benefits & Perks

• Equity

Required Skills

Kubernetes

Docker

Python

Java

Ansible

Chef

Puppet

Jenkins

MySQL

Elasticsearch

MongoDB

NVIDIA is looking for an outstanding engineering architect to join its Software Infrastructure and Operations team. The position is part of a fast-paced crew that develops and maintains sophisticated Kubernetes-based development, compute, and test environments for a multitude of platforms, including Windows and Linux. You will work with a team of passionate and skilled engineers who are continuously working to provide better tools to build and manage this infrastructure. With your help, we will forge the next generation of compute infrastructure, multiplying the power of the CPU, GPU, and DPU for the age of AI. We need a motivated, hardworking, and focused individual who has a real passion for operational excellence, infrastructure services, and automation.

What you’ll be doing:

Architect the scaling operation in our data centers.
Deploy and support an end-to-end container management solution with Kubernetes, Docker, and containerd.
Design solutions with service discovery, networking, monitoring, logging, and scheduling in Kubernetes.
Set up and manage end-to-end compute infrastructure using PaaS and IaaS services: tools, plugins, nodes, user management, backup, restore, monitoring, etc.
Design and develop the AI tools needed to automate maintenance of 35,000 hosts with only 12 support engineers.
Design and build sophisticated automations and AI-powered applications.
Apply your depth in algorithms and your system software background.
Work in teams to deploy new data center infrastructure.
Plan and implement critical metrics tracking using various data analytics and data mining methods and dashboards.
Reuse AI techniques to extract useful signals about machines and jobs from the data generated.
Take part in prototyping, crafting, and developing cloud infrastructure for NVIDIA.

What we need to see:

Strong Kubernetes understanding and background, especially with on-premises setups, and extensive experience with Kubernetes components and subsystems.
Experience maintaining large-scale cloud/on-prem infrastructure applications using Kubernetes, Slurm, and OpenStack.
Proven programming background in Python, Golang, Java, and/or relevant scripting languages.
Excellent debugging and analytical skills, and experience with databases, both SQL (MySQL) and NoSQL (Elasticsearch/MongoDB).
Proficiency with configuration management tools like Ansible, Chef, and Puppet, and strong experience with Jenkins and/or other CI systems.
Hands-on experience with VMs, Docker, and Kubernetes clusters.
Experience with analytics/visualization tools like Kibana, Grafana, Splunk, etc.; experience with monitoring systems such as Zabbix and/or Nagios is nice to have.
10 years of proven experience.
Bachelor's or Master's degree, or equivalent experience, in CS, Software Engineering, or a related field.

Ways to stand out from the crowd:

Previous experience with DevOps/SRE teams.
Thrives in a multi-tasking environment with constantly evolving priorities, and documents work well.
Outstanding collaboration skills across organizational boundaries; experience using and improving data centers; and experience with computer algorithms, including the ability to choose the best possible algorithms to meet the scaling challenge.
Ability to divide complex problems into simple subproblems and then reuse available solutions to implement most of them.
Experience designing simple systems that work reliably without needing much support.

About NVIDIA

NVIDIA

Public

A computing platform company operating at the intersection of graphics, HPC, and AI.

10,001+

Employees

Santa Clara

Headquarters

$4.57T

Valuation

Reviews

4.1

10 reviews

Work Life Balance

3.5

Compensation

4.2

Culture

4.3

Career

4.5

Management

4.0

75%

Recommend to a Friend

Pros

Great culture and supportive environment

Smart colleagues and excellent people

Cutting-edge technology and learning opportunities

Cons

Team-dependent experience and outcomes

Work-life balance issues with long hours

Politics and influence over competence

Salary Ranges

47 data points

Junior/L3

Mid/L4

Junior/L3 · Analyst

7 reports

$170,275

total / year

Base

$130,981

Stock

Bonus

$155,480

$234,166

Interview Experience

7 interviews

Difficulty

3.1

/ 5

Experience

Positive 0%

Neutral 86%

Negative 14%

Interview Process

Application Review

Recruiter Screen

Online Assessment

Technical Interview

System Design Interview

Team Review

Common Questions

Coding/Algorithm

System Design

Technical Knowledge

Behavioral/STAR

News & Buzz

Negotiating NVIDIA's Offer

Base, stock, and sign-on negotiable. Recruiters invested in closing candidates. CEO reviews all 42K employee salaries monthly. Stock growth has made many employees millionaires.

News

NVIDIA Company Reviews

WLB rated 3.9/5 (lowest category). 64% satisfied with WLB but 53% feel burnt out. Compensation rated 4.4-4.5/5. Experience highly team-dependent.

News

NVIDIA Culture Discussions

Team-dependent experience; sink-or-swim culture that rewards high performers but can be overwhelming. No politics, flat structure, but demanding workload with some teams requiring evening/weekend work.

News

NVIDIA Interview Discussions

Technical bar is high with 4-6 rounds. Process takes 4-8 weeks. Expect C++ questions, LeetCode medium, and system design. Difficulty rated 3.16/5.

News
