Principal Software Engineer, AIOps

NVIDIA

2 Locations · On-site · Full-time · 2w ago

Benefits & Perks

Equity

Healthcare

401(k)

Required Skills

Go

C

Rust

Distributed systems

Kubernetes

Data processing

Telemetry systems

NVIDIA is powering the world’s most advanced AI Factories. To ensure their seamless operation, we are building a mission-critical Observability and Prediction platform. The platform follows a dual-delivery model: a high-scale SaaS solution and a robust on-premises deployment for our largest enterprise customers.

We are looking for a Principal Engineer to lead the architectural vision of the platform’s core. In this role, you will be the internal technical authority responsible for building a unified, high-performance engine that processes massive telemetry streams and runs advanced predictive models, regardless of where the infrastructure resides.

What you’ll be doing:

  • Unified Architectural Vision: Lead the design of a flexible, high-scale architecture that supports both multi-tenant SaaS environments and complex on-premises deployments.
  • Operationalizing Predictive Models: Bridge the gap between AI research and production by architecting the framework that runs sophisticated predictive algorithms at scale, ensuring they are robust enough for mission-critical environments.
  • High-Scale Engineering: Design distributed systems to handle the extreme telemetry density of large-scale AI clusters, ensuring efficient data ingestion, processing, and real-time analysis.
  • Cross-Organizational Leadership: Collaborate with networking and infrastructure teams to define the technical standards that enable the AIOps platform to integrate seamlessly with global AI infrastructure.
  • Technical Excellence: Drive the engineering roadmap, mentor senior staff, and serve as the final authority on architectural decisions, ensuring the platform meets the highest standards of reliability and scalability.

What we need to see:

  • Education: B.Sc./M.Sc. in Computer Science, Computer Engineering, or a related technical field.
  • Experience: 12 years of experience in software engineering, with a proven track record of architecting complex, high-scale products delivered via SaaS and/or on-premises enterprise models.
  • Architectural Sovereignty: Deep expertise in building environment-agnostic distributed systems, using technologies like Kubernetes to ensure portability across cloud and private data centers.
  • Core Systems Programming: Expert-level proficiency in languages such as Go, C, or Rust, with a focus on high-performance, concurrent architectures.
  • Data Infrastructure: Extensive experience with high-throughput data processing (e.g., Apache Kafka) and managing large-scale telemetry or time-series data.

Ways to stand out from the crowd:

  • The 0 to 1 Mindset: A proven track record of taking a complex architectural concept from a whiteboard to a stabilized, production-grade platform.
  • A Systems Thinker: You don't just write software; you understand the full stack, from how data moves across the wire to how it’s processed in a distributed cluster.
  • Infrastructure Evangelist: Experience leading large-scale technical migrations or introducing modern engineering paradigms (like Cloud-Native or GitOps) into complex, high-stakes environments.
  • Practical Innovation: The ability to simplify complex problems and build internal tools or frameworks that empower other engineering teams to move faster.

About NVIDIA

NVIDIA (Public)

A computing platform company operating at the intersection of graphics, HPC, and AI.

Employees: 10,001+

Headquarters: Santa Clara

Valuation: $4.57T

Reviews

Overall: 4.1 (10 reviews)

Work Life Balance: 3.5

Compensation: 4.2

Culture: 4.3

Career: 4.5

Management: 4.0

Recommend to a Friend: 75%

Pros

Great culture and supportive environment

Smart colleagues and excellent people

Cutting-edge technology and learning opportunities

Cons

Team-dependent experience and outcomes

Work-life balance issues with long hours

Politics and influence over competence

Salary Ranges

47 data points

Junior/L3 · Analyst (7 reports)

Total: $170,275 / year

Base: $130,981

Stock: -

Bonus: -

$155,480

$234,166

Interview Experience

7 interviews

Difficulty: 3.1 / 5

Experience

Positive 0%

Neutral 86%

Negative 14%

Interview Process

1. Application Review

2. Recruiter Screen

3. Online Assessment

4. Technical Interview

5. System Design Interview

6. Team Review

Common Questions

Coding/Algorithm

System Design

Technical Knowledge

Behavioral/STAR