Job Posting
NVIDIA is seeking a passionate and seasoned Senior K8s Expert to join our team, focusing on the infrastructure construction, optimization, and operation of Agentic AI and Agentic Reinforcement Learning (Agentic RL) workloads. You will play a core role in bridging NVIDIA’s cutting-edge accelerated computing technologies with cloud service providers (CSPs) in China, driving the deployment and scaling of Kubernetes-based Agentic AI/RL solutions, and empowering our CSP partners to build high-performance, scalable, and secure Agent Infra systems.
What you'll be doing:
- Work with the Sales, BD, and CPM teams to introduce NVIDIA technologies into assigned accounts and grow business accordingly.
- Lead the design, development, and optimization of Kubernetes-based infrastructure solutions for Agentic AI and Agentic RL workloads, addressing core challenges such as scheduling massive numbers of concurrent sandboxes, millisecond-level elasticity, secure isolation, and interactive environment support across all scenarios.
- Collaborate closely with NVIDIA’s CSP partners (major cloud service providers in China) to understand their Agentic AI/RL business needs, provide expert K8s technical guidance, and tailor infrastructure solutions that align with NVIDIA’s accelerated computing technologies (such as NVIDIA AI Enterprise, the GB200 platform, and NVCF).
- Optimize Kubernetes clusters to support high-throughput, low-latency Agentic RL training and inference workloads, including optimizing resource scheduling strategies, managing GPU resources, tuning network and storage performance, and resolving bottlenecks in large-scale Pod creation and scheduling.
- Design and implement core K8s-based Agent Infra components, such as secure sandbox environments, interactive trajectory recording, checkpointing with breakpoint replay, and end-to-end observability tools, to support the full lifecycle of Agentic AI/RL development and deployment.
- Work with cross-functional teams (NVIDIA’s R&D, solution architecture, and technical support teams) to integrate K8s with NVIDIA’s software and hardware ecosystem, including NVIDIA Operators, Dynamo, Grove, and KAI Scheduler, to achieve optimal performance for Agentic workloads.
- Provide technical leadership in K8s and Agentic AI/RL Infra, mentor junior engineers, and drive continuous iteration and improvement of infrastructure solutions based on industry best practices and customer feedback.
- Stay abreast of the latest trends in Kubernetes, Agentic AI, Agentic RL, and cloud-native infrastructure; bring advanced technologies and solutions into NVIDIA’s CSP ecosystem; and promote technological innovation and standardization.
- Participate in technical pre-sales support, solution demonstrations, and technical training for CSP partners, helping them build the capabilities to construct and operate K8s-based Agentic AI/RL Infra.
What we need to see:
- Bachelor’s degree or higher in Computer Science, Software Engineering, Electrical Engineering, or a related field; a master’s degree is preferred.
- 10+ years of hands-on experience in Kubernetes development, operation, and optimization, with deep expertise in K8s core components (kube-apiserver, etcd, kube-scheduler, kubelet) and custom resource development (CRDs/Operators).
- Proven experience building and optimizing infrastructure for AI/ML workloads, with an in-depth understanding of Agentic AI and Agentic RL concepts; practical experience supporting Agentic RL training or inference workloads on K8s is a strong plus.
- Proficiency in containerization technologies (Docker, containerd), container networking solutions (Calico, Cilium), and storage solutions (Ceph, GlusterFS), with experience optimizing network and storage performance for high-concurrency AI workloads.
- Strong experience in GPU resource management on K8s, familiarity with the NVIDIA GPU Operator, CUDA, and accelerated computing technologies, and the ability to optimize GPU utilization for Agentic AI/RL workloads.
- Excellent programming skills, with proficiency in at least one of Python, Go, or C++, and the ability to develop custom K8s controllers, plugins, or automation tools.
- Deep understanding of cloud-native architecture and best practices; experience working with major CSPs (Alibaba Cloud, Tencent Cloud, Huawei Cloud, etc.) is highly preferred.
- Fluency in spoken and written English, with the ability to communicate effectively with global cross-functional teams and read English technical documentation.
- Strong problem-solving skills, the ability to independently identify and resolve complex K8s and Agentic AI/RL Infra technical issues, and a proactive, results-driven work attitude.
Ways to stand out from the crowd:
- Experience building Agentic AI/RL sandbox environments, with familiarity with sandbox technologies and their integration with K8s.
- Experience managing large-scale data center infrastructure, with an understanding of the challenges of scheduling bursty (pulse-type) workloads and optimizing cost in Agentic RL scenarios.
- Familiarity with Agentic AI and RL frameworks, and the ability to align K8s infrastructure with framework requirements.
- Relevant certifications such as CKA (Certified Kubernetes Administrator), CKAD (Certified Kubernetes Application Developer), or CKS (Certified Kubernetes Security Specialist).
With competitive salaries and a generous benefits package, we are widely considered to be one of the world’s most desirable employers! We have some of the most forward-thinking and hardworking people in the world working for us and, due to outstanding growth, our best-in-class engineering teams are rapidly growing. If you're a creative and autonomous person with a real passion for technology, we want to hear from you.
Similar Job Postings

Senior Solution Architect - Personalization Strategist
Contentful · Denver, Colorado, United States

Principal Solution Architect, National Solutions Growth
Leidos · Gaithersburg; Chantilly

Senior Solutions Architect, Frontier AI Startups
Amazon · Boston, MA, USA

Senior Cloud Solution Architect Manager
Microsoft · Korea, Seoul, Seoul

Client Solutions Architect - Payments - Senior Associate
JPMorgan Chase · Tampa, FL, United States, US
About NVIDIA

NVIDIA
Public · A computing platform company operating at the intersection of graphics, HPC, and AI.
Employees: 10,001+
Headquarters: Santa Clara
Valuation: $4.57T
Reviews: 4.1 (10 reviews)
Work-life balance: 3.5
Compensation: 4.2
Culture: 4.3
Career: 4.5
Management: 4.0
75% would recommend to a friend
Pros:
- Great culture and supportive environment
- Smart colleagues and excellent people
- Cutting-edge technology and learning opportunities
Cons:
- Team-dependent experience and outcomes
- Work-life balance issues with long hours
- Politics and influence valued over competence
Salary Information (73 data points)
Levels: Junior/L3 · Mid/L4
Junior/L3 · Analyst (7 reports)
Total compensation: $170,275
Base salary: $130,981
Stock: -
Bonus: -
Range: $155,480 to $234,166
Interview Experience (7 interviews)
Difficulty: 3.1 / 5
Experience: Positive 0% · Neutral 86% · Negative 14%
Interview Process:
1. Application Review
2. Recruiter Screen
3. Online Assessment
4. Technical Interview
5. System Design Interview
6. Team Review
Common Question Types:
- Coding/Algorithm
- System Design
- Technical Knowledge
- Behavioral/STAR
News & Buzz
Negotiating NVIDIA's Offer
Base salary, stock, and sign-on bonus are negotiable. Recruiters are invested in closing candidates. The CEO reviews all 42K employees' salaries monthly. Stock growth has made many employees millionaires.
NVIDIA Company Reviews
Work-life balance (WLB) rated 3.9/5, the lowest category. 64% are satisfied with WLB, but 53% report feeling burnt out. Compensation rated 4.4 to 4.5/5. Experience is highly team-dependent.
NVIDIA Interview Discussions
The technical bar is high, with 4 to 6 rounds, and the process takes 4 to 8 weeks. Expect C++ questions, LeetCode medium-level problems, and system design. Difficulty rated 3.16/5.
NVIDIA Culture Discussions
Experience is team-dependent: a sink-or-swim culture rewards high performers but can be overwhelming. Reviewers describe a flat structure with no politics, but a demanding workload, with some teams requiring evening or weekend work.