
Realize what's possible.
AI Systems Performance Engineer
Please Note:
1. If you are a first-time user, please create your candidate login account before you apply for a job (click Sign In > Create Account).
2. If you already have a candidate account, please sign in before you apply.
Job Description:
We are seeking a highly talented and experienced Senior AI Fabric Performance Engineer to take on a critical role within our Performance Lab. In this high-impact position, you will drive the performance benchmarking of AI inference, training, and storage workloads, with a focus on our network infrastructure. You will be responsible for generating reports that help customers with deployment and help the marketing team position the product.
While the AI workloads (inference and training) run on our servers, your primary focus will be optimizing the Ethernet fabric that connects them. You will be responsible for executing rigorous performance benchmarks, isolating complex system bottlenecks, and tuning parameters to achieve maximum throughput and minimum latency. If you possess a deep understanding of Ethernet fabric, machine learning system demands, and Linux environments, and you thrive on solving complex performance puzzles, we want you on our team.
Key Responsibilities
- Benchmarking & Execution:
Install, configure, and run industry-standard AI performance benchmarks, with a strong emphasis on MLPerf (Training and Inference) and NCCL tests.
- Fabric Optimization:
Tune and optimize network parameters, focusing heavily on Ethernet fabric performance, to ensure seamless data flow for distributed AI workloads running on server clusters.
- Deep Debugging:
Identify, isolate, and troubleshoot complex system performance bottlenecks spanning the Linux OS, server hardware, and Ethernet switches.
- Automation Development:
Design, develop, and implement robust performance testing frameworks and automation tools to streamline continuous benchmarking.
- Cross-Functional Collaboration:
Document test methodologies, communicate performance findings, and provide actionable improvement recommendations to hardware, software, and networking stakeholders.
Required Qualifications
- Education:
Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or a related technical field with 12+ years of related industry experience, or a Master's degree in one of these fields with 10+ years.
- OS Expertise:
Deep familiarity and hands-on experience with Linux operating systems, including system-level performance tuning and troubleshooting.
- Programming Skills:
Strong proficiency in programming and scripting languages, specifically Python and C++.
- AI/ML Knowledge:
Familiarity with modern machine learning frameworks, particularly PyTorch, and a solid understanding of how AI models consume compute and network resources.
- Networking & Fabric:
Proven experience in performance testing and validating Ethernet switch systems.
- Analytical Capabilities:
Extensive experience with performance metrics, profiling, and benchmarking tools. Strong problem-solving skills with a proven ability to diagnose root causes in complex, distributed systems.
Preferred Qualifications (Optional but recommended for a critical role)
- Experience with RDMA (Remote Direct Memory Access) and RoCEv2 (RDMA over Converged Ethernet).
- Prior experience building CI/CD pipelines for automated hardware or software performance regression testing.
- Familiarity with containerization and orchestration tools (Docker, Kubernetes) used in AI deployments.
Additional Job Description: Compensation and Benefits
The annual base salary range for this position is $141,300 - $226,000.
As a valued member of our team, you'll be eligible for a discretionary annual bonus and the opportunity to receive not only a competitive new hire equity grant, but also annual equity awards, connecting your success directly to the company's growth. All of these are subject to the relevant plan documents and award agreements.
Broadcom offers a competitive and comprehensive benefits package: Medical, dental and vision plans, 401(K) participation including company matching, Employee Stock Purchase Program (ESPP), Employee Assistance Program (EAP), company paid holidays, paid sick leave and vacation time. The company follows all applicable laws for Paid Family Leave and other leaves of absence.
Broadcom is proud to be an equal opportunity employer. We will consider qualified applicants without regard to race, color, creed, religion, sex, sexual orientation, national origin, citizenship, disability status, medical condition, pregnancy, protected veteran status or any other characteristic protected by federal, state, or local law. We will also consider qualified applicants with arrest and conviction records consistent with local law.
If you are located outside the USA, please be sure to fill out a home address, as this will be used for future correspondence.
About VMware

VMware (acquired)
Employees: 10,001+
Headquarters: Palo Alto
Reviews
Overall rating: 3.7 (10 reviews)
Work-life balance: 4.0
Compensation: 3.8
Company culture: 2.5
Career: 2.8
Management: 2.2
Would recommend to a friend: 35%
Pros
- Good benefits and perks
- Great company culture (pre-acquisition)
- Work-life balance
Cons
- Broadcom acquisition ruined company culture
- Poor leadership and management decisions
- Limited career growth and learning opportunities
Salary range
5 data points
Levels: Mid/L4, Senior/L5, Staff/L6
Mid/L4 · Data Scientist (1 report)
Total annual compensation: $165,100
Base salary: $127,000
Stock: -
Bonus: -
Interview reviews
10 reviews
Difficulty: 3.0 / 5
Duration: 14-28 weeks
Offer rate: 70%
Experience: positive 30%, neutral 50%, negative 20%
Interview process
1. Application Review
2. HR/Recruiter Screen
3. Technical Phone Screen
4. Onsite/Virtual Interviews
5. Reference Check
6. Offer
Common interview questions
Coding/Algorithm
Technical Knowledge
Behavioral/STAR
System Design
Past Experience