Sr. Engineer - Performance AI/ML Deployment Engineering

AMD

Santa Clara, California

On-site

Full-time

1w ago

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE TEAM:

AMD's Data Center GPU organization is transforming the AI and HPC landscape. Our mission is to design and market exceptional products—anchored by our Instinct™ GPU portfolio—that power the next generation of computing in enterprise data centers, cloud, and supercomputing environments. If you’re excited by AI disruption and want to be part of building something big, join us.

THE ROLE:

The Senior/Principal Engineer, DC GPU AI/ML Advanced Forward Deployment and Systems Engineering, is a leadership position designed to optimize the design, rollout, and post-rollout management of AI/ML fabrics. The candidate will be the technical interface between customers and various internal engineering groups and field application engineers. Leveraging extensive experience in large network architecture, storage, AI/ML network deployments, and performance tuning, this role requires a disciplined approach to system triage, at-scale debug, and infrastructure optimization to ensure robust performance and efficient transitions from GPU production qualification to at-scale datacenter deployment.

THE PERSON:

This position is for a Senior/Principal Engineer, DC GPU AI/ML Advanced Forward Deployment and Systems Engineering, with a focus on architecture and design, optimizing compute, network, and storage, and benchmarking machine learning applications. You will be part of a team that works closely with strategic customers and partners to enable large-scale deployment of AMD CPU and GPU platforms. You will interface closely with ROCm software developers, DC GPU HW/FW/ASIC teams, field engineering teams, OEM/ODM partners, CSPs, and marketing/business development teams. You must be self-motivated and able to work well within a team environment.

KEY RESPONSIBILITIES:

Collaborate with strategic customers on scalable designs involving compute, networking, and storage environments, and work with industry partners and internal teams to accelerate the deployment and adoption of various AI/ML models.
Engage in system-level triage and at-scale debug of complex issues across hardware, firmware, and software, ensuring rapid resolution and system reliability.
Drive the ramp of Instinct-based large-scale AI datacenter infrastructure based on NPI base platform hardware with ROCm, scaling up to pod and cluster level, leveraging the best in network architecture for AI/ML workloads.
Enhance tools and methodologies for large-scale deployments to meet customer uptime goals and exceed performance expectations.
Engage with clients to deeply understand their technical needs, ensuring their satisfaction with tailored solutions that leverage your past experience in strategic customer engagements and architectural wins.
Provide domain-specific knowledge to other groups at AMD and share lessons learned to drive continuous improvement.
Engage with AMD product groups to drive resolution of application and customer issues.
Develop and present training materials to internal audiences, at customer venues, and at industry conferences.

PREFERRED EXPERIENCE:

Expertise in performance optimization for large-scale AI/ML networks, including network, compute, and storage cluster design, modeling, analytics, performance tuning, convergence, and scalability improvements.
Prefer candidates with solid, hands-on expertise in at least one of three domains: compute, network, or storage.
Demonstrated leadership in network architecture, with hands-on experience in RoCEv2 design, VXLAN-EVPN, BGP, and lossless fabrics.
Deep experience working with large customers such as Cloud Service Providers and global enterprise customers.
Proven leadership in engaging customers across diverse technical disciplines through avenues such as proofs of concept, competitive evaluations, and early field trials.
Direct experience working with large customers, with the ability to operate with a sense of urgency, own problems, and resolve them.
Extensive experience in Python, Linux, kernel modules, and application libraries, unless accompanied by other skill sets in the space.
Proven ability to influence design and technology roadmaps, leveraging a deep understanding of datacenter products and market trends.
Extensive hands-on network deployment expertise and a proven track record of delivering large projects on time. Cisco, Juniper, or Arista experience is required.
Direct co-development and deployment experience working with strategic customers and partners to bring solutions to market.
Excellent communication skills with audiences ranging from engineers to mid-management to C-level executives.
This is a senior-level role; no recent college graduates will be considered.

ACADEMIC CREDENTIALS:

Bachelor's or master's degree in computer science, engineering, or related subjects.
Ability to work well in a geographically dispersed team.
Certifications in Networking, AI/ML, or Cloud Technologies.

LOCATION:

Santa Clara, CA

This role is not eligible for visa sponsorship.

Benefits offered are described: AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here.

This posting is for an existing vacancy.

About AMD

AMD

Public

Advanced Micro Devices, Inc. (AMD) is an American multinational semiconductor company headquartered in Santa Clara, California.

Employees: 10,001+

Headquarters: Santa Clara

Company valuation: $240B

Reviews

Overall rating: 3.7 (10 reviews)

Work-life balance: 2.8

Compensation: 3.2

Culture: 4.1

Career: 3.4

Management: 3.8

68% would recommend to a friend

Pros

Great team culture and spirit

Innovative projects and cutting-edge technology

Supportive management and leadership

Cons

High workload and overwhelming work demands

Work-life balance challenges

High pressure and stressful deadlines

Salary information

6 data points

L2 · Data Scientist L2

0 reports

Total compensation: $104,000

Base salary: $41,600

Stock: $52,000

Bonus: $10,400

$72,800

$135,200

Interview experience

2 interviews

Difficulty: 3.0 / 5

Duration: 14-28 weeks

Offer rate: 50%

Interview process

Application Review

Recruiter Screen

Hiring Manager Interview

Technical Interview

Offer

Frequently asked questions

Technical Knowledge

Behavioral/STAR

Past Experience

Problem Solving
