Hiring
Required Skills
Python
C/C++
Bash
PowerShell
Linux
Windows
PyTorch
vLLM
Git
Docker
Kubernetes
GitHub Actions
Jenkins
gdb
perf
ftrace
valgrind
WinDbg
ETW
WHAT YOU DO AT AMD CHANGES EVERYTHING
At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems.
Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary.
When you join AMD, you’ll discover the real differentiator is our culture.
We push the limits of innovation to solve the world’s most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives.
Join us as we shape the future of AI and beyond.
Together, we advance your career.
MTS SOFTWARE SYSTEM DESIGN ENGINEER (AI/ML, GPU, Drivers, Firmware)
OVERVIEW
We are seeking an experienced and versatile professional with expertise in validation strategy, automation, and quality for AI/ML model serving, GPU software stacks, device drivers, firmware, and cross-platform systems (Linux/Windows).
You will build test frameworks, drive CI quality gates, perform performance and reliability testing, and lead cross-stack triage to ensure robust releases in a rapidly evolving environment.
KEY RESPONSIBILITIES:
Own end-to-end test strategy for AI/ML workflows (PyTorch, vLLM), GPU runtimes, drivers, and firmware across kernel and user space.
Develop scalable automation frameworks spanning unit, integration, HIL (hardware-in-the-loop), system, and end-to-end tests.
Implement and maintain CI quality gates (GitHub Actions/Workflows, Jenkins), including automated build, test execution, artifact management, reporting, and flake reduction.
Design and execute performance, stress, reliability, soak, and long-haul tests targeting GPU compute, memory, I/O, and serving throughput/latency.
Validate cross-platform compatibility (Linux/Windows), covering driver interfaces, kernel interactions, firmware behavior, and runtime stability.
Create reproducible environments with containers/orchestration; instrument telemetry and observability for data-driven QA.
Apply agentic AI techniques to accelerate test generation, triage, and root cause analysis; integrate intelligent diagnostics into pipelines.
Develop rigorous test cases for low-level features (PCIe, DMA, interrupts, memory management), error handling, recovery, and fault injection.
Define and track quality KPIs (coverage, defect escape rate, MTTR, performance regressions) and drive continuous improvement.
Lead defect triage across hardware, firmware, driver, runtime, and model layers; collaborate with engineering to resolve issues rapidly.
Produce comprehensive documentation: test plans, procedures, fixtures, coverage maps, readiness criteria, and retrospectives.
MINIMUM QUALIFICATIONS:
8–12 years in QA/Test for systems software or platform engineering, with at least 4 years focused on GPU software, device drivers, or firmware validation.
Demonstrable ownership of validation for AI/ML pipelines and serving stacks using PyTorch and at least one modern inference framework (e.g., vLLM), including accuracy baselining and performance regression detection.
Proven expertise testing drivers and firmware with hands-on work in: PCIe fundamentals (link training, BARs, MSI/MSI-X), DMA engines, interrupt handling, and memory models.
Failure modes: error injection, recovery paths, power/thermal events, and persistence across reboot/upgrade cycles.
Deep proficiency in Linux (kernel/user space) and practical experience with Windows driver ecosystems; ability to: Read kernel logs and symbols, trace with ftrace/perf/ETW, and perform cross-layer debugging.
Build custom kernels/modules and analyze crash dumps (kdump, WinDbg).
Strong programming for test automation: Python for framework and orchestration (pytest or equivalent), robust mocking/fixtures, and data-driven test generation.
C/C++ for low-level test harnesses, protocol exercisers, and performance micro-benchmarks.
Bash/PowerShell for environment setup, CI scripting, and reproducibility.
CI/CD mastery with GitHub Actions/Workflows and/or Jenkins: Design gated pipelines with parallelization, artifact management, flaky test quarantine, and automated rollback criteria.
Integrate metrics, alerts, and quality reports; enforce go/no-go release thresholds.
Performance testing rigor: Methodology for baselining, variance control, and noise isolation; application of statistical techniques (e.g., confidence intervals, A/B comparisons) to detect regressions. GPU-focused profiling and analysis (e.g., perf counters, memory bandwidth, kernel occupancy).
Tooling fluency: gdb, perf, ftrace, valgrind, WinDbg, ETW; log/trace correlation; containerized test environments (Docker) and familiarity with Kubernetes for distributed tests.
Exploratory testing mindset: Hypothesis-driven investigation, boundary and adversarial testing, fuzzing (protocol/API), chaos/fault injection, and reverse-engineering of interfaces when documentation is limited.
Communication and leadership: Clear, concise defect reporting; ability to drive triage across teams; establish and evangelize quality standards; maintain strong documentation discipline.
GOOD TO HAVE:
Lab ops for QA: rack mounting, server configuration, BMC/IPMI, BIOS/firmware updates, network/storage setup, power/thermal profiling.
Front-end/UI testing experience for internal tools: ReactJS, web UI automation, accessibility and usability checks.
Backend/DB validation: REST/gRPC testing, SQL/NoSQL, schema migrations, data integrity, performance tuning.
Observability: Prometheus/Grafana, OpenTelemetry; integrating quality signals and alerts into CI/CD and release gates.
EDUCATIONAL QUALIFICATIONS:
BS/MS in Computer Science, Computer Engineering, or a related discipline.
Benefits offered are described: AMD benefits at a glance.
AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services.
AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.
AMD may use Artificial Intelligence to help screen, assess, or select applicants for this position. AMD’s “Responsible AI Policy” is available here.
This posting is for an existing vacancy.
Similar Jobs

Senior Developer
HCL Technologies · Hyderabad, India

Software Engineer III - Java AWS
JPMorgan Chase · Hyderabad, Telangana, India, IN

Senior Software Engineer - Java, Officer
State Street · Hyderabad, India

Senior Software Engineer
Wells Fargo · Hyderabad, India

Sr ServiceNow Developer - Product Operations
ServiceNow · Hyderabad
About AMD

AMD
Public
A semiconductor company that designs and develops graphics units, processors, and media solutions
10,001+
Employees
Santa Clara
Headquarters
Reviews
3.5
25 reviews
Work Life Balance
3.2
Compensation
4.1
Culture
3.6
Career
3.4
Management
3.1
65%
Recommend to a Friend
Pros
Good compensation and benefits
Positive work environment
Great management and coworkers
Cons
Poor work life balance
Micromanagement and excessive tracking
Too much pressure and workload
Salary Ranges
6 data points
L2 · Data Analyst L2
0 reports
$76,430
total / year
Base
$30,572
Stock
$38,215
Bonus
$7,643
$53,501
$99,359
Interview Experience
5 interviews
Difficulty
3.6
/ 5
Duration
14-28 weeks
Offer Rate
60%
Experience
Positive 20%
Neutral 20%
Negative 60%
Interview Process
1
Application Review
2
Recruiter Screen
3
Technical Phone Screen
4
Technical Interview
5
Hiring Manager Interview
6
Offer
Common Questions
Coding/Algorithm
Technical Knowledge
Behavioral/STAR
Past Experience
System Design
News & Buzz
Nvidia vs. AMD vs. Broadcom: What's the Best AI Chip Stock to Own for 2026 - The Globe and Mail
Source: The Globe and Mail
News
·
5w ago
AMD stock rating reiterated at Overweight by Wells Fargo - Investing.com
Source: Investing.com
News
·
5w ago
AMD: Facing Its Moment Of Truth (NASDAQ:AMD) - Seeking Alpha
Source: Seeking Alpha
News
·
5w ago
The 2026 Gram with completely new materials, AMD vs. Intel Panther Lake: I used both and reached a verdict
News
·
5w ago