Principal Software Engineer, CoreAI

Microsoft

United States, Washington, Redmond; United States, California, Mountain View

On-site

Full-time

1mo ago

Required Skills

Python

Java

JavaScript

Kubernetes

Azure

Overview:

The CoreAI GPU Infrastructure team builds the foundational accelerated compute platforms that power large-scale AI training and inference across Azure. Our mission is to deliver secure, reliable, and highly efficient GPU infrastructure that enables multi-tenant AI systems at global scale while maximizing utilization, performance, and developer productivity.

This role sits at the intersection of cloud infrastructure, systems software, virtualization, and container platforms, working closely with CoreAI, Azure Infrastructure, OS, Networking, and Hardware teams to deliver end-to-end platform capabilities.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.

Responsibilities:

As the Principal Engineer on the team, you will:

Design and build GPU-accelerated infrastructure for training and inference workloads, spanning bare-metal, virtual machine, and containerized environments.
Develop systems for GPU device management, scheduling, isolation, and sharing (e.g., partial GPU allocation, multi-tenant usage).
Build and operate advanced orchestration and resource-governance solutions using platforms such as Azure Kubernetes Service (AKS), Dynamic Resource Allocation (DRA), and related Kubernetes ecosystem capabilities to enable fair sharing, isolation, and efficient utilization of accelerated resources.
Build and evolve virtualization and container stacks to support modern AI workloads, including secure and confidential compute scenarios.
Optimize performance, reliability, and utilization across large GPU fleets, including scale-up and scale-out configurations.
Partner with networking and storage teams to enable high-performance interconnects (e.g., RDMA/InfiniBand-class networking) for distributed workloads.
Drive end-to-end platform features from design through production, including observability, diagnostics, and operational excellence.
Influence platform architecture and technical direction across teams through design reviews and technical leadership.

Qualifications:

Required Qualifications:

Bachelor's Degree in Computer Science or related technical field and 6+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python or equivalent experience.

Other Requirements:

Proven ability to design and operate large-scale production infrastructure with high reliability and performance requirements.
Strong problem-solving skills and the ability to debug complex, cross-layer systems issues.
Demonstrated technical leadership, including mentoring engineers and driving cross-team architectural alignment.
Hands-on experience with virtualization and/or container platforms (e.g., VMs, Kubernetes, container runtimes).
Strong collaboration and communication skills, with the ability to work across organizational boundaries.

Preferred Qualifications:

Familiarity with distributed training and inference stacks (e.g., NCCL-style collectives, model/data parallelism).
Experience building or operating multi-tenant AI platforms in cloud environments.
Familiarity with high-performance networking and low-latency communication stacks.
Familiarity with GPU-accelerated computing (e.g., CUDA, GPU drivers, device plugins, or runtime integration).
Familiarity with GPU virtualization, passthrough, or partitioning technologies.
Knowledge of confidential computing, trusted execution environments, or hardware-backed isolation.

Impact & Growth:

Work on mission-critical infrastructure that directly powers large-scale AI systems.
Influence the future of cloud GPU platforms used by internal and external customers.
Collaborate with experts across OS, hardware, networking, and AI platform teams.
Grow as a technical leader and help shape long-term platform strategy.

Software Engineering IC5 - The typical base pay range for this role across the U.S. is USD $139,900 - $274,800 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $188,000 - $304,200 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay

Software Engineering IC6 - The typical base pay range for this role across the U.S. is USD $163,000 - $296,400 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $220,800 - $331,200 per year.

This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.

About Microsoft

Microsoft (Public)

Microsoft Corporation is an American multinational technology conglomerate headquartered in Redmond, Washington.

Employees: 10,001+
Headquarters: Redmond
Valuation: $3000B

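Reviews

Overall rating: 3.8 (5 reviews)

Work-life balance: 4.1
Compensation: 4.3
Company culture: 3.4
Career development: 3.2
Management: 3.0

65% would recommend to a friend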

Pros

Excellent compensation and benefits package
Four-day workweek with improved work-life balance
Supportive managers and teams

Cons

High-pressure environment causing anxiety
Unprofessional interview processes
Limited creative work opportunities

Salary Range (5,620 data points)

Senior/L5 · Account Management (5 reports)

Total annual compensation: $209,483
Base salary: $181,941

Interview Experience (1 interview)

Difficulty: 4.0 / 5
Duration: 14-28 weeks
Experience: Positive 0%, Neutral 0%, Negative 100%

Interview process

Application Review
Recruiter Screen
Technical Phone Screen
Onsite/Virtual Interviews
Team Matching
Offer

Common question types

Coding/Algorithm
System Design
Behavioral/STAR
Technical Knowledge
Culture Fit
