
Senior Software Engineer, CoreAI Workload Engines
United States, Washington, Redmond; United States, California, Mountain View
On-site · Full-time · 1w ago
Overview
The CoreAI Workloads team builds the foundational inference engines and APIs that power large-scale AI inference across Azure, from cutting-edge startups to Fortune 500 enterprises to Microsoft Copilots and agents. Our mission is to deliver secure, reliable, and highly efficient GPU inference that enables multitenant AI systems at global scale while maximizing utilization, performance, and developer productivity. We own inference serving and performance for OpenAI and other state-of-the-art large language models (LLMs), and we work directly with OpenAI to serve some of the largest workloads on the planet, with trillions of inferences per day. Our converged AI fabric and engines deliver inference capabilities for all LLMs in the Microsoft catalog, including OpenAI, Anthropic, Mistral, Cohere, Llama, and more.
This role sits at the intersection of LLM inference fleets, serving efficiency, rapid experimentation, cloud infrastructure, and systems software, working closely with CoreAI data plane, compute, and partner teams to deliver end-to-end efficiencies and platform capabilities.
In this role, you will work at multiple levels of the AI software stack, including the fundamental abstractions, programming models, OpenAI and OSS engine runtimes, libraries, and application programming interfaces (APIs) that enable large-scale model inference.
You will drive production-grade inference serving improvements for OpenAI and open-source models across Azure, including benchmarking, performance measurement, and disciplined experimentation to improve latency, throughput, availability, and cost at scale. You will both (1) make hands-on engine changes and (2) contribute to the experimentation capabilities that make those changes measurable, safe to ship, and repeatable across teams.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.
Responsibilities
As a Senior Engineer on the team, your responsibilities include:
- Optimize inference engines for OpenAI and open-source models by implementing and shipping performance/efficiency improvements across runtime, scheduling, and serving paths (latency, throughput, utilization, availability, and cost).
- Run experiments end-to-end: formulate hypotheses, implement engine changes (including Python/PyTorch integration points where relevant), analyze results, and ship improvements behind guardrails.
- Build and use experimentation capabilities for large-scale AI inference (experiment lifecycle, tracking, metric modeling, comparability standards, automated analysis) so the team can iterate quickly and safely.
- Own serving availability and efficiency for Azure OpenAI Service workloads through tiered experimentation, lean segmentation, and multi-modal utilization across heterogeneous fleets, turning findings into shipped engine improvements.
- Design and evolve inference serving architectures to improve utilization and latency using techniques such as disaggregated serving, multi-token prediction, KV offload/retrieval, and quantization, validated via staged rollouts and production guardrails.
- Extend AI infrastructure abstractions to support elastic, heterogeneous inference engines reliably at scale (e.g., dynamic scaling across model families, modalities, and workload classes while maintaining isolation and SLOs).
- Tune and scale inference engines across NVIDIA GPU generations (A100, H100, H200) for state-of-the-art OpenAI models, focusing on serving efficiency, utilization, and reliability (not hardware bring-up).
- Partner with networking and storage teams to leverage high-performance interconnects (e.g., RDMA fabrics such as InfiniBand or RoCE) for distributed inference, without owning low-level kernel/driver enablement.
- Drive end-to-end features from design through production: observability, diagnostics, performance regression detection, and operational excellence for inference serving.
- Influence platform architecture and technical direction across teams through design reviews, clear metrics, and technical leadership focused on experimentation velocity and production reliability.
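The "ship improvements behind guardrails" workflow above can be sketched minimally: compare a latency metric between a control and a treatment engine configuration, and gate the rollout on a regression budget. The helper names (`p95`, `guardrail_pass`), the nearest-rank percentile choice, and the 5% budget are illustrative assumptions, not part of this posting:

```python
def p95(samples):
    """95th-percentile latency (ms) via nearest-rank on sorted samples."""
    s = sorted(samples)
    idx = min(len(s) - 1, int(0.95 * len(s)))
    return s[idx]

def guardrail_pass(control_ms, treatment_ms, max_regression=0.05):
    """Ship only if treatment p95 latency is within 5% of control p95."""
    return p95(treatment_ms) <= p95(control_ms) * (1 + max_regression)

control = [42, 45, 44, 41, 90, 43, 44, 46, 42, 45]    # baseline latencies (ms)
treatment = [40, 41, 42, 39, 88, 41, 40, 42, 41, 40]  # candidate engine change

print(guardrail_pass(control, treatment))  # True: treatment p95 within budget
```

In practice a production system would use many more samples and a proper statistical comparison; the point of the sketch is the shape of the gate, not the statistics.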
Additional Responsibilities
- Work across multiple layers of the AI software stack (abstractions, programming models, engine runtimes, libraries, and APIs) to enable large-scale model inference.
- Benchmark OpenAI and other LLMs for performance across Azure OpenAI Service workload tiers and segments, and translate results into production improvements.
- Debug, profile, and optimize production inference performance across the stack (abstractions, runtime, scheduling, and serving pipelines) to improve latency, throughput, and utilization.
- Monitor performance regressions and drive continuous improvements to reduce time-to-deploy and hardware footprint.
- Collaborate across engineering teams to deliver scalable, production-ready serving efficiency and availability improvements, using experimentation results to guide prioritization and rollout.
- Build durable engine interfaces that enable fast experimentation and safe shipping of new strategies for quality of service (QoS), replica load balancing, KV management (including offload/retrieval), quantization, and sampling (e.g., multi-token prediction and constrained sampling).
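As a rough illustration of the benchmarking and performance-measurement work listed above, a run summary typically reduces per-request latencies to percentiles plus an aggregate throughput figure. The function names and sample numbers here are hypothetical, and the nearest-rank percentile is one of several common conventions:

```python
def latency_percentiles(latencies_ms, pcts=(50, 95, 99)):
    """Nearest-rank percentiles over a list of per-request latencies (ms)."""
    s = sorted(latencies_ms)
    n = len(s)
    return {p: s[min(n - 1, int(p / 100 * n))] for p in pcts}

def throughput_tokens_per_s(total_tokens, wall_seconds):
    """Aggregate decode throughput over a benchmark run."""
    return total_tokens / wall_seconds

# Hypothetical run: 10 requests, 50k tokens generated in 10 s of wall time.
lat = [100, 120, 110, 105, 300, 115, 108, 112, 104, 118]
print(latency_percentiles(lat))               # {50: 112, 95: 300, 99: 300}
print(throughput_tokens_per_s(50_000, 10.0))  # 5000.0
```

Tail percentiles (p95/p99) matter more than the mean for serving SLOs, which is why the single 300 ms outlier dominates both tail metrics in this toy run.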
Out of Scope (This role does not focus on)
- Novel hardware bring-up or first-party silicon enablement (e.g., Microsoft chips), or expanded support for non-NVIDIA platforms (e.g., AMD).
- Low-level kernel, driver, or CUDA optimization as a primary responsibility.
- Model pre-training, fine-tuning, or model architecture customization.
Qualifications
- Bachelor's Degree in Computer Science or related technical field and 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python, or equivalent experience.
Other Requirements:
- Proven ability to design and operate large-scale, production inference services with high reliability and performance requirements, and to ship performance improvements safely via disciplined experimentation.
- Strong skills in performance analysis: benchmarking, profiling, diagnosing regressions, and turning results into concrete engine/runtime changes.
- Strong problem-solving skills and the ability to debug complex, cross-layer systems issues.
- Demonstrated technical leadership, including mentoring engineers, driving cross-team architectural alignment, and leveraging AI tools and AI-assisted workflows to accelerate engineering velocity and quality.
- Hands-on experience with Kubernetes (building and operating services on k8s), including debugging production issues and designing platform abstractions (e.g., custom resources/controllers) and scheduling-aware deployments (e.g., node affinity, taints/tolerations, resource requests/limits).
- Strong collaboration and communication skills, with the ability to work across organizational boundaries.
Preferred Qualifications:
- Experience optimizing LLM inference in practice (e.g., PyTorch inference, serving runtimes, model execution, or inference orchestration) in production environments.
- Familiarity with high-performance networking and low-latency communication stacks.
- Familiarity with GPU-accelerated inference stacks (e.g., CUDA at the application/runtime level, device plugins, or runtime integration).
- Experience building or using experimentation systems (A/B, canarying, tiered rollout), including metric definition and comparability for performance and reliability.
- Familiarity with distributed inference stacks (e.g., NCCL-style collectives, model/tensor parallelism) and performance tradeoffs in large-scale serving.
Impact & Growth:
- Work on mission-critical infrastructure that directly powers large-scale AI systems.
- Influence the future of cloud GPU platforms used by internal and external customers.
- Collaborate with experts across OS, hardware, networking, and AI platform teams.
- Opportunity to grow as a technical leader, shaping long-term platform strategy.
Software Engineering IC4 - The typical base pay range for this role across the U.S. is USD $119,800 - $234,700 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $158,400 - $258,000 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay
Software Engineering IC5 - The typical base pay range for this role across the U.S. is USD $139,900 - $274,800 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $188,000 - $304,200 per year.
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.