
Software Engineer 2--M365 Foundation
China, Jiangsu, Suzhou; China, Shanghai, Shanghai · On-site · Full-time · 1w ago
Overview
The Substrate Fleet Health team is engineering the future of cloud reliability and efficiency by managing the health and capacity of the Substrate Fleet. We are a high-impact team driving innovation in hardware health, fleet lifecycle management, intelligent repair systems, and proactive capacity optimization to ensure Microsoft’s hyperscale infrastructure operates at peak performance.
Our mission is bold:
- Maximize fleet availability through proactive detection and mitigation of hardware issues.
- Accelerate repair intelligence with AI-driven insights and automation, reducing repair times from hours to seconds.
- Optimize spare machine utilization and capacity forecasting across global datacenters, unlocking millions in cost savings and enabling sustainable growth.
- Enhance fleet lifecycle management by predicting failures, improving component health, and reducing stranded capacity.
We are building next-generation solutions like Repair Box v Next, Fleet Health Copilot, Unified Spare Pool, and Smart Recovery Services: systems that integrate telemetry, predictive analytics, and automation to transform how cloud infrastructure is managed and scaled.
Our culture values:
- Innovation: We challenge the status quo and pioneer AI-driven solutions for hardware health and capacity optimization.
- Collaboration: We work across Substrate, Azure, and vendor ecosystems to solve complex global challenges.
- Ownership: We take pride in delivering resilient, scalable systems that power Microsoft’s cloud.
Joining this team means shaping the backbone of Microsoft’s cloud reliability and capacity strategy. You’ll be part of a group that doesn’t just respond to issues—we anticipate them, solve them, and set new standards for operational excellence.
As a Software Engineer 2, you will play a critical role in building Hardware Health & Repair Intelligence—a transformative initiative focused on predictive hardware health, intelligent repair workflows, and proactive fleet capacity management.
Why Join Us
- Lead high-impact work at the intersection of cloud reliability, AI, and operational excellence.
- Tackle some of the most critical challenges in fleet health and capacity optimization at hyperscale.
- Be part of a mission-driven team that values innovation, collaboration, and bold execution.
- Influence how Microsoft builds and operates its cloud—smarter, faster, and more sustainably.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Responsibilities
- Lead architecture and design for intelligent repair and fleet optimization systems, including Repairbox Vnext and Fleet Copilot.
- Drive development of AI-powered telemetry pipelines and automation frameworks for predictive diagnostics and lifecycle management.
- Establish capacity forecasting and spare pool optimization strategies across global datacenters.
- Ensure security, scalability, and operational excellence across all solutions, including live-site readiness and DRI pathways.
- Collaborate with Azure, vendor, and platform teams to align technical solutions with business goals and reliability standards.
Qualifications:
Required Qualifications:
- Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience.
Other Requirements:
The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Preferred Qualifications:
- Master's Degree in Computer Science or related technical field AND 3+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR Bachelor's Degree in Computer Science or related technical field AND 5+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience.
- Expertise in distributed systems, cloud infrastructure, and large-scale automation.
- Solid background in AI/ML-driven telemetry, anomaly detection, and predictive analytics.
- Experience with capacity planning, hardware lifecycle management, and hyperscale reliability preferred.
- Excellent communication and collaboration skills.
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.
About Microsoft
Reviews: 3.8 (5 reviews)
- Work Life Balance: 4.1
- Compensation: 4.3
- Culture: 3.4
- Career: 3.2
- Management: 3.0
- Recommend to a Friend: 65%
Pros
- Excellent compensation and benefits package
- Four-day workweek with improved work-life balance
- Supportive managers and teams
Cons
- High-pressure environment causing anxiety
- Unprofessional interview processes
- Limited creative work opportunities
Salary Ranges (5,571 data points)
- Junior/L3 · Advertising Client Success (2 reports): $163,358 total / year; Base $141,875; Stock -; Bonus -
Interview Experience (7 interviews)
- Difficulty: 3.7 / 5
- Duration: 14-28 weeks
- Offer Rate: 14%
- Experience: Positive 14%, Neutral 29%, Negative 57%
Interview Process
1. Application Review
2. Recruiter Screen
3. Technical Phone Screen
4. Technical Interview
5. Onsite/Virtual Interviews
6. Final Round
7. Offer
Common Questions
- Coding/Algorithm
- System Design
- Behavioral/STAR
- Technical Knowledge
- Past Experience