
Online travel company
Manager, Reliability Operations
Expedia Group brands power global travel for everyone, everywhere. We design cutting-edge tech to make travel smoother and more memorable, and we create groundbreaking solutions for our partners. Our diverse, vibrant, and welcoming community is essential in driving our success.
Why Join Us?
To shape the future of travel, people must come first. Guided by our Values and Leadership Agreements, we foster an open culture where everyone belongs, differences are celebrated, and everyone knows that when one of us wins, we all win.
We provide a full benefits package, including exciting travel perks, generous time-off, parental leave, a flexible work model (with some pretty cool offices), and career development resources, all to fuel our employees' passion for travel and ensure a rewarding career journey. We’re building a more open world. Join us.
Role Summary
A people leader who oversees a team of AI-focused automation analysts within Expedia Group’s AI Resiliency Centre (ARC), the central hub for global IT operations, providing always-on monitoring, triage, and remediation across eCommerce and corporate services. This manager builds a piece of the follow-the-sun capability across hubs, orchestrating human and agentic AI responders to reduce noise, cut Mean Time to Detect/Restore (MTTD/MTTR), and prevent customer-impacting incidents before they occur.
In this role, you will:
- Lead a 24/7 global reliability operations function that monitors, supports, and improves production systems, ensuring high availability, resilience, and rapid incident response across multiple services and domains.
- Own and mature incident management practices, including detection, triage, escalation, communication, and post‑incident review processes, driving reduction in mean time to detect (MTTD) and mean time to resolve (MTTR).
- Partner closely with engineering, SRE, and product teams to define and evolve operational standards, runbooks, and readiness criteria, including system design (LLD), API integration considerations, and data modeling that support reliable operations.
- Develop and manage observability strategies (monitoring, alerting, logging, and dashboards) to proactively identify reliability risks and drive data‑driven improvements to system stability and performance.
- Build, coach, and mentor a high‑performing reliability operations team, fostering a culture of continuous improvement, operational excellence, and accountability across multiple technical domains and platforms.
- Safely integrate and operate AI/ML‑enabled solutions that improve incident detection, noise reduction, capacity forecasting, and operational workflows, drawing on familiarity with AI‑driven systems, tools, and workflows to apply AI/ML concepts to real-world products.
Minimum Qualifications:
- Bachelor’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience in operating large‑scale, customer‑facing systems.
- Substantial experience in reliability operations, SRE, production support, or related fields, including leading 24/7 operational teams and owning reliability for multiple services or a broad technical domain.
- Proven track record implementing and operating incident management, on‑call, and observability practices (monitoring, alerting, logging, dashboards) for distributed systems, including collaboration with engineering on system design (LLD), API integration, and data modeling.
- Demonstrated ability to use operational and performance data to drive decisions, prioritize reliability improvements, and manage trade‑offs between stability, velocity, and cost at scale.
- Hands‑on familiarity with AI‑driven or automation‑focused operational tools (for example, intelligent alerting, anomaly detection, or automated remediation) and ability to ensure they are integrated and operated safely in production.
- Experience with automation tools and at least one programming or scripting language (Python preferred).
- Experience with monitoring and observability tools such as Datadog, Splunk, Catchpoint, PagerDuty, or similar platforms.
- Strong incident response mindset, including the ability to analyze outages, identify root causes, and proactively recommend and implement automation-driven solutions to prevent recurrence.
Preferred Qualifications:
- Experience leading reliability operations for complex, high‑traffic, globally distributed systems, including coordination across multiple engineering and product teams and ownership of multi‑service or multi‑domain reliability outcomes.
- Demonstrated success defining and evolving operational architectures and runbooks in partnership with engineering, including low‑level system design, API design for operability, and data models that support effective monitoring, alerting, and incident analysis.
- Strong track record driving operational excellence: improving incident response processes, leading blameless post‑incident reviews, reducing recurring incidents, and implementing long‑term reliability improvements grounded in data.
- Experience scaling AI‑ or ML‑enabled capabilities within reliability operations, such as intelligent incident triage, predictive capacity and reliability modeling, or AI‑assisted runbooks, with clear governance and safety controls.
- Depth in using AI‑driven observability or AIOps platforms to correlate signals across logs, metrics, and traces, and to continuously refine alerting and automation strategies that improve reliability outcomes.
Accommodation requests
If you need assistance with any part of the application or recruiting process due to a disability, or other physical or mental health conditions, please reach out to our Recruiting Accommodations Team through the Accommodation Request.
We are proud to be named as a Best Place to Work on Glassdoor in 2024 and be recognized for award-winning culture by organizations like Forbes, TIME, Disability:IN, and others.
Expedia Group's family of brands includes: Brand Expedia®, Hotels.com®, Expedia® Partner Solutions, Vrbo®, trivago®, Orbitz®, Travelocity®, Hotwire®, Wotif®, ebookers®, CheapTickets®, Expedia Group™ Media Solutions, Expedia Local Expert®, CarRentals.com™, and Expedia Cruises™. © 2024 Expedia, Inc. All rights reserved. Trademarks and logos are the property of their respective owners. CST: 2029030-50
Employment opportunities and job offers at Expedia Group will always come from Expedia Group’s Talent Acquisition and hiring teams. Never provide sensitive, personal information to someone unless you’re confident who the recipient is. Expedia Group does not extend job offers via email or any other messaging tools to individuals with whom we have not made prior contact. Our email domain is @expediagroup.com. The official website to find and apply for job openings at Expedia Group is careers.expediagroup.com/jobs.
Expedia is committed to creating an inclusive work environment with a diverse workforce. All qualified applicants will receive consideration for employment without regard to race, religion, gender, sexual orientation, national origin, disability or age.
About Expedia Group

Expedia Group
Public
Expedia Group, Inc. is an American travel technology company that owns and operates travel fare aggregators and travel metasearch engines, including Expedia, Hotels.com, Vrbo, Travelocity, Hotwire.com, Orbitz, Ebookers, CheapTickets, CarRentals.com, Expedia Cruises, Wotif, and Trivago.
Employees: 10,001+
Headquarters: Seattle
Company valuation: $6.8B
Reviews
10 reviews
Overall rating: 3.8
Work-life balance: 2.8
Compensation: 3.7
Company culture: 4.2
Career: 3.3
Management: 2.5
Would recommend to a friend: 68%
Pros
- Supportive team and colleagues
- Flexible work arrangements and remote options
- Interesting and creative projects
Areas for improvement
- Work-life balance challenges and long hours
- High stress and burnout during peak seasons
- Fast-paced and overwhelming environment
Salary Range
1 data point
Intern · Machine Learning Scientist Intern (1 report)
Total annual compensation: -
Base salary: -
Stock: -
Bonus: -
Interview Reviews
6 reviews
Difficulty: 2.8 / 5
Duration: 14-28 weeks
Interview process
1. Application Review
2. Recruiter Screen
3. Technical Assessment/Coding Challenge
4. Final Interview
5. Team Matching
6. Offer
Common questions
- Coding/Algorithm
- Technical Knowledge
- Behavioral/STAR
- Past Experience
- Culture Fit
Latest News
- Travellers are prime targets for advertisers, says Expedia - Travolution (News · 1w ago)
- Expedia becomes IShowSpeed’s first travel partner as it enters livestreaming - Ad Age (News · 1w ago)
- Chapin Davis Inc. Makes New Investment in Expedia Group, Inc. $EXPE - MarketBeat (News · 1w ago)
- Travelers Spend Over $500 on Non-Travel Purchases Per Trip, Expedia Group Study Finds - Hotel News Resource (News · 1w ago)




