
Red Hat provides open source software products to enterprises and is a subsidiary of IBM.
Senior Site Reliability Engineer
Benefits and Perks
•Health insurance
•401(k)
•Stock options
•Remote work
•Flexible working hours
•Parental leave
•Education expense support
Required Skills
Python
Go
Ansible
Kubernetes
Linux
About the Job:
The Red Hat IT Automation & Intelligence Evolution (AIE) team is seeking a senior site reliability engineer to drive our strategic shift from traditional operations to intelligent automation and AIOps. In this pivotal role, you will serve as a technical lead for reliability and a strategic consultant to the wider organization.
You will design and implement self-service platforms, build AI-driven operational workflows, and spearhead our alert-noise reduction campaigns. You will mentor junior engineers and partner with internal teams to identify high-ROI automation opportunities. Your goal is not just to resolve issues but to permanently remove the hidden tax of toil and interruptions through engineering and AI adoption.
What will you do?
Reliability Engineering & Standards:
- Define and Enforce SLOs: Lead the definition of Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for critical services, managing Error Budgets to balance feature velocity with system stability.
- AIOps Implementation: Drive the adoption of AIOps solutions (including Anomaly Detection and Predictive Alerting) to reduce incident volume and improve Mean Time to Resolution (MTTR).
- Resilience Engineering: Design and lead Chaos Engineering experiments (e.g., fault injection) to validate system recovery and uncover weaknesses before they impact production.
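For candidates less familiar with the Error Budget idea mentioned above, the arithmetic can be sketched as follows. This is only an illustrative sketch, not a description of Red Hat's tooling: the function name `error_budget_remaining`, its parameters, and the request-based SLI are all assumptions made for the example.

```python
def error_budget_remaining(slo_target: float,
                           total_requests: int,
                           failed_requests: int) -> float:
    """Return the unspent fraction of the error budget for one SLO window.

    Hypothetical helper: assumes a request-based availability SLI where
    slo_target is a fraction such as 0.999. The result is 1.0 when nothing
    has been spent, 0.0 when the budget is exactly used up, and negative
    when the SLO has been violated.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        # No traffic in the window, so no budget has been consumed.
        return 1.0
    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    # A 99.9% SLO over one million requests tolerates about 1,000 failures.
    remaining = error_budget_remaining(0.999, 1_000_000, 250)
    print(f"Error budget remaining: {remaining:.1%}")
```

A team might slow feature releases as this value approaches zero and prioritize stability work once it goes negative; the exact policy is a team decision and is not specified in this posting.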
Automation & Efficiency:
- Eliminate Toil: Identify manual, repetitive work patterns and engineer complex automation solutions to eliminate them, aiming to boost overall team capacity.
- Intelligent Workflows: Move beyond basic scripting to build intelligent agents and workflows using tools like the Model Context Protocol (MCP) and LLM integrations to automate decision-making processes.
- Infrastructure as Code: Maintain and evolve the Infrastructure as Code ecosystem, ensuring robust configuration management and version control standards are applied across the environment.
Enablement & Leadership:
- Internal Consulting: Act as a subject matter expert, engaging with other engineering teams to scope their automation needs and help them build/manage their own workflows.
- Incident Command: Lead high-severity incident response efforts, serving as the Incident Commander when necessary.
- Root Cause Analysis: Facilitate blameless post-mortems, focusing on systemic root causes (Graph Algorithm Design, Blast Radius Analysis) rather than human error to prevent recurrence.
- Mentorship: Mentor junior SREs, conducting code reviews and guiding them through complex troubleshooting and systems engineering principles.
What will you bring?
Technical Competency:
- Programming: Proficiency in Python or Go, with experience in building modular, scalable software.
- Automation: Proficiency with Ansible for configuration management, orchestration, and automation workflows.
- Observability Stack: Expert-level knowledge of monitoring ecosystems, specifically the TIGK Stack (Telegraf, InfluxDB, Grafana, Kapacitor) and Prometheus.
- Cloud & Containerization: Deep understanding of Linux environments, Kubernetes/OpenShift, and public cloud infrastructure (AWS/Azure/GCP).
SRE Methodology:
- Demonstrated experience designing and implementing SLIs, SLOs, and Error Budgets.
- Proven track record of Toil Reduction strategies and implementation.
- Experience with Incident Management lifecycles (escalation policies, paging, and post-mortems).
Soft Skills:
- Growth Mindset: Open-minded approach to problem-solving and a demonstrated willingness to learn and adopt new technologies.
- Strategic Thinking: Ability to translate business goals into technical roadmaps.
- Communication: Strong ability to explain complex reliability concepts to non-SRE teams and leadership.
The following are considered a plus:
- Automation Platforms: Experience with Ansible Automation Platform (AAP) or similar configuration management tools for enterprise-scale environments.
- AI/LLM Integration: Experience with Model Context Protocol (MCP), Claude Plugin development, or integrating LLMs into operational workflows.
- Data Science for Ops: Experience with regression data or algorithms for predictive alerting.
- Security: Experience with hardening systems (Bastion Hosts) and managing security policies within automation workflows.
About Red Hat
Red Hat is the world’s leading provider of enterprise open source software solutions, using a community-powered approach to deliver high-performing Linux, cloud, container, and Kubernetes technologies. Spread across 40 countries, our associates work flexibly across work environments, from in-office, to office-flex, to fully remote, depending on the requirements of their role. Red Hatters are encouraged to bring their best ideas, no matter their title or tenure. We're a leader in open source because of our open and inclusive environment. We hire creative, passionate people ready to contribute their ideas, help solve complex problems, and make an impact.
Inclusion at Red Hat
Red Hat’s culture is built on the open source principles of transparency, collaboration, and inclusion, where the best ideas can come from anywhere and anyone. When this is realized, it empowers people from different backgrounds, perspectives, and experiences to come together to share ideas, challenge the status quo, and drive innovation. Our aspiration is that everyone experiences this culture with equal opportunity and access, and that all voices are not only heard but also celebrated. We hope you will join our celebration, and we welcome and encourage applicants from all the beautiful dimensions that compose our global village.
Equal Opportunity Policy (EEO)
Red Hat is proud to be an equal opportunity workplace and an affirmative action employer. We review applications for employment without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, citizenship, age, veteran status, genetic information, physical or mental disability, medical condition, marital status, or any other basis prohibited by law.
Red Hat does not seek or accept unsolicited resumes or CVs from recruitment agencies. We are not responsible for, and will not pay, any fees, commissions, or any other payment related to unsolicited resumes or CVs except as required in a written contract between Red Hat and the recruitment agency or party requesting payment of a fee.
Red Hat supports individuals with disabilities and provides reasonable accommodations to job applicants. If you need assistance completing our online job application, email application-assistance@redhat.com. General inquiries, such as those regarding the status of a job application, will not receive a reply.
Red Hat Overview

Red Hat
Acquired
Red Hat, Inc. is an American software company that provides open source software products to enterprises and is a subsidiary of IBM. Founded in 1993, Red Hat has its corporate headquarters in Raleigh, North Carolina, with other offices worldwide.
Employees: 10,001+
Headquarters: Raleigh
Valuation: $34B
Reviews
Rating: 3.9 (10 reviews)
Work-life balance: 3.2
Compensation: 4.0
Culture: 4.3
Career: 3.5
Management: 3.4
Would recommend to a friend: 75%
Pros
Great team culture and collaboration
Flexible work arrangements
Excellent benefits and compensation
Cons
Heavy workload and stress during peak times
Work-life balance challenges
Fast-paced and overwhelming environment
Salary Information
993 data points
Levels: Junior/L3, Mid/L4, Senior/L5
Junior/L3 · Associate Consultant (70 reports)
Total compensation: $103,140
Base salary: $95,867
Stock: -
Bonus: $7,273
$67,733
$158,007
Interview Reviews
3 reviews
Difficulty: 3.3 / 5
Duration: 14-28 weeks
Experience: Positive 0%, Neutral 67%, Negative 33%
Interview Process
1. Application Review
2. Recruiter Screen
3. Online Assessment
4. Phone Interview
5. Technical Interview
6. Offer
Frequently Asked Questions
Coding/Algorithm
Technical Knowledge
Behavioral/STAR
Java/Programming Fundamentals