Red Hat
Provides open source software products to enterprises and is a subsidiary of IBM.

Senior Site Reliability Engineer

Function: DevOps
Level: Senior
Location: Pune, India
Work mode: On-site
Type: Full-time
Posted: 2 months ago

Benefits

Health insurance

401k

Equity

Remote Work

Flexible work

Parental leave

Learning Budget

Required skills

Python

Go

Ansible

Kubernetes

Linux

About the Job:

The Red Hat IT Automation & Intelligence Evolution (AIE) team is seeking a senior site reliability engineer to drive our strategic shift from traditional operations to intelligent automation and AIOps. In this pivotal role, you will serve as a technical lead for reliability and a strategic consultant to the wider organization.

You will design and implement self-service platforms, drive AI-driven operational workflows, and spearhead our alert-noise reduction campaigns. You will act as a technical leader, mentoring junior engineers and partnering with internal teams to identify high-ROI automation opportunities. Your goal is not just to resolve issues but to permanently remove the hidden tax of toil and interruptions through engineering and AI adoption.

What will you do?

Reliability Engineering & Standards:

  • Define and Enforce SLOs: Lead the definition of Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for critical services, managing Error Budgets to balance feature velocity with system stability.

  • AIOps Implementation: Drive the adoption of AIOps solutions (including Anomaly Detection and Predictive Alerting) to reduce incident volume and improve Mean Time to Resolution (MTTR).

  • Resilience Engineering: Design and lead Chaos Engineering experiments (e.g., fault injection) to validate system recovery and uncover weaknesses before they impact production.

Automation & Efficiency:

  • Eliminate Toil: Identify manual, repetitive work patterns and engineer complex automation solutions to eliminate them, aiming to boost overall team capacity.

  • Intelligent Workflows: Move beyond basic scripting to build intelligent agents and workflows using tools like the Model Context Protocol (MCP) and LLM integrations to automate decision-making processes.

  • Infrastructure as Code: Maintain and evolve the Infrastructure as Code ecosystem, ensuring robust configuration management and version control standards are applied across the environment.

Enablement & Leadership:

  • Internal Consulting: Act as a subject matter expert, engaging with other engineering teams to scope their automation needs and help them build/manage their own workflows.

  • Incident Command: Lead high-severity incident response efforts, serving as the Incident Commander when necessary.

  • Root Cause Analysis: Facilitate blameless post-mortems, focusing on systemic root causes (Graph Algorithm Design, Blast Radius Analysis) rather than human error to prevent recurrence.

  • Mentorship: Mentor junior SREs, conducting code reviews and guiding them through complex troubleshooting and systems engineering principles.

What will you bring?

Technical Competency:

  • Programming: Proficiency in Python or Go, with experience in building modular, scalable software.

  • Automation: Proficiency with Ansible for configuration management, orchestration, and automation workflows.

  • Observability Stack: Expert-level knowledge of monitoring ecosystems, specifically the TIGK Stack (Telegraf, InfluxDB, Grafana, Kapacitor) and Prometheus.

  • Cloud & Containerization: Deep understanding of Linux environments, Kubernetes/OpenShift, and public cloud infrastructure (AWS/Azure/GCP).

SRE Methodology:

  • Demonstrated experience designing and implementing SLIs, SLOs, and Error Budgets.

  • Proven track record of Toil Reduction strategies and implementation.

  • Experience with Incident Management lifecycles (escalation policies, paging, and post-mortems).

Soft Skills:

  • Growth Mindset: Open-minded approach to problem-solving and a demonstrated willingness to learn and adopt new technologies.

  • Strategic Thinking: Ability to translate business goals into technical roadmaps.

  • Communication: Strong ability to explain complex reliability concepts to non-SRE teams and leadership.

The following are considered a plus:

  • Automation Platforms: Experience with Ansible Automation Platform (AAP) or similar configuration management tools for enterprise-scale environments.

  • AI/LLM Integration: Experience with Model Context Protocol (MCP), Claude Plugin development, or integrating LLMs into operational workflows.

  • Data Science for Ops: Experience with regression data or algorithms for predictive alerting.

  • Security: Experience with hardening systems (Bastion Hosts) and managing security policies within automation workflows.

About Red Hat

Red Hat is the world’s leading provider of enterprise open source software solutions, using a community-powered approach to deliver high-performing Linux, cloud, container, and Kubernetes technologies. Spread across 40 countries, our associates work flexibly across work environments, from in-office, to office-flex, to fully remote, depending on the requirements of their role. Red Hatters are encouraged to bring their best ideas, no matter their title or tenure. We're a leader in open source because of our open and inclusive environment. We hire creative, passionate people ready to contribute their ideas, help solve complex problems, and make an impact.

Inclusion at Red Hat
Red Hat’s culture is built on the open source principles of transparency, collaboration, and inclusion, where the best ideas can come from anywhere and anyone. When this is realized, it empowers people from different backgrounds, perspectives, and experiences to come together to share ideas, challenge the status quo, and drive innovation. Our aspiration is that everyone experiences this culture with equal opportunity and access, and that all voices are not only heard but also celebrated. We hope you will join our celebration, and we welcome and encourage applicants from all the beautiful dimensions that compose our global village.

Equal Opportunity Policy (EEO)
Red Hat is proud to be an equal opportunity workplace and an affirmative action employer. We review applications for employment without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, citizenship, age, veteran status, genetic information, physical or mental disability, medical condition, marital status, or any other basis prohibited by law.

Red Hat does not seek or accept unsolicited resumes or CVs from recruitment agencies. We are not responsible for, and will not pay, any fees, commissions, or any other payment related to unsolicited resumes or CVs except as required in a written contract between Red Hat and the recruitment agency or party requesting payment of a fee.

Red Hat supports individuals with disabilities and provides reasonable accommodations to job applicants. If you need assistance completing our online job application, email application-assistance@redhat.com. General inquiries, such as those regarding the status of a job application, will not receive a reply.

About Red Hat

Red Hat

Acquired

Red Hat, Inc. is an American software company that provides open source software products to enterprises and is a subsidiary of IBM. Founded in 1993, Red Hat has its corporate headquarters in Raleigh, North Carolina, with other offices worldwide.

Employees: 10,001+
Headquarters: Raleigh
Valuation: $34B

Reviews

Overall rating: 3.9 (10 reviews)

Work-life balance: 3.2
Compensation: 4.0
Culture: 4.3
Career growth: 3.5
Management: 3.4

Recommend rate: 75%

Pros

Great team culture and collaboration

Flexible work arrangements

Excellent benefits and compensation

Cons

Heavy workload and stress during peak times

Work-life balance challenges

Fast-paced and overwhelming environment

Salary range

993 data points

Junior/L3 · Associate Consultant

70 reports

Total annual compensation: $103,140
Base salary: $95,867
Stock: -
Bonus: $7,273

$67,733

$158,007

Interview reviews

3 reviews

Difficulty: 3.3 / 5
Duration: 14-28 weeks
Experience: Positive 0%, Neutral 67%, Negative 33%

Interview process

1. Application Review
2. Recruiter Screen
3. Online Assessment
4. Phone Interview
5. Technical Interview
6. Offer

Common questions

Coding/Algorithm

Technical Knowledge

Behavioral/STAR

Java/Programming Fundamentals