Infrastructure Engineering Associate Advisor
Position Overview
The Pharmacy Benefit Services+ Technology organization is seeking a Site Reliability Engineer (SRE) – Automation, Self‑Healing & AI/AIOps to join our team. This Band 4 Contributor role is a senior, hands‑on position responsible for driving enterprise reliability outcomes, reducing operational toil, and enabling scalable SRE adoption across both legacy platforms and modern cloud‑native systems.
In this role, you will lead the design and implementation of intelligent, automated, and AI‑assisted reliability solutions that ensure systems are resilient, observable, self‑healing, and continuously improving. You will operate at the intersection of software engineering, operations, automation, and AI, influencing how teams design, deploy, and operate production systems.
A core focus of this role is building automation‑first and agentic SRE capabilities, including:
- Self‑healing workflows that automatically detect, diagnose, and remediate failures
- AI‑driven operational intelligence (AIOps) for anomaly detection, alert correlation, incident triage, and guided remediation
- Standardized SRE enablement platforms (SLO automation, reliability scorecards, FMEA workflows) that can be adopted at scale with minimal friction
You will collaborate closely with application teams, platform engineering, DevOps, infrastructure, QE, and IT leadership to embed reliability into the SDLC and runtime operations.
Your contributions will directly support:
- Improved system availability and resilience through proactive reliability engineering and automation
- Reduced incidents and faster MTTR via self‑healing and AI‑assisted operations
- Higher developer productivity by eliminating manual operational toil
- Faster, safer releases by integrating SRE controls into CI/CD pipelines
- Measurable reliability improvements, including reductions in MTTD/MTTR, decreased incident frequency, improved SLO compliance, and healthier error‑budget consumption through automation and AI‑enabled operations
Responsibilities
Core SRE & Reliability Engineering
- Define, implement, and operationalize SRE best practices including SLIs, SLOs, error budgets, reliability reviews, and operational readiness standards across multiple teams and platforms.
- Act as a senior reliability engineer for mission‑critical systems, influencing architectural decisions to improve availability, scalability, and fault tolerance.
- Lead blameless incident response, root‑cause analysis, and post‑incident reviews, ensuring systemic fixes and automation are prioritized.
Self‑Healing Automation
- Design and implement self‑healing systems that:
  - Automatically detect failures using telemetry and signals
  - Diagnose probable root causes using rules and AI/ML models
  - Execute automated remediation actions (restart, scale, reroute, rollback, configuration correction)
- Build event‑driven automation workflows integrated with monitoring, CI/CD, and infrastructure platforms to reduce human intervention.
- Develop and maintain automated runbooks and remediation pipelines that evolve based on historical incidents and outcomes.
- Extend self‑healing automation to change and release workflows, including automated rollback, safeguarded change execution, and AI‑assisted change‑risk evaluation to reduce change‑related incidents.
- Continuously identify and eliminate operational toil by replacing repetitive manual work with automation, self‑service tooling, and intelligent remediation.
AI / Agentic SRE (AIOps)
- Apply AI and ML techniques to improve operational intelligence, including:
  - Anomaly detection across metrics, logs, and traces
  - Alert deduplication, correlation, and noise reduction
  - Intelligent incident summarization and impact analysis
  - Predictive failure detection and capacity‑risk forecasting
- Implement agentic AI patterns where autonomous or semi‑autonomous agents:
  - Continuously monitor system health
  - Propose or execute remediation actions
  - Learn from past incidents and operator feedback
- Establish continuous learning feedback loops to tune AI models and agent behavior based on false positives, incident outcomes, and operator review.
- Partner with platform and security teams to ensure responsible, secure, and compliant use of AI in production operations.
Observability & Telemetry
- Establish observability‑by‑design standards for applications and platforms (metrics, logs, traces, events).
- Improve signal quality and alerting strategies to focus on user and business impact, not infrastructure noise.
- Build and maintain reliability dashboards and scorecards that provide real‑time and historical insights into service health.
Resilience, Performance & Validation
- Drive resilience and fault‑tolerance validation using chaos engineering and controlled failure injection.
- Partner with performance and platform teams to ensure systems meet performance, scalability, and recovery objectives.
- Promote safe testing practices for legacy‑integrated systems (e.g., service virtualization where direct backend calls pose risk).
CI/CD & Platform Enablement
- Embed SRE controls into CI/CD pipelines, including:
  - SLO validation gates
  - Automated canary analysis
  - Release health checks and rollback triggers
- Build reusable SRE platforms, templates, and onboarding kits that enable teams to adopt reliability practices with minimal manual effort.
- Mentor engineers and act as a technical leader for SRE adoption across the organization.
Qualifications
Required Skills:
- Site Reliability Engineering: Deep hands‑on experience with SLOs, error budgets, incident management, and production operations.
- Automation & Software Engineering: Strong development skills in Python, Go, Java, or similar, with the ability to build production‑grade automation and services.
- Self‑Healing Systems: Proven experience designing and implementing automated remediation and closed‑loop recovery workflows.
- AI / AIOps: Experience applying AI/ML to operations, such as anomaly detection, alert correlation, predictive analysis, or intelligent remediation.
- Observability: Expertise with platforms such as Dynatrace, Prometheus, Grafana, Splunk, AppDynamics, or equivalent.
- Cloud & Distributed Systems: Strong understanding of AWS / Azure / GCP, microservices, and Kubernetes / OpenShift.
- CI/CD & DevOps: Experience integrating reliability checks and automation into delivery pipelines.
- Infrastructure as Code: Terraform, CloudFormation, or similar.
- Legacy + Modern Engineering: Ability to support and modernize reliability practices across monoliths, batch jobs, messaging, and mainframe‑integrated systems.
- Leadership & Influence: Ability to lead through influence, mentor others, and drive adoption across multiple teams.
Required Experience & Education:
- Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent experience).
- 7+ years of experience in SRE, DevOps, platform engineering, or production software engineering roles.
- Demonstrated success delivering enterprise‑scale automation, self‑healing, and reliability improvements.
Desired Experience:
- Experience building or contributing to enterprise SRE enablement platforms (SLO automation, reliability scorecards, FMEA workflows).
- Hands‑on experience with chaos engineering and resilience testing in production‑like environments.
- Familiarity with ServiceNow / CMDB / service modeling to support operational readiness and dependency visibility.
- Experience applying Generative AI for operational use cases such as runbook generation, incident summarization, and knowledge retrieval.
- Demonstrated delivery of quantifiable reliability improvements (e.g., MTTR reduction, incident volume reduction, improved SLO adherence).
- Experience mentoring engineers and shaping an automation‑first, reliability‑driven culture.
Location & Hours of Work
- Full-time position, working 40 hours per week, with overlap with US hours expected as appropriate.
- Primarily based in the Innovation Hub in Hyderabad, India, in a hybrid working model (3 days working from the office and 2 days working from home).
Equal Opportunity Statement
Evernorth is an Equal Opportunity Employer actively encouraging and supporting organization-wide involvement of staff in diversity, equity, and inclusion efforts to educate, inform and advance both internal practices and external work with diverse client populations.
About Evernorth Health Services
Evernorth Health Services, a division of The Cigna Group, creates pharmacy, care and benefit solutions to improve health and increase vitality. We relentlessly innovate to make the prediction, prevention and treatment of illness and disease more accessible to millions of people. Join us in driving growth and improving lives.
Similar Jobs

Site Reliability Engineer, Associate
BlackRock · Gurgaon, India

Associate DevOps Engineer
DHL · Chennai, Tamil Nādu, India

Site Reliability Engineer - Associate - Reliability & Production Engineering
Morgan Stanley · Bengaluru, Karnataka, India

Client Experience - Associate
JPMorgan Chase · Bengaluru, Karnataka, India, IN

Associate Command Center Engineer, CloudOps
Stryker · Bengaluru, India
About Cigna

Cigna
Public company. The Cigna Group is an American multinational for-profit managed healthcare and insurance company based in Bloomfield, Connecticut.
Employees: 10,001+
Headquarters: Bloomfield
Company value: $54B
Reviews
Overall rating: 3.7 (10 reviews)
- Work-life balance: 4.2
- Compensation: 2.8
- Culture: 4.1
- Career: 3.5
- Management: 3.2
- Would recommend to a friend: 65%
Pros:
- Supportive and encouraging management
- Excellent work-life balance and flexible hours
- Great health benefits and vacation time
Cons:
- Below market compensation and low pay
- Poor management and lack of transparency
- High workload and stress levels
Salary Range (38 data points)
Levels: L2, L3, L4, L5, L6
L2 · Cybersecurity Analyst L2 (0 reports)
Total annual compensation: $66,170
- Base salary: $26,468
- Stock: $33,085
- Bonus: $6,617
Range: $46,319 to $86,021
Interview Experience (4 interviews)
- Difficulty: 2.8 / 5
- Duration: 14-28 weeks
- Offer rate: 50%
- Experience: 50% positive, 0% neutral, 50% negative
Interview process:
1. Application Review
2. Recruiter Screen
3. Technical Phone Screen
4. Team Member Interviews
5. Panel/Multiple Interviews
6. Offer
Common question types:
- Coding/Algorithm
- Technical Knowledge
- Behavioral/STAR
- Past Experience
- Culture Fit