
SME - Red Hat Cluster, Ansible, Kubernetes, Microsoft Azure
About the role
Job Summary
As a Subject Matter Expert in Support & Operations, you will play a pivotal role in ensuring the timely resolution of escalated incidents while adhering to quality norms and service level agreements (SLAs). Your expertise in Kubernetes, Ansible, and cloud technologies will be essential in driving customer satisfaction and operational excellence.
Key Responsibilities
==========================================================================================================================
Job Title: Platform Site Reliability Engineer (SRE) – OpenShift (8+ Years)
Location
Bangalore / Kolkata / Pune
Band
L2
Role Summary
We are looking for a Platform SRE (8+ years) to engineer, run, and continuously improve an OpenShift-heavy container platform. This role combines **Day‑1 responsibilities** (platform setup, standardization, onboarding enablement) with **Day‑2 operations** (stability, upgrades, performance, incident management, and automation).
Key Responsibilities
Day‑1 (Build / Enablement)
- Support OpenShift platform onboarding: cluster setup assistance, baseline configurations, and environment readiness.
- Implement platform standards: namespaces/projects, RBAC/SCC, resource quotas/limits, routes/ingress patterns, and operator enablement.
- Create reusable deployment patterns using Helm (standard charts/templates, values structure, versioning).
- Build and standardize GitLab CI templates/pipelines for build-test-deploy and environment promotion.
- Develop automation using Ansible to enable repeatable provisioning/configuration workflows.
Day‑2 (Run / Operate / Optimize)
- Own cluster health and reliability: monitoring, capacity planning, scaling, patching and upgrades, and performance troubleshooting.
- Troubleshoot issues across OpenShift components, nodes, networking/storage basics, and workload behaviour.
- Participate in incident response: triage, mitigation, RCA, post-incident actions, and runbook/SOP improvements.
- Reduce operational toil through automation, improved alerts, and self-service enablement for application teams.
- Collaborate with stakeholders to improve security posture and operational governance (access controls, platform hygiene).
Mandatory Skills
- 8+ years’ experience in SRE / DevOps / Platform / Infrastructure Engineering
- Strong hands-on OpenShift administration
- Helm (deployments + chart maintenance; chart authoring preferred)
- Ansible (playbooks/roles; automation mindset)
- Linux fundamentals (logs, processes, system services, basic networking)
- CI/CD with GitLab CI (pipelines, runners, templates, variables/secrets)
Good-to-Have Skills
- Argo CD (GitOps)
- vSphere, NSX, VMware Cloud Foundation (VCF)
- Exposure to observability stacks (Prometheus/Grafana, ELK/EFK, Splunk, Datadog, etc.)
Traits We Value
- Strong troubleshooting, ownership, and production support mindset
- Comfortable operating in structured on-call rotations and handling high-severity incidents
- Good documentation habits (runbooks, SOPs, RCA notes)
=================================
Skill Requirements
- Proficient in Kubernetes and container management.
- Strong understanding of Ansible for automation and configuration management.
- Familiarity with Red Hat Linux and Red Hat cluster technologies.
- Knowledge of Azure cloud services and their integration with Kubernetes.
- Excellent analytical and problem-solving skills, with a focus on customer satisfaction and operational efficiency.
Other Requirements
- Relevant certifications such as Certified Kubernetes Administrator (CKA) or Red Hat Certified Engineer (RHCE) are optional but valuable.
Required skills
OpenShift
Kubernetes
Helm
GitLab CI
Ansible
Monitoring
Capacity planning
Incident management
About HCL Technologies
Bengaluru
Headquarters