HCL Technologies

SME - Red Hat Cluster, Ansible, Kubernetes, Microsoft Azure

Role: Infrastructure

Level: Senior

Location: Bengaluru, India

Work: On-site

Type: Full-time

Posted: 2 days ago


About the role

Job Summary

As a Subject Matter Expert in Support & Operations, you will play a pivotal role in ensuring the timely resolution of escalated incidents while adhering to quality norms and service level agreements (SLAs). Your expertise in Kubernetes, Ansible, and cloud technologies will be essential in driving customer satisfaction and operational excellence.

Key Responsibilities

==========================================================================================================================

Job Title: Platform Site Reliability Engineer (SRE) – OpenShift (8+ Years)

Location

Bangalore / Kolkata / Pune

Band

Role Summary

We are looking for a Platform SRE (6+ years) to engineer, run, and continuously improve an OpenShift-heavy container platform. This role combines Day‑1 responsibilities (platform setup, standardization, onboarding enablement) with Day‑2 operations (stability, upgrades, performance, incident management, and automation).

Key Responsibilities

Day‑1 (Build / Enablement)

Support OpenShift platform onboarding: cluster setup assistance, baseline configurations, and environment readiness.
Implement platform standards: namespaces/projects, RBAC/SCC, resource quotas/limits, routes/ingress patterns, and operator enablement.
Create reusable deployment patterns using Helm (standard charts/templates, values structure, versioning).
Build and standardize GitLab CI templates/pipelines for build-test-deploy and environment promotion.
Develop automation using Ansible to enable repeatable provisioning/configuration workflows.

Day‑2 (Run / Operate / Optimize)

Own cluster health and reliability: monitoring, capacity planning, scaling, patching and upgrades, and performance troubleshooting.
Troubleshoot issues across OpenShift components, nodes, networking/storage basics, and workload behaviour.
Participate in incident response: triage, mitigation, RCA, post-incident actions, and runbook/SOP improvements.
Reduce operational toil through automation, improved alerts, and self-service enablement for application teams.
Collaborate with stakeholders to improve security posture and operational governance (access controls, platform hygiene).

Mandatory Skills

8+ years’ experience in SRE / DevOps / Platform / Infrastructure Engineering
Strong hands-on OpenShift administration
Helm (deployments + chart maintenance; chart authoring preferred)
Ansible (playbooks/roles; automation mindset)
Linux fundamentals (logs, processes, system services, basic networking)
CI/CD with GitLab CI (pipelines, runners, templates, variables/secrets)

Good-to-Have Skills

Argo CD (GitOps)
vSphere, NSX, VMware Cloud Foundation (VCF)
Exposure to observability stacks (Prometheus/Grafana, ELK/EFK, Splunk, Datadog, etc.)

Traits We Value

Strong troubleshooting, ownership, and production support mindset
Comfortable operating in structured on-call rotations and handling high-severity incidents
Good documentation habits (runbooks, SOPs, RCA notes)

=================================

Skill Requirements

Proficient in Kubernetes and container management.
Strong understanding of Ansible for automation and configuration management.
Familiarity with Red Hat Linux and Red Hat Cluster technologies.
Knowledge of Azure cloud services and their integration with Kubernetes.
Excellent analytical and problem-solving skills, with a focus on customer satisfaction and operational efficiency.

Other Requirements

Relevant certifications such as Certified Kubernetes Administrator (CKA) or Red Hat Certified Engineer (RHCE) are optional but valuable.

Required skills

OpenShift

Kubernetes

Helm

GitLab CI

Ansible

Monitoring

Capacity planning

Incident management

About HCL Technologies

Headquarters: Bengaluru