
Sr Administrator (Support & Operations)
About the role
Job Summary
As a Platform Engineer, you are responsible for ensuring the stability, performance, and automation of the cloud platform’s core services, including API automation layers, observability components, CI/CD workflows, IaC toolchains, and QA/Documentation systems.
Key Responsibilities
Core Responsibilities
1. Platform Operations & Reliability (Run Engineering)
- Operate and maintain key platform services such as the Terraform Registry, Tracing infrastructure, SGCP Quality & Observability resources, and documentation & chat support systems.
- Ensure availability, performance, resilience, and secure lifecycle management for all production components.
- Perform patching, upgrades, and vulnerability remediation, aiming for minimal human intervention on production systems.
- Lead incident response, perform deep root cause analysis, and implement long-term corrective actions.
- Reduce operational toil through automation, workflow industrialization, and proactive reliability engineering.
2. CI/CD & Delivery Platforming
- Operate and evolve the cloud platform’s CI/CD pipelines and reusable workflows used by ~300 developers.
- Manage the lifecycle of base Docker images: security hardening, automated build pipelines, versioning, and distribution.
- Maintain and extend the platform’s IaC toolchain, including Terraform workflows, deployment pipelines, and registry management.
- Continuously improve delivery performance, deployment reliability, and overall developer experience.
- Contribute to the technical roadmap with an engineering-driven mindset.
3. Observability Engineering
- Maintain and enhance the cloud platform’s observability stack across traces and dashboards.
- Ensure full visibility into system behaviour, performance drifts, errors, and capacity indicators.
- Build automation for alerting, anomaly detection, and platform health insights, improving signal quality and reducing noise.
- Support SRE practices to strengthen platform reliability through data-driven insights.
4. User Support & Platform Adoption
- Participate in system demos, validation sessions, and operational readiness reviews.
- Act as a partner for SG Cloud engineering teams in troubleshooting and platform enablement.
Other Requirements
Key Skills & Competencies
Technical Skills
- Strong experience with CI/CD tooling (GitHub Actions/GitLab CI, Jenkins)
- Solid expertise in Infrastructure as Code (Terraform; Ansible preferred)
- Hands-on experience with platform automation, scripting/coding (Python), and workflow orchestration
- Proficiency in containerized environments (Docker/Kubernetes, registries, build pipelines)
- Understanding of monitoring and observability at scale (metrics, logs, traces)
Engineering Mindset
- Reliability-first mindset with strong operational discipline
- Ability to automate, industrialize, and eliminate manual processes
- Strong troubleshooting capabilities across distributed systems
- Clear communication and collaborative problem solving across global teams
Benefits and perks
- Learning Budget
Required skills
Platform engineering
CI/CD
IaC
Observability
Terraform
Docker
Incident response
RCA
About HCL Technologies
Bengaluru
Headquarters