
Senior Technical Lead
About the role
Job Summary
Work as a Site Reliability Engineer (SRE) in 24x7 shifts at the client location.
Key Responsibilities
In this role you will operate and manage production cloud platforms, covering both operations (executing runbooks/SOPs, maintaining uptime/SLA) and site reliability engineering.
Resolve alerts as per runbook and/or drive the resolution of the alert by collaborating with different teams and stakeholders across Verizon.
After resolution, update the relevant runbooks with the steps needed to resolve similar alerts in future.
Aim to automate runbooks as much as possible.
Work towards reducing the number of alert escalations to the next-level teams (Dev/DevOps).
Provide clear details of customer-facing outages to field/service delivery teams so they can communicate with customers.
Lead post-mortem meetings so that improvements are built into the product to avoid future outages.
Apply strong analytical skills to understand production system metrics, drive change, optimize system utilization and drive cost efficiency.
Carry out production rollout of new releases.
Constantly improve the monitoring/alerting posture so as to proactively detect issues before customers report them.
Build reports, dashboards and KPIs, and lead the monthly operations review. Examples include, but are not limited to: platform/application/infrastructure KPIs, security reports, audit reports.
Support application, OS and database patches/updates.
Execute security tools, analyze vulnerabilities/findings and work with the Dev/DevOps team to remediate them.
Participate as a key stakeholder in case of IR (Incident Response).
Skill Requirements
- Experience working with Kubernetes, SRE monitoring tools, DevOps tools and cloud-based production environments
- Experience working with a tech stack similar to VZ, with a focus on uptime, SLA, security and site reliability engineering
- Advanced AWS certification
- Good understanding of large-scale distributed systems in practice, microservices and serverless, and the ability to debug and root-cause complex distributed systems in production; build/automate runbooks
Required skills
SRE
Cloud operations
Runbooks
Incident response
Monitoring
Automation
Security patching
About HCL Technologies
Headquarters: Bangalore