
Principal Software Engineer (SRO)
About the role
We are looking for a skilled Principal Software Engineer (Lead Role) with a strong background in DevOps and platform engineering to join our Application Observability team. This team plays a critical role in managing stateful services within the Service Reliability and Observability (SRO) department.
The SRO department provides innovative observability solutions and standardised methods to enhance the efficiency and reliability of IT systems, simplifying tasks for both infrastructure and software engineers.
The Application Observability (AppO) team focuses on driving observability forward using tools such as OpenTelemetry, Elastic Stack, Prometheus and Grafana.
What you will be doing
As part of the Application Observability (AppO) team, your responsibilities will include:
- Defining and refining monitoring and alerting rules, both for the team and organisation-wide
- Working with other teams (Platform and Observability Backend) to enhance performance and fulfil user stories
- Leading projects, such as Grafana’s migration from on-premises data centers to AWS, from planning and requirements definition through supervision and implementation
- Improving the deployment of services using Git workflows and ArgoCD
- Proposing and validating performance and user experience improvements for AppO services
- Addressing issues, implementing preventive measures and managing postmortems and related improvement tasks
- Analysing performance, identifying anomalies and defining, documenting and implementing corrective measures
- Ensuring compliance with the SLA
Additionally, you will participate in the on-call rotation for team services, which requires the ability to resolve issues using runbooks and working knowledge of Elasticsearch, Thanos, Kafka, OpenTelemetry, Grafana and Docker.
Exposure to three key domains:
- DevOps
- Platform Engineering
- Application Observability
- Technology->DevOps->Site Reliability Engineering (SRE)
- Good knowledge of software configuration management systems
- Strong business acumen, strategy and cross-industry thought leadership
- Awareness of latest technologies and Industry trends
- Logical thinking and problem-solving skills along with an ability to collaborate
- Knowledge of two or three industry domains
- Understanding of the financial processes for various types of projects and the various pricing models available
- Client-interfacing skills
- Knowledge of SDLC and agile methodologies
- Project and team management
Education: Master of Engineering, Master of Technology, MCA, Bachelor of Engineering, Bachelor of Technology, BCA
Preferred skills: Technology->DevOps->Site Reliability Engineering (SRE)
Required skills
DevOps
Platform Engineering
Observability
OpenTelemetry
Prometheus
Grafana
Elastic Stack
System Reliability
About Infosys
Headquarters: Bangalore