Infosys

Site Reliability Engineer

Role: DevOps
Level: Mid Level
Location: Mangalore, India
Work: On-site
Type: Full-time
Posted: 1 month ago

About the role

  • Manage capacity and performance to help scale infrastructure across public and private clouds worldwide
  • Define and implement standards and best practices for system architecture, deployment, metrics, and operational tasks
  • Support services through activities such as monitoring availability, system health, and incident response
  • Improve system performance, application delivery and efficiency through automation, process refinement, postmortem reviews, and in-depth configuration analysis
  • Communicate with teams across all areas of the organization
  • Troubleshoot and monitor production systems to maintain the highest possible uptime
  • Support and improve existing high-availability architecture solutions, and manage the related operational activities
  • Integrate Generative AI (GenAI) and AIOps tools to automate incident detection, root cause analysis, and resolution workflows (e.g., self-healing scripts, intelligent runbooks), reducing manual toil and accelerating response times.
  • Apply prompt engineering techniques to enhance interactions with AI-based observability and automation platforms, improving the accuracy and efficiency of AI responses.
  • Leverage platform-specific AI capabilities (e.g., AWS Bedrock, Azure OpenAI, GCP Vertex AI) to architect intelligent SRE solutions tailored to cloud environments.
  • Experience with one or more high-level programming languages such as Python, Ruby, or Go, and familiarity with object-oriented programming.
  • Design and implement the CI/CD/CT pipeline on one or more tool stacks, such as Jenkins, Bamboo, Azure DevOps, and AWS CodePipeline with
  • Proficiency in one or more infrastructure-as-code tools (e.g., Terraform, AWS CloudFormation, Azure ARM templates).
  • Develop and manage monitoring and log analysis tools for operations, with exposure to tools such as AppDynamics, Datadog, Splunk, Kibana, Prometheus, Grafana, and Elasticsearch (among others).
  • Hands-on experience with AIOps processes and platforms (e.g., Dynatrace, Splunk, ServiceNow) for incident management, observability, and noise reduction.
  • Familiarity with prompt engineering, GenAI applications, and AI/ML frameworks (e.g., TensorFlow, PyTorch) or platforms (e.g., OpenAI, Vertex AI).
  • Awareness of agentic AI solutions applicable to operations and support.

Education: Bachelor of Engineering

Preferred skills: Technology -> DevOps -> Site Reliability Engineering (SRE)

About Infosys

Headquarters: Mangalore