HCL Technologies

Track Lead - Azure DevOps, Terraform

Role: DevOps
Level: Lead
Location: Calgary, Canada
Work: On-site
Type: Full-time
Posted: 1 week ago

About the role

Job Summary

The SRE is responsible for the reliability, availability, and performance of Azure and AWS PaaS and IaaS workloads. The role bridges development and operations, with a focus on building automated systems that prevent failures, managing incident response, and optimizing cloud costs.

Key Responsibilities

  • System Reliability & Monitoring: Design, implement, and maintain comprehensive monitoring and alerting systems such as Azure Monitor, AWS CloudWatch, Application Insights, and Log Analytics.
  • Automation & Toil Reduction: Automate repetitive manual operations (toil) such as environment provisioning, system patching, and scaling. Use IaC tools like Terraform and Ansible to manage infrastructure.
  • Incident Response & Management: Actively manage incident responses, root cause analysis (RCA), and post-mortem investigations to improve system reliability and minimize mean time to resolution (MTTR).
  • Cloud SRE Agent Integration: Deploy and configure Cloud SRE Agent to automate incident investigation, execute remediation steps (restart, scale, rollback), and manage routine tasks.
  • Capacity Planning & Scalability: Analyze usage patterns to optimize cloud resources, ensuring high availability and performance while managing costs via Azure Cost Management.
  • CI/CD & DevOps Collaboration: Integrate automation workflows into CI/CD pipelines (e.g., GitHub Actions or Azure Pipelines) to ensure reliable deployments.

Skill Requirements

  • Cloud Platforms: Expert knowledge of Microsoft Azure infrastructure services (Compute, Storage, Networking, AKS).
  • Scripting & Programming: Proficiency in Python, Bash, or PowerShell for building automation tools.
  • Infrastructure as Code (IaC): Extensive experience with Terraform and ARM templates/Bicep.
  • Observability Tools: Experience with Azure Monitor, Grafana, Prometheus, or Datadog.
  • Containers & Orchestration: Solid understanding of Kubernetes/AKS (Azure Kubernetes Service).
  • Operating Systems: Proficient in Windows/Linux environments.
  • Certification: Azure certification is a plus.
  • Multi-Cloud: Exposure to multi-cloud environments is a must.

Other Requirements

  1. Reviewing Service Level Objectives (SLOs) and error budgets.
  2. Refining auto-scaling rules for Kubernetes clusters based on traffic trends.
  3. Working with developers to review service architecture and ensure fault tolerance.
  4. Configuring AI-driven alert suppression to reduce alert fatigue.
  5. Creating Azure Dashboards to visualize key performance indicators (KPIs).

Required skills

Azure DevOps

Terraform

CI/CD

Automation

About HCL Technologies

Headquarters: Calgary