
Administrator - Azure DevOps, Terraform
About the role
Job Summary
SRE/DevOps Engineer responsible for ensuring system reliability, scalability, and performance by combining software engineering with operations, automation, and continuous delivery practices.
Key Responsibilities
•Design and manage highly available, scalable, and reliable systems
•Implement and maintain CI/CD pipelines for faster and stable releases
•Monitor system health using observability tools (metrics, logs, traces)
•Define and manage SLIs, SLOs, and SLAs
•Automate infrastructure using Infrastructure as Code (IaC)
•Perform incident management, root cause analysis (RCA), and problem resolution
•Optimize system performance, cost, and capacity planning
•Ensure system security, compliance, and resilience
•Collaborate with development teams to improve system design and reliability
•Drive automation to reduce manual intervention and improve efficiency
Skill Requirements
•Strong experience with DevOps tools (Azure DevOps, Jenkins, GitLab CI/CD)
•Expertise in cloud platforms (Azure/AWS/GCP)
•Knowledge of containerization (Docker) and orchestration (Kubernetes)
•Experience in monitoring & logging tools (Prometheus, Grafana, ELK, Azure Monitor, Splunk)
•Proficiency in scripting (Python, Bash, PowerShell)
•Hands-on with Terraform, Ansible, or ARM templates
•Understanding of networking, OS (Linux), and distributed systems
•Experience in incident response and production support
Benefits and perks
•Learning Budget
Required skills
Azure DevOps
Jenkins
Terraform
Docker
Kubernetes
Monitoring
Scripting
About HCL Technologies
Headquarters: Mississauga