Administrator - Azure DevOps, Terraform

Role: DevOps

Level: Junior

Location: Mississauga, Canada

Work: On-site

Type: Full-time

Posted: 1 day ago

About the role

Job Summary

SRE/DevOps Engineer responsible for ensuring system reliability, scalability, and performance by combining software engineering with operations, automation, and continuous delivery practices.

Key Responsibilities

• Design and manage highly available, scalable, and reliable systems
• Implement and maintain CI/CD pipelines for faster, more stable releases
• Monitor system health using observability tools (metrics, logs, traces)
• Define and manage SLIs, SLOs, and SLAs
• Automate infrastructure using Infrastructure as Code (IaC)
• Perform incident management, root cause analysis (RCA), and problem resolution
• Optimize system performance and cost, and carry out capacity planning
• Ensure system security, compliance, and resilience
• Collaborate with development teams to improve system design and reliability
• Drive automation to reduce manual intervention and improve efficiency

Skill Requirements

• Strong experience with DevOps tools (Azure DevOps, Jenkins, GitLab CI/CD)
• Expertise in cloud platforms (Azure/AWS/GCP)
• Knowledge of containerization (Docker) and orchestration (Kubernetes)
• Experience with monitoring and logging tools (Prometheus, Grafana, ELK, Azure Monitor, Splunk)
• Proficiency in scripting (Python, Bash, PowerShell)
• Hands-on experience with Terraform, Ansible, or ARM templates
• Understanding of networking, operating systems (Linux), and distributed systems
• Experience in incident response and production support

Benefits and perks

• Learning Budget

Required skills

Azure DevOps

Jenkins

Terraform

Docker

Kubernetes

Monitoring

Scripting