
Subject Matter Expert (Support & Ops)
About the role
Job Summary
Key Skills & Requirements
- Strong hands-on experience in Grafana administration, including dashboard development, alert configuration, notification policies, RBAC, user management, and data source integration.
- Expertise in Grafana plugin installation, configuration, troubleshooting, upgrades, and performance optimization across enterprise-scale monitoring environments.
- Experience designing and maintaining observability solutions using Grafana, Grafana Alloy, and OpenTelemetry.
- Hands-on experience with Grafana Alloy configuration, telemetry collection pipelines, log/metric forwarding, relabeling, filtering, and performance tuning.
- Strong knowledge of BindPlane administration, including collector deployment, gateway configuration, telemetry routing, load balancing, high availability, and troubleshooting.
- Experience configuring and optimizing telemetry ingestion pipelines from on-premises and cloud-based infrastructure into centralized observability platforms.
- Good understanding of Google Cloud Platform (GCP) services, with hands-on experience in GKE cluster administration, workload deployment, pod management, scaling, and troubleshooting.
- Experience using Google Cloud Monitoring tools such as Metrics Explorer, Logs Explorer, dashboards, and alerting policies, and applying observability best practices.
- Strong Kubernetes administration skills, including Deployments, Services, ingress controllers, DaemonSets, StatefulSets, namespaces, resource management, and cluster troubleshooting.
- Experience managing and monitoring Azure Kubernetes Service (AKS) environments and implementing observability solutions for containerized workloads.
- Knowledge of Azure cloud services, networking concepts, identity management, and infrastructure monitoring.
- Hands-on experience with Ansible for infrastructure automation, configuration management, deployment automation, and operational tasks.
- Strong scripting and automation skills using Python and shell scripting for monitoring, API integrations, and operational efficiency improvements.
- Experience integrating monitoring platforms with ServiceNow, REST APIs, webhook-based alerting, SQL, and third-party enterprise applications.
- Strong understanding of Linux system administration, troubleshooting, process management, networking fundamentals, and performance analysis.
- Ability to perform root cause analysis, capacity planning, performance optimization, and reliability improvements for large-scale monitoring platforms.
- Experience supporting enterprise observability environments with thousands of monitored servers, applications, and cloud-native workloads.
- Excellent analytical, troubleshooting, documentation, and stakeholder communication skills.
Cloud & Container Technologies
- Google Cloud Platform (GCP) / Google Kubernetes Engine (GKE)
- Kubernetes Administration
- Azure Cloud / Azure Kubernetes Service (AKS)
Monitoring & Observability
- Grafana
- Grafana Alloy
- OpenTelemetry
- BindPlane
- Cloud Monitoring
- Log Management Solutions
- Prometheus
Automation & Development
- Python
- Shell Scripting (Bash)
- Ansible
- REST APIs
- Git/GitHub
Benefits and perks
• Learning Budget
About HCL Technologies
Headquarters: Sholinganallur