HCL Technologies

Senior Engineer - Monitoring Tools, Event Monitoring

Role: Infrastructure
Level: Senior
Location: Lucknow, India
Work: Remote
Type: Full-time
Posted: 1 week ago

About the role

Job Summary

Job Description: Azure Monitoring/Command Center Specialist

Position Overview

The Azure Monitoring and Command Center Specialist is responsible for the proactive monitoring, incident management, and operational support of enterprise cloud resources hosted on Microsoft Azure. This role is integral to ensuring system reliability, performance, and security by leveraging Azure-native tools and best practices within a dedicated command center environment.

Key Responsibilities

  • Monitor Azure cloud environments, including virtual machines, databases, networking, and applications, using Azure Monitor, Log Analytics, Application Insights, and other Azure-native tools.

  • Respond to system alerts and incidents in real time, following established escalation procedures and using ITSM platforms for tracking and resolution.

  • Analyze logs and metrics to identify trends, potential issues, and root causes of incidents, collaborating with engineering and application teams for resolution.

  • Implement and maintain monitoring dashboards, alerts, and automated remediation scripts to ensure continuous visibility and rapid response.

  • Participate in on-call rotations, ensuring 24/7 coverage as required by business needs.

  • Document incidents, troubleshooting steps, and solutions, contributing to a knowledge base for continuous improvement.

  • Assist with regular system health checks, capacity planning, and performance tuning in the Azure environment.

  • Support the implementation of security best practices and compliance standards within the monitored Azure environment.

  • Collaborate with other IT teams to coordinate maintenance windows, change management, and disaster recovery exercises.

Required Skills and Qualifications

  • Bachelor’s degree in Technology or a related field, or equivalent practical experience.

  • 2+ years of experience in cloud operations, preferably with Microsoft Azure environments.

  • Hands-on experience with Azure Monitor, Log Analytics, Application Insights, and other Azure Operations tools.

  • Experience with incident management, ITSM tools (e.g., ServiceNow, Jira), and escalation procedures.

  • Excellent troubleshooting, analytical, and problem-solving skills.

  • Strong communication skills and ability to work effectively under pressure in a fast-paced environment.

  • Ability to work shifts or on-call as required.

Working Conditions

  • This is a full-time role based in a command center environment; remote or hybrid options may be available depending on organizational policy.

  • Participation in 24/7 shift patterns or on-call rotations is required.

  • Occasional after-hours or weekend work may be required during critical incidents or planned maintenance.

Key Responsibilities

  1. Monitor IT infrastructure and applications using monitoring tools to promptly detect and escalate system events and incidents within defined protocols.

  2. Analyze system alerts and event logs using event monitoring platforms, identifying potential issues and initiating basic troubleshooting steps.

  3. Respond to and document incidents by updating ticketing systems, ensuring accurate status reporting and timely resolution of monitoring-related issues.

  4. Support the maintenance and configuration of monitoring tools by applying fundamental knowledge of event monitoring processes to ensure consistent system performance.

  5. Participate in team reviews of monitoring incidents and assist in implementing improvements based on recurring event patterns.

  6. Contribute to the creation and maintenance of operational documentation related to monitoring procedures and incident management.

Skill Requirements

  1. Experience with monitoring tools such as Nagios, SolarWinds, Zabbix, or similar platforms.

  2. Good knowledge of incident response procedures in a Network Operations Center (NOC) or Technical Operations Center (TOC) environment.

  3. Familiarity with IT ticketing systems and basic troubleshooting techniques.

  4. Fundamental knowledge of IT infrastructure components (servers, networks, applications).

  5. Good written and verbal communication skills for effective incident reporting.

Other Requirements

  1. Optional but valuable: CompTIA IT Operations Specialist (CIOS), CompTIA Network+ certification

Required skills

Monitoring Tools

Event Monitoring

About HCL Technologies

Headquarters: Lucknow