HCL Technologies

Senior Administrator - Monitoring Tools, Event Monitoring

Role: IT Helpdesk
Level: Senior
Location: Mexico
Work: On-site
Type: Full-time
Posted: 3 days ago

About the role

E2.2 – SAP Operational Monitoring Lead (JD)

1. Job Summary

Lead SAP operational monitoring and observability initiatives to ensure 24x7 system availability, proactive issue detection, and improved service reliability across SAP landscapes (ECC, S/4, HANA, PI/PO, BO, interfaces). Drive centralized monitoring governance, alert optimization, and incident prevention through tools such as SAP Sol Man, Moogsoft, and AIOps platforms.

2. Key Responsibilities

A. Monitoring Operations & Governance
• Own end-to-end SAP monitoring (Availability, Performance, Interface, Batch jobs)
• Ensure proactive alerting and incident detection across SAP landscapes
• Define and maintain monitoring SOPs, runbooks, and escalation matrix
• Ensure adherence to SLAs, OLAs, and operational KPIs
Internal reference: Monitoring stabilization and alignment across tools like Sol Man & Moogsoft has been a key operational focus.

B. Alert Management & Optimization
• Analyze alerts to identify noise, redundancies, and false positives
• Drive alert rationalization, correlation, and threshold tuning
• Implement event correlation & AIOps-driven improvements
• Reduce alert volume and improve MTTR through automation
Internal insight: SAP alert volumes are high (~16k+/year) with optimization opportunities via automation and correlation.

C. Incident & Problem Management
• Lead monitoring-driven incident triage and ensure quick resolution
• Identify recurring patterns and support Root Cause Analysis (RCA)
• Work with L2/L3 teams to drive permanent fixes and elimination of repeat incidents
• Support Major Incident Management (MIM) bridge calls

D. Platform & Tool Management
• Manage SAP monitoring tools:
  o SAP Solution Manager (Sol Man)
  o Moogsoft / AIOps tools
  o Interface monitoring tools
• Maintain dashboards for:
  o Real-time health monitoring
  o Performance metrics
  o Availability reporting
Internal practice includes centralized dashboards and monitoring metric expansion.

E. Automation & Continuous Improvement
• Drive monitoring automation initiatives (auto-remediation, self-healing)
• Implement:
  o Predictive alerting
  o Automated first response actions
• Improve operational maturity from reactive → proactive → predictive monitoring
Internal transformation roadmap includes AIOps and autonomous operations maturity.

F. Stakeholder & Leadership Reporting
• Provide regular operational insights to leadership
• Highlight:
  o Risks
  o Trends
  o Improvement opportunities
• Coordinate with:
  o SAP Basis
  o Infra teams
  o Application teams

G. Team Leadership (E2.2 scope)
• Act as shift/track lead for monitoring operations
• Guide L1/L2 teams on monitoring best practices
• Drive knowledge transfer and skill improvement

3. Required Ski

Skill Requirements

• SAP monitoring tools (Sol Man / CCMS / Focused Run)
• SAP Basis fundamentals (HANA, ECC, S/4)
• Monitoring tools (Slack / Moogsoft / Dynatrace / Splunk – preferred)
• Incident & problem management tools (ServiceNow)

Other Requirements

• Incident management & RCA
• Alert tuning & noise reduction
• SLA/KPI tracking & reporting
• Automation mindset (AIOps preferred)

Benefits and perks

Learning Budget

Required skills

Systems administration

Troubleshooting

Service operations

About HCL Technologies

Others

Headquarters