
Senior Technical Lead
About the role
Job Summary
GenAI with Copilot
1) SM AI / AIOps Capabilities (Core Focus)
- Provide predictive insights and anomaly detection across infrastructure and platform signals (metrics/logs/traces/events/tickets), proactively identifying emerging risks and degradation trends.
- Design and implement AIOps patterns such as:
  - Signal correlation (event + topology + service mapping)
  - Noise reduction (deduplication, suppression, alert rationalization)
  - Incident clustering and "probable cause" identification
  - Change risk signals (change-related anomaly detection, blast radius indicators)
- Identify and prioritize automation opportunities:
  - Auto-triage enrichment (context injection into incidents)
  - Auto-routing suggestions aligned to service ownership
  - Auto-remediation for repeatable patterns with guardrails and human-in-the-loop approvals
- Build and maintain insight pipelines that link operational telemetry with ITSM data to produce actionable outcomes for:
  - Major Incident (MI) early warning
  - Incident prevention
  - Faster restoration through decision support
2) Analytics & Governance Dashboards (CC / NOC / SM)
- Build and operationalize dashboards and governance views for CC/NOC/SM leaders, including:
  - Service health and early warnings
  - Alert volume trends, noise ratio, top talkers
  - MTTA/MTTR drivers, recurring patterns
  - MI leading indicators and "near miss" signals
  - Automation impact metrics (time saved, repeat reduction)
- Create operational KPI packs with clear storylines for governance forums (weekly/monthly), enabling data-led decisions and prioritization.
3) Problem Intelligence & Shift-Left Enablement
- Analyze patterns across incidents, alerts, and changes to:
  - Identify Problem themes, recurring failure modes, and top drivers
  - Generate candidate Known Errors, workaround suggestions, and knowledge articles
  - Recommend shift-left opportunities (L1/L2 enablement) by converting patterns into reusable diagnostics and guided actions
- Partner with Problem, Change, and Service Owners to ensure insights translate into:
  - Problem records with evidence
  - Engineering backlog items
  - Change governance improvements
  - Reduced repeat incidents
4) GenAI / Agentic Automation for Operations (Copilots & Assistants)
- Develop GenAI copilots/agents to support operational workflows:
  - Incident summarization (telemetry + ticket history + change context)
  - Suggested next-best actions and diagnostic steps
  - Automated enrichment and knowledge retrieval (RAG)
  - Runbook guidance and workflow orchestration for repeated tasks
- Build Retrieval-Augmented Generation (RAG) systems using internal knowledge sources (K
Skill Requirements
Technology Skills
Required hands-on experience includes:
- Python: ability to build scalable, production-grade services and analytics pipelines.
- AIOps & Observability Analytics
- Working with telemetry (metrics/logs/traces/events)
- Practical exposure to orchestration frameworks (e.g., LangChain/LangGraph/CrewAI or similar)
- Anomaly detection approaches (statistical + ML-based)
- Correlation techniques (time-series + topology/context)
- Alert deduplication / suppression / classification
- Data & Analytics Engineering
- Data modelling for operational datasets (ITSM + telemetry)
- SQL and/or equivalent querying capability
- Dashboard development (BI and/or observability dashboards)
- Understanding reasoning patterns and safe operationalization
- Tool-use, verification layers, guardrails, human approvals
- GenAI / RAG for Operational Knowledge
- Building RAG pipelines, embeddings, vector search concepts
- Evaluation approaches (grounding, accuracy, hallucination reduction)
- RAG systems using internal knowledge sources (runbooks, postmortems, KEDB)
- Integration & Automation
- API integration and enterprise workflow integration patterns
- Automation frameworks / orchestration basics (human-in-loop controls)
- Designing assistants/agents to support incident triage, diagnostics, summarization, and enrichment
Leadership & Behavioural
- Partner across Infrastructure, CC/NOC, Service Management, Product/Engineering, and Security to deliver operational outcomes.
- Strong stakeholder engagement: able to communicate complex insights clearly to senior stakeholders.
- Pragmatic execution under ambiguity; proactive, outcome-driven delivery.
- Proficient in verbal and written English, with the ability to communicate comfortably with senior management and stakeholders.
Good To Have
· Experience with AIOps platforms and/or enterprise observability tooling (any major platform acceptable).
· Familiarity with ITSM data structures (Incidents/Problems/Changes, categorisation, routing, SLAs).
· React + JavaScript (for lightweight UIs for operational assistants).
· Data Structures & Algorithms (intermediate foundations).
Qualifications
- Graduation or Post Graduation.
- Experience building analytics/AI solutions for operations (NOC/CC/ITSM) preferred.
- 3-5 years of hands-on experience across AI/analytics stacks (incl. GenAI exposure).
- Overall 8-10 years of experience.
Required skills
Technical leadership
About HCL Technologies
Bengaluru
Headquarters