HCL Technologies

Senior Apache Spark Technical Lead - Scala, Python

Role: Data Engineering
Level: Lead
Location: Amnagal, India
Work: On-site
Type: Full-time
Posted: 3 days ago

About the role

Job Summary

The Data Engineer is responsible for designing, building, and operating high-quality,
scalable, and reusable data services that support analytics, AI, and GenAI use cases
across business domains.
In this role, you will design and work hands-on with data pipelines, data models,
orchestration frameworks, storage layers, and observability tooling.
You will collaborate closely with AI Engineers, Data Scientists, Product Owners, and
Platform teams to deliver reliable, well-governed, and self-service data products.

Key Responsibilities

Data Platform & Services Engineering

  • Build and maintain scalable data pipelines and ingestion frameworks for batch,
    streaming, and event-driven data.
  • Develop and maintain modular data models and semantic layers optimized for
    analytics, BI self-service, and AI use cases.
  • Implement and operate orchestration workflows (e.g., Databricks Workflows)
    and compute engines (Spark, SQL, Python).
  • Work with storage technologies such as Delta Lake, ADLS, and feature and
    vector stores.

Data Quality, Governance & Observability

  • Implement data quality checks, validations, and monitoring to ensure reliability
    and trust in data products.
  • Contribute to data lineage, metadata management, and documentation.
  • Apply observability practices using tools such as Great Expectations or Monte
    Carlo.
  • Ensure compliance with data governance standards and regulations (e.g., GDPR)
    in collaboration with data governance teams.

Enablement for AI & Analytics Use Cases

  • Deliver curated datasets and reusable data assets for analytics, machine
    learning, and GenAI applications.
  • Build pipelines that process structured, graph, and unstructured data (e.g., text,
    documents, images).
  • Support AI Engineering teams with data preparation for embeddings, vector
    stores, and retrieval-augmented generation (RAG) pipelines.

Tooling & Self-Service

  • Contribute to data engineering tooling and frameworks that enable efficient
    development and deployment of pipelines.
  • Develop data pipelines using tools such as dbt and Databricks Lakeflow.
  • Support reuse of data services through clear documentation, data contracts,
    templates, and examples.

Collaboration & Ways of Working

  • Collaborate with Data Scientists, AI Engineers, Product Owners, Business SMEs,
    and Platform teams.
  • Participate in technical design discussions, code reviews, and architecture
    forums.
  • Follow engineering best practices for version control, testing, CI/CD, and
    operational excellence.

Skill Requirements

Preferred Qualifications

  • 5+ years of experience in data engineering and building production-grade data
    pipelines.
  • Strong hands-on experience with data platforms such as Databricks.
  • Solid knowledge of data modeling, SQL, Spark, and Python.
  • Experience with orchestration frameworks, data quality tooling, and
    observability practices.
  • Exposure to unstructured data processing and AI/GenAI data pipelines is a
    strong plus.
  • Experience working in a global, multi-team environment is beneficial.

Success in This Role Means

  • Reliable, well-documented data products are available for analytics and AI use
    cases.
  • Data pipelines are scalable, cost-efficient, observable, and easy to operate.
  • Data engineers and AI teams can move faster using reusable patterns and
    self-service data services.
  • Structured and unstructured data are effectively integrated to support advanced
    analytics and GenAI innovation.

Benefits and perks

  • Learning Budget

Required skills

  • Technical leadership
  • System design
  • Troubleshooting

About HCL Technologies

Headquarters: Amnagal