
Apache Spark Technical Lead - Scala, Python
About the role
Job Summary
Job Title: Data Engineer
Department: Global Analytics
Reports to: Manager AI Engineering
Level: Senior / Medior (depending on experience)
Role Summary:
The Data Engineer is responsible for designing, building, and operating high-quality,
scalable, and reusable data services that support analytics, AI, and GenAI use cases
across business domains.
In this role, you will design and work hands-on with data pipelines, data models,
orchestration frameworks, storage layers, and observability tooling.
You will collaborate closely with AI Engineers, Data Scientists, Product Owners, and
Platform teams to deliver reliable, well-governed, and self-service data products.
Key Responsibilities
- Data Platform & Services Engineering
- Build and maintain scalable data pipelines and ingestion frameworks for batch,
streaming, and event-driven data.
- Develop and maintain modular data models and semantic layers optimized for
analytics, BI self-service and AI use cases.
- Implement and operate orchestration workflows (e.g., Databricks Workflows)
and compute engines (Spark, SQL, Python).
- Work with storage technologies such as Delta Lake, ADLS, feature and vector
stores.
- Data Quality, Governance & Observability
- Implement data quality checks, validations, and monitoring to ensure reliability
and trust in data products.
- Contribute to data lineage, metadata management, and documentation.
- Apply observability practices using tools such as Great Expectations or Monte
Carlo.
- Ensure compliance with data governance standards and regulations (e.g., GDPR)
in collaboration with data governance teams.
- Enablement for AI & Analytics Use Cases
- Deliver curated datasets and reusable data assets for analytics, machine
learning, and GenAI applications.
- Build pipelines that process structured, graph, and unstructured data (e.g., text,
documents, images).
- Support AI Engineering teams with data preparation for embeddings, vector
stores, and retrieval-augmented generation (RAG) pipelines.
- Tooling & Self-Service
- Contribute to data engineering tooling and frameworks that enable efficient
development and deployment of pipelines.
- Develop data pipelines using tools such as dbt and Databricks Lakeflow.
- Support reuse of data services through clear documentation, data contracts,
templates, and examples.
- Collaboration & Ways of Working
- Collaborate with Data Scientists, AI Engineers, Product Owners, Business SMEs,
and Platform teams.
- Participate in technical design discussions, code reviews, and architecture
forums.
- Follow engineering best practices for version control, testing, CI/CD, and
operational excellence.
Skill Requirements
- Preferred Qualifications
- 5+ years of experience in data engineering and building production-grade data
pipelines.
- Strong hands-on experience with data platforms such as Databricks.
- Solid knowledge of data modeling, SQL, Spark, and Python.
- Experience with orchestration frameworks, data quality tooling, and
observability practices.
- Exposure to unstructured data processing and AI/GenAI data pipelines is a
strong plus.
- Experience working in a global, multi-team environment is beneficial.
- Success in This Role Means
- Reliable, well-documented data products are available for analytics and AI use
cases.
- Data pipelines are scalable, cost-efficient, observable, and easy to operate.
- Data engineers and AI teams can move faster using reusable patterns and
self-service data services.
- Structured and unstructured data are effectively integrated to support advanced
analytics and GenAI innovation.
Benefits and perks
- Learning Budget
Required skills
Technical leadership
System design
Troubleshooting
About HCL Technologies
Amangal
Headquarters