HCL Technologies

Azure Data Lead

Role: Data Engineering

Level: Lead

Location: Noida, India

Work: On-site

Type: Full-time

Posted: 2 days ago

About the role

Job Summary

The core responsibilities of this Databricks engineering role span pipeline design, Spark optimization, Delta Lake governance, automation, cost, and reliability. These responsibilities guide the hiring criteria and the scope of the position.

1. Lakehouse pipeline design

End-to-end ingestion, transformation, and modeling across bronze, silver, and gold layers on Delta Lake.
Modular data assets standardize reuse, accelerate delivery, and reduce maintenance risk.
Design patterns align with medallion architecture, data contracts, and versioned outputs.
Reliability improves through idempotent jobs, retries, and checkpointed stages.
Implement CDC, batch, and streaming paths with schema evolution and validation.
Promote notebooks and code to Jobs with parameterization and environment parity.
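
As a rough illustration of the idempotent jobs, retries, and checkpointed stages mentioned above, here is a minimal plain-Python sketch. It is not Databricks or Spark code: the function `run_pipeline`, the in-memory `table`, and the `completed` set are hypothetical names, and a real pipeline would keep its checkpoints in durable storage rather than in memory.

```python
import time
from typing import Callable, Dict, List, Set, Tuple

Stage = Tuple[str, Callable[[], None]]


def run_pipeline(
    stages: List[Stage],
    completed: Set[str],
    max_attempts: int = 3,
    backoff_seconds: float = 0.0,
) -> Set[str]:
    """Run stages in order, skipping checkpointed ones and retrying failures."""
    for name, stage in stages:
        if name in completed:
            continue  # checkpoint hit: this stage already succeeded
        for attempt in range(1, max_attempts + 1):
            try:
                stage()
                completed.add(name)  # record the checkpoint only on success
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # retries exhausted: surface the failure
                time.sleep(backoff_seconds * attempt)  # simple linear backoff
    return completed


# Hypothetical in-memory "table"; keyed writes make each stage safe to re-run.
table: Dict[int, str] = {}
calls = {"silver": 0}


def bronze() -> None:
    table[1] = "raw"  # writing the same key twice leaves the same state


def silver() -> None:
    calls["silver"] += 1
    if calls["silver"] == 1:
        raise RuntimeError("transient failure")  # simulated flaky first attempt
    table[1] = "clean"


done = run_pipeline([("bronze", bronze), ("silver", silver)], completed=set())
```

Calling `run_pipeline` a second time with the returned `done` set skips both stages, which is the behavior a checkpointed job is expected to show after a restart.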

2. Spark job tuning and optimization

Techniques for partitioning, caching, and efficient join strategies within Spark.
Performance gains cut cost, shrink runtimes, and raise SLA confidence.
Manage shuffle, skew, and file sizes using adaptive query execution and hints.
Benchmarks validate settings for executor sizing, autoscaling, and AQE thresholds.
Apply broadcast joins, Z-ordering, and compaction to reduce shuffle and I/O overhead.
Profiling with the Spark UI and cluster metrics (Ganglia on older Databricks runtimes) guides targeted fixes over guesswork.

3. Delta Lake governance and data reliability

ACID transactions, time travel, and schema enforcement across tables.
Consistency protects critical analytics, ML features, and regulatory reporting.
Use constraints, expectations, and OPTIMIZE commands for stable datasets.
Retention and vacuum policies balance performance with compliance needs.
Apply Delta Live Tables (DLT) expectations and deduplication to maintain data integrity.
Recovery by restoring a table to an earlier version from the transaction log reduces incident impact.
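
To make the expectations and deduplication items above concrete, the following plain-Python sketch shows the general idea. It is an illustration only, not the Delta Live Tables API: `apply_expectations`, `dedupe_latest`, the rule names, and the sample rows are all hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def apply_expectations(
    rows: Iterable[Row],
    expectations: Dict[str, Callable[[Row], bool]],
) -> Tuple[List[Row], List[Tuple[Row, List[str]]]]:
    """Split rows into valid rows and rejected rows with the rules they failed."""
    valid: List[Row] = []
    rejected: List[Tuple[Row, List[str]]] = []
    for row in rows:
        failed = [name for name, check in expectations.items() if not check(row)]
        if failed:
            rejected.append((row, failed))
        else:
            valid.append(row)
    return valid, rejected


def dedupe_latest(rows: Iterable[Row], key: str, order_by: str) -> List[Row]:
    """Keep one row per key: the one with the greatest order_by value."""
    latest: Dict[Any, Row] = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row[order_by] > current[order_by]:
            latest[row[key]] = row
    return list(latest.values())


# Hypothetical sample data: one duplicate key, one negative amount, one null id.
rows = [
    {"id": 1, "amount": 10.0, "updated_at": 1},
    {"id": 1, "amount": 12.0, "updated_at": 2},
    {"id": 2, "amount": -5.0, "updated_at": 1},
    {"id": None, "amount": 3.0, "updated_at": 1},
]
expectations = {
    "id_not_null": lambda r: r["id"] is not None,
    "amount_non_negative": lambda r: r["amount"] >= 0,
}

valid, rejected = apply_expectations(rows, expectations)
clean = dedupe_latest(valid, key="id", order_by="updated_at")
```

Keeping the rejected rows together with the names of the failed rules, rather than dropping them silently, is one way to make data quality issues auditable.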

4. Workflow orchestration and automation

Coordinated Jobs, tasks, and triggers across batch and streaming workloads.
Orchestration reduces manual effort and enforces dependency order.
Compose multi-task Jobs with task dependencies, dbutils widgets, and job clusters.
External schedulers integrate via REST APIs, Airflow, or cloud-native tools.
Parameterize runs for environments, secrets, and tenants with consistent naming.
Notifications, retries, and SLAs codify operational discipline end-to-end.
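
The dependency ordering that orchestration enforces can be sketched with Python's standard library. This is a minimal illustration, not the Databricks Jobs API: the task names and the `execution_order` helper are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Hypothetical task graph: each task maps to the set of tasks it depends on.
dependencies: Dict[str, Set[str]] = {
    "ingest_bronze": set(),
    "clean_silver": {"ingest_bronze"},
    "aggregate_gold": {"clean_silver"},
    "quality_report": {"clean_silver"},
    "refresh_dashboard": {"aggregate_gold"},
}


def execution_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return one valid run order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)

# A circular dependency is rejected instead of being scheduled.
try:
    execution_order({"a": {"b"}, "b": {"a"}})
    cycle_detected = False
except CycleError:
    cycle_detected = True
```

Tasks without a dependency between them, such as `aggregate_gold` and `quality_report` here, may appear in either order, which is also where a scheduler is free to run them in parallel.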

Key Responsibilities

Lead and manage end-to-end data engineering projects using Azure Data Factory, Azure Databricks, SQL, Oracle PL/SQL, and Python.
Collaborate with stakeholders to gather and understand requirements for data pipelines and analytics solutions.
Design and develop ETL processes, data models, and data integration solutions.
Provide technical guidance and mentorship to team members.
Ensure data quality, data governance, and data security standards are maintained throughout the project lifecycle.
Troubleshoot and optimize data pipelines and processes for performance and efficiency.
Stay updated on the latest trends and technologies in data engineering and contribute to continuous improvement efforts.

Skill Requirements

Proficiency in Azure Data Factory (ADF) and Azure Databricks for building and managing data pipelines.
Strong experience with SQL and Oracle PL/SQL for data querying and manipulation.
Advanced programming skills in Python for scripting and data processing tasks.
Knowledge of data modeling, data warehousing concepts, and database design principles.
Ability to work in a collaborative team environment and communicate effectively with stakeholders.
Strong analytical and problem-solving skills with attention to detail.
Experience with data visualization tools and techniques is a plus.

Other Requirements

Relevant certifications in Azure Data Factory, Azure Databricks, SQL, Oracle PL/SQL, or Python are advantageous.

Required skills

Databricks

Spark

Delta Lake

Azure

SQL

About HCL Technologies

Headquarters: Noida