HCL Technologies

Azure Data Lead

Role: Data Engineering
Level: Senior
Location: Noida, India
Work: On-site
Type: Full-time
Posted: 1 week ago

About the role

Job Summary

The core responsibilities of a Databricks engineer span pipeline design, Spark optimization, Delta governance, automation, cost, and reliability. These responsibilities guide hiring criteria and scope.

1. Lakehouse pipeline design
- End-to-end ingestion, transformation, and modeling across bronze, silver, and gold layers on Delta Lake.
- Modular data assets standardize reuse, accelerate delivery, and reduce maintenance risk.
- Design patterns align with medallion architecture, data contracts, and versioned outputs.
- Reliability improves through idempotent jobs, retries, and checkpointed stages.
- Implement CDC, batch, and streaming paths with schema evolution and validation.
- Promote notebooks and code to Jobs with parameterization and environment parity.

2. Spark job tuning and optimization
- Techniques for partitioning, caching, and efficient join strategies within Spark.
- Performance gains cut cost, shrink runtimes, and raise SLA confidence.
- Manage shuffle, skew, and file sizes using adaptive query execution (AQE) and hints.
- Benchmarks validate settings for executor sizing, autoscaling, and AQE thresholds.
- Apply broadcast joins, Z-ordering, and compaction to reduce I/O overhead.
- Profiling with Spark UI and Ganglia guides targeted fixes over guesswork.

3. Delta Lake governance and data reliability
- ACID transactions, time travel, and schema enforcement across tables.
- Consistency protects critical analytics, ML features, and regulatory reporting.
- Use constraints, expectations, and OPTIMIZE commands for stable datasets.
- Retention and VACUUM policies balance performance with compliance needs.
- Handle DLT expectations and deduplication to maintain data integrity.
- Recovery via restore points and transaction logs reduces incident impact.

4. Workflow orchestration and automation
- Coordinated Jobs, tasks, and triggers across batch and streaming workloads.
- Orchestration reduces manual effort and enforces dependency order.
- Compose workflows from multi-task Jobs, dbutils widgets, and job clusters.
- External schedulers integrate via REST APIs, Airflow, or cloud-native tools.
- Parameterize runs for environments, secrets, and tenants with consistent naming.
- Notifications, retries, and SLAs codify operational discipline end-to-end.
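
As a rough illustration of the "idempotent jobs, retries, and checkpointed stages" item above, the following plain-Python sketch shows the general pattern. It is not Databricks code: `run_with_retries`, `run_pipeline`, the stage names, and the in-memory `completed` set are all hypothetical names chosen for this example, and a real job would persist its checkpoint to durable storage instead of a Python set.

```python
import time
from typing import Callable, List, Set, Tuple


def run_with_retries(step: Callable[[], None],
                     max_attempts: int = 3,
                     base_delay: float = 0.0) -> int:
    """Run `step`, retrying on any exception with exponential backoff.

    Returns the number of attempts used. Re-raises the last error once
    `max_attempts` is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            step()
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise
            # Delay doubles after each failed attempt.
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


def run_pipeline(stages: List[Tuple[str, Callable[[], None]]],
                 completed: Set[str],
                 max_attempts: int = 3) -> List[str]:
    """Run named stages in order, skipping any already checkpointed.

    `completed` is a hypothetical in-memory checkpoint; it is updated in
    place so that a re-run of the same pipeline repeats no finished work.
    Returns the names of the stages executed in this call.
    """
    executed = []
    for name, step in stages:
        if name in completed:
            continue  # checkpoint hit: this stage already succeeded
        run_with_retries(step, max_attempts)
        completed.add(name)
        executed.append(name)
    return executed


if __name__ == "__main__":
    checkpoint: Set[str] = set()
    stages = [("bronze", lambda: None), ("silver", lambda: None)]
    print(run_pipeline(stages, checkpoint))  # first run executes both stages
    print(run_pipeline(stages, checkpoint))  # re-run skips checkpointed stages
```

The point of the sketch is that a re-run after a partial failure only repeats the stages that never completed, which is what makes retries safe to automate.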

Required skills

Databricks

Spark

Delta Lake

Azure

Pipeline Design

Data Governance

Python

About HCL Technologies

Headquarters: Noida