
Azure Data Lead
About the role
Job Summary
The core responsibilities of a Databricks engineer span pipeline design, Spark optimization, Delta Lake governance, automation, cost, and reliability. These responsibilities guide hiring criteria and the scope of the role.

1. Lakehouse pipeline design
End-to-end ingestion, transformation, and modeling across bronze, silver, and gold layers on Delta Lake.
Modular data assets standardize reuse, accelerate delivery, and reduce maintenance risk.
Design patterns align with medallion architecture, data contracts, and versioned outputs.
Reliability improves through idempotent jobs, retries, and checkpointed stages.
Implement CDC, batch, and streaming paths with schema evolution and validation.
Promote notebooks and code to Jobs with parameterization and environment parity.

2. Spark job tuning and optimization
Techniques for partitioning, caching, and efficient join strategies within Spark.
Performance gains cut cost, shrink runtimes, and raise SLA confidence.
Manage shuffle, skew, and file sizes using adaptive query execution (AQE) and hints.
Benchmarks validate settings for executor sizing, autoscaling, and AQE thresholds.
Apply broadcast joins, Z-ordering, and compaction to reduce I/O overhead.
Profiling with Spark UI and Ganglia guides targeted fixes over guesswork.

3. Delta Lake governance and data reliability
ACID transactions, time travel, and schema enforcement across tables.
Consistency protects critical analytics, ML features, and regulatory reporting.
Use constraints, expectations, and OPTIMIZE commands for stable datasets.
Retention and VACUUM policies balance performance with compliance needs.
Apply Delta Live Tables (DLT) expectations and deduplication to maintain data integrity.
Recovery via restore points and transaction logs reduces incident impact.

4. Workflow orchestration and automation
Coordinated Jobs, tasks, and triggers across batch and streaming workloads.
Orchestration reduces manual effort and enforces dependency order.
Compose multi-task jobs using task dependencies, dbutils widgets, and job clusters.
External schedulers integrate via REST APIs, Airflow, or cloud-native tools.
Parameterize runs for environments, secrets, and tenants with consistent naming.
Notifications, retries, and SLAs codify operational discipline end-to-end.
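
As a rough illustration of the dependency ordering and retry behavior described under workflow orchestration, the sketch below runs a small set of tasks in upstream-first order and retries a failed task a fixed number of times. It uses only the Python standard library. The names `run_pipeline`, `tasks`, `depends_on`, and `flaky_silver` are hypothetical and chosen for illustration. In practice a Databricks job would normally declare task dependencies and retry policies in the job definition rather than in hand-written code.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_pipeline(
    tasks: Dict[str, Callable[[], None]],
    depends_on: Dict[str, Set[str]],
    max_retries: int = 2,
) -> List[str]:
    """Run each task after its upstream tasks, retrying failures.

    Returns the task names in the order they completed.
    """
    completed: List[str] = []
    # static_order() yields each task only after all of its predecessors.
    for name in TopologicalSorter(depends_on).static_order():
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                break  # task succeeded, stop retrying
            except Exception:
                if attempt == max_retries:
                    raise  # retries exhausted, surface the failure
        completed.append(name)
    return completed


# Hypothetical medallion-style chain: bronze -> silver -> gold.
calls = {"silver": 0}


def flaky_silver() -> None:
    """Fail on the first call to simulate a transient error."""
    calls["silver"] += 1
    if calls["silver"] == 1:
        raise RuntimeError("transient failure")


tasks = {"bronze": lambda: None, "silver": flaky_silver, "gold": lambda: None}
depends_on = {"bronze": set(), "silver": {"bronze"}, "gold": {"silver"}}
order = run_pipeline(tasks, depends_on)
```

Retrying is only safe here because each task is assumed to be idempotent, which is the same property the pipeline design section asks for.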
Required skills
Databricks
Spark
Delta Lake
Azure
Pipeline Design
Data Governance
Python
About HCL Technologies
Headquarters: Noida