
Azure Data Lead
About the role
Job Summary
The core responsibilities of a Databricks engineer span pipeline design, Spark optimization, Delta Lake governance, automation, cost, and reliability. These responsibilities guide the hiring criteria and scope for this role.
1. Lakehouse pipeline design
- End-to-end ingestion, transformation, and modeling across bronze, silver, and gold layers on Delta Lake.
- Modular data assets standardize reuse, accelerate delivery, and reduce maintenance risk.
- Design patterns align with medallion architecture, data contracts, and versioned outputs.
- Reliability improves through idempotent jobs, retries, and checkpointed stages.
- Implement CDC, batch, and streaming paths with schema evolution and validation.
- Promote notebooks and code to Jobs with parameterization and environment parity.
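To illustrate what "idempotent jobs" and "retries" mean in practice, here is a minimal sketch in plain Python (not Spark or Delta code). The names `run_with_retries` and `upsert` are hypothetical and chosen only for this example. The in-memory dictionary stands in for a keyed target table, assuming each record carries a unique `id`.

```python
import time
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


def run_with_retries(step: Callable[[], T], max_attempts: int = 3,
                     base_delay_s: float = 0.0) -> T:
    """Run `step`, retrying on any exception with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay_s * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


def upsert(target: Dict[int, dict], batch: Iterable[dict],
           key: str = "id") -> Dict[int, dict]:
    """Merge a batch into `target` by key.

    Replaying the same batch leaves `target` unchanged, which is what
    makes a retried or re-run job safe (idempotent).
    """
    for row in batch:
        target[row[key]] = row
    return target
```

Because the write is keyed, a retry after a partial failure does not create duplicate rows. On Delta Lake the same effect is usually achieved with a keyed merge rather than a blind append.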
2. Spark job tuning and optimization
- Techniques for partitioning, caching, and efficient join strategies within Spark.
- Performance gains cut cost, shrink runtimes, and raise SLA confidence.
- Manage shuffle, skew, and file sizes using adaptive query execution and hints.
- Benchmarks validate settings for executor sizing, autoscaling, and AQE thresholds.
- Apply broadcast joins, Z-ordering, and compaction to reduce I/O overhead.
- Profiling with Spark UI and Ganglia guides targeted fixes over guesswork.
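The idea behind a broadcast join can be sketched in plain Python: the small table is held in memory as a hash lookup, and the large table is streamed past it, so the large side never has to be shuffled. This is only a conceptual illustration, not Spark code. The function name `broadcast_hash_join` and the sample columns in the usage note are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List


def broadcast_hash_join(large: Iterable[dict], small: Iterable[dict],
                        key: str) -> Iterator[dict]:
    """Inner-join `large` with `small` on `key`.

    The small side is fully materialized as a dictionary (the
    "broadcast" copy); the large side is read once, row by row.
    """
    lookup: Dict[object, List[dict]] = {}
    for row in small:
        lookup.setdefault(row[key], []).append(row)

    for row in large:
        # Rows without a match are dropped (inner-join semantics).
        for match in lookup.get(row[key], []):
            yield {**match, **row}
```

For example, joining a large list of orders to a small list of country names on a shared `country` column returns one enriched row per matching order. The approach only pays off when the small side comfortably fits in memory.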
3. Delta Lake governance and data reliability
- ACID transactions, time travel, and schema enforcement across tables.
- Consistency protects critical analytics, ML features, and regulatory reporting.
- Use constraints, expectations, and OPTIMIZE commands for stable datasets.
- Retention and vacuum policies balance performance with compliance needs.
- Apply Delta Live Tables (DLT) expectations and deduplication to maintain data integrity.
- Recovery via restore points and transaction logs reduces incident impact.
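The expectation and deduplication steps mentioned above can be sketched in plain Python. This is not the DLT API. The helper names `apply_expectations` and `deduplicate_latest` are hypothetical, and the sketch assumes each record has a key column and a column that orders versions of the same record.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Dict[str, Any]
Expectation = Callable[[Row], bool]


def apply_expectations(rows: Iterable[Row],
                       expectations: Dict[str, Expectation]
                       ) -> Tuple[List[Row], List[Row]]:
    """Split rows into (valid, rejected) using named predicate checks."""
    valid: List[Row] = []
    rejected: List[Row] = []
    for row in rows:
        if all(check(row) for check in expectations.values()):
            valid.append(row)
        else:
            rejected.append(row)
    return valid, rejected


def deduplicate_latest(rows: Iterable[Row], key: str,
                       order_by: str) -> List[Row]:
    """Keep one row per `key`: the one with the highest `order_by` value."""
    latest: Dict[Any, Row] = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row[order_by] > current[order_by]:
            latest[row[key]] = row
    return list(latest.values())
```

Returning the rejected rows, instead of silently dropping them, makes it possible to quarantine and inspect bad data.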
4. Workflow orchestration and automation
- Coordinated Jobs, tasks, and triggers across batch and streaming workloads.
- Orchestration reduces manual effort and enforces dependency order.
- Compose multi-task jobs with task dependencies, dbutils widgets, and job clusters.
- External schedulers integrate via REST APIs, Airflow, or cloud-native tools.
- Parameterize runs for environments, secrets, and tenants with consistent naming.
- Notifications, retries, and SLAs codify operational discipline end-to-end.
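"Enforcing dependency order" amounts to running tasks in a topological order of the dependency graph. The sketch below shows this with Python's standard `graphlib` module (Python 3.9+). It is an illustration only, not how Databricks Jobs or Airflow are implemented, and the function name `run_in_dependency_order` and the task names in the usage note are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_in_dependency_order(tasks: Dict[str, Callable[[], None]],
                            depends_on: Dict[str, Set[str]]) -> List[str]:
    """Run each task after all of its upstream tasks; return the run order.

    `depends_on` maps a task name to the names it must wait for.
    Every name it mentions is assumed to exist in `tasks`.
    A circular dependency raises graphlib.CycleError before any task runs.
    """
    graph = {name: depends_on.get(name, set()) for name in tasks}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        tasks[name]()
    return order
```

With three tasks named `bronze`, `silver`, and `gold`, where `silver` depends on `bronze` and `gold` depends on `silver`, the tasks run in that order no matter how they were declared.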
Key Responsibilities
- Lead and manage end-to-end data engineering projects using Azure Data Factory, Azure Databricks, SQL, Oracle PL/SQL, and Python.
- Collaborate with stakeholders to gather and understand requirements for data pipelines and analytics solutions.
- Design and develop ETL processes, data models, and data integration solutions.
- Provide technical guidance and mentorship to team members.
- Ensure data quality, data governance, and data security standards are maintained throughout the project lifecycle.
- Troubleshoot and optimize data pipelines and processes for performance and efficiency.
- Stay updated on the latest trends and technologies in data engineering and contribute to continuous improvement efforts.
Skill Requirements
- Proficiency in Azure Data Factory (ADF) and Azure Databricks for building and managing data pipelines.
- Strong experience with SQL and Oracle PL/SQL for data querying and manipulation.
- Advanced programming skills in Python for scripting and data processing tasks.
- Knowledge of data modeling, data warehousing concepts, and database design principles.
- Ability to work in a collaborative team environment and communicate effectively with stakeholders.
- Strong analytical and problem-solving skills with attention to detail.
- Experience with data visualization tools and techniques is a plus.
Other Requirements
1. Relevant certifications in Azure Data Factory, Azure Databricks, SQL, Oracle PL/SQL, or Python are advantageous.
Required skills
Databricks
Spark
Delta Lake
Azure
SQL
About HCL Technologies
Headquarters: Noida