
Sr Advanced AI Data Engineer
About the role
As a Senior Advanced AI Data Engineer at Honeywell, you will play a crucial role in designing, developing, and maintaining advanced data solutions that drive business insights and support decision-making. You will apply your data engineering expertise to build scalable data pipelines, optimize data storage, and ensure data quality and integrity.
Your ability to work with cross-functional teams and translate business requirements into technical solutions will be key to your success in this role.
In this role, you will impact the business by enabling data-driven decision-making, optimizing data processes, and improving overall data management. Your work will contribute to increased operational efficiency, cost savings, and enhanced customer satisfaction.
At Honeywell, our people leaders play a critical role in developing and supporting our employees to help them perform at their best and drive change across the company. Help build a strong, diverse team by recruiting talent, identifying and developing successors, driving retention and engagement, and fostering an inclusive culture.
YOU MUST HAVE
- Databricks: 4+ years of hands-on experience with PySpark, Delta Lake, Workflows, and Unity Catalog
- Demonstrated expertise in data strategy, e.g., Medallion Architecture, domain data modeling, and functional data architecture
- Data quality frameworks (e.g., rule-based validation, anomaly detection)
- Data pipelines: incremental loading, CDC, CI/CD, and observability
- Advanced Python/PySpark and advanced SQL
- Strongly preferred: Delta Live Tables (DLT), Unity Catalog (UC), GCP, Azure, and Kafka
- Databricks Certified Professional certification is highly valued
- 7+ years of overall data engineering experience
- 4+ years of hands-on Azure Databricks experience in production environments
- Proven experience building platforms, not just maintaining them (e.g., greenfield builds, migrations, framework development)
- Experience with financial, engineering, enterprise, or industrial-scale datasets preferred
- Demonstrated ability to own technical decisions end-to-end: from architecture to production deployment
#LI-Hybrid
AI-Ready Data Platform
- Design and implement end-to-end ingestion pipelines from heterogeneous sources (including Snowflake, SQL Server, Excel, REST APIs, and unstructured data) into Azure Databricks
- Architect and enforce the Medallion Architecture (Bronze → Silver → Gold), ensuring that data is clean, validated, and fit for purpose at each layer
- Build Delta Live Tables (DLT) pipelines with declarative data quality expectations, schema evolution, and automated lineage tracking
- Implement incremental loading patterns using CDC (Change Data Capture), watermarking, and Delta Lake MERGE/UPSERT for efficient, scalable ingestion
- Enable structured and unstructured data processing (documents, Excel files, JSON, Parquet), building the foundation for AI and ML consumption
Data Modeling & Semantic Layer
- Design and implement the Engineering data model (dimensional models, fact/dimension tables, and domain-specific data marts) serving analytics, BI, ML, and AI use cases
- Build a governed, reusable semantic layer on top of the Gold layer, enabling self-service analytics through Power BI and GCP-connected consumers
- Ensure data models are documented, versioned, and aligned to business domains within the VECE COE
Orchestration & DataOps
- Build and manage Databricks Workflows with multi-task dependencies, SLA monitoring, retry logic, and alerting
- Implement CI/CD pipelines for Databricks using Azure DevOps and GitHub Actions, including Python wheel packaging for reusable utility libraries deployed across the platform
- Apply software engineering best practices: version control, unit testing, modular code design, and automated deployment to Dev/QA/Prod environments
- Manage platform cost and performance: cluster right-sizing, DBU management, Delta table optimization (VACUUM, compaction), and cost monitoring across Azure Databricks and GCP
Data Governance & Quality
- Implement and manage Unity Catalog for centralized data governance, including the three-level namespace (catalog → schema → table), fine-grained RBAC, data masking, and audit logging
- Build data quality frameworks (rule-based validation, deduplication, reconciliation, and anomaly detection) to ensure data is fit for AI/ML consumption
- Establish data lineage tracking across ingestion, transformation, and serving layers
- Govern data delivery to GCP, ensuring secure, validated, schema-consistent outputs for downstream data science and analytics teams
AI & Proactive Analytics Foundation
- Design pipelines that are AI-ready from day one, supporting structured ML feature pipelines, embedding generation, and future vector database integrations
- Build the data infrastructure that enables the shift from descriptive dashboards to proactive, predictive analytics
- Collaborate with Data Scientists and Analytics Engineers to ensure the Gold layer supports model training, feature stores, and real-time inference pipelines
Required skills
Data engineering
Data pipelines
Data storage
Data quality
Analytics support
About Honeywell
Monterrey
Headquarters