
Data Engineer
About the role
Career Category
Information Systems
Job Description
As a Data Engineer supporting the Law data strategy, you will design, build, and maintain scalable data pipelines that integrate data from legal systems into Amgen’s enterprise data fabric.
You will enable high-quality, governed datasets that support analytics, reporting, and emerging AI/ML use cases for Legal and Compliance teams.
This role requires strong hands-on engineering skills, familiarity with modern data platforms (e.g., Databricks), and the ability to work closely with Legal stakeholders, Data Architects, and AI/Analytics teams.
Key Responsibilities
Data Engineering & Pipeline Development
- Design, develop, and maintain data pipelines to ingest data from legal systems, third-party tools, and enterprise platforms
- Build and optimize ETL/ELT pipelines using modern frameworks (Databricks, Spark)
- Implement reliable, scalable, and production-ready data pipelines using engineering best practices, monitoring, and automated validation frameworks
- Integrate structured and unstructured legal data into the enterprise data fabric
- Ensure reliability, scalability, and performance of data pipelines
Databricks & Modern Data Platform
- Develop pipelines using Databricks (Delta Lake, Spark, notebooks)
- Implement data transformation and orchestration workflows
- Support migration and modernization of legacy data solutions to cloud-native platforms
- Contribute to reusable data engineering patterns and components
- Optimize Delta Lake and Spark workloads for scalable, cost-efficient, and high-performance enterprise data processing
Data Quality, Governance & Compliance
- Implement data quality checks, validation rules, and monitoring
- Implement governance, lineage, and security controls for sensitive legal and compliance datasets
- Ensure compliance with data governance, privacy, and legal/regulatory requirements (e.g., sensitive legal data handling)
- Maintain metadata, lineage, and documentation for legal datasets
AI & Advanced Analytics Enablement
- Build curated datasets that support AI/ML models and GenAI use cases
- Prepare structured and unstructured datasets for AI/ML and GenAI use cases including document intelligence and semantic search applications
- Enable feature engineering and data preparation for AI applications in Legal (e.g., document analysis, contract insights)
- Collaborate with data scientists and AI teams to ensure data readiness and accessibility
Collaboration & Delivery
- Work with Legal stakeholders to understand their data needs and translate them into technical solutions
- Partner with Data Architects to align with enterprise data fabric strategy
- Participate in Agile development processes (sprint planning, estimation, delivery)
- Document pipelines, models, and technical decisions
Basic Qualifications
- Master’s or Bachelor’s degree in Computer Science, Engineering, Information Systems, or a related field
- 5–8 years of experience in data engineering or a related technical role
Must-Have Technical Skills
- Strong experience with SQL and relational databases
- Programming experience in Python (required); PySpark preferred
- Hands-on experience with Databricks / Apache Spark
- Experience building ETL/ELT pipelines for large-scale datasets
- Familiarity with cloud platforms (AWS, Azure, or GCP)
- Understanding of data modeling and data warehousing concepts
Preferred / Strategic Skills (Aligned to Future Data Strategy)
- Certification: Relevant certifications in Databricks, cloud platforms (AWS/Azure/GCP), or modern data engineering technologies are a plus
- Experience with: Delta Lake / Lakehouse architectures; Data Fabric / Data Mesh concepts; Snowflake, Redshift, or enterprise data warehouse platforms
- Familiarity with: Streaming data (Kafka, event-driven pipelines); Data orchestration tools (Airflow, Databricks Workflows)
- Exposure to: AI/ML data pipelines and feature engineering; Unstructured data processing (documents, legal text)
- Understanding of: Data governance frameworks and cataloging tools; Security and privacy controls for sensitive data (legal/compliance)
Functional Skills
- Strong problem-solving and analytical thinking
- Ability to work with large, complex datasets
- Effective communication with both technical and non-technical stakeholders
- Ability to operate in a fast-paced Agile environment
Benefits and perks
- Learning Budget
Required skills
Data pipelines
Databricks
Spark
ETL/ELT
Data governance
About Amgen
India - Hyderabad
Headquarters