Infosys

Python, PySpark, ETL Developer

Role: Data Engineering

Level: Mid Level

Location: Hyderabad, India

Work: On-site

Type: Full-time

Posted: 4 days ago

About the role

Build and scale data solutions that power smarter decisions. In this role, you’ll work at the intersection of software engineering and data engineering, using Python, PySpark, and ETL to transform raw, complex datasets into reliable, analytics-ready assets. You’ll collaborate closely with data engineers, analysts, and stakeholders to understand requirements, design efficient pipelines, and deliver high-quality outputs on time. If you enjoy solving performance challenges, improving data quality, and writing maintainable code that runs in production, this is a great opportunity to grow your impact. Expect a supportive, collaborative environment where ownership is encouraged, learning is continuous, and your contributions directly improve how teams access and trust data.

Data Pipeline Development
• Develop and maintain scalable batch ETL pipelines using Python and PySpark for data ingestion, transformation, and loading.
• Implement reusable transformation logic, ensuring pipelines are modular, testable, and easy to maintain.
• Optimize Spark jobs for performance (partitioning, caching, joins, shuffles) and cost efficiency.

Data Quality & Reliability
• Apply data validation checks, handle schema evolution, and ensure accuracy and completeness of processed datasets.
• Troubleshoot pipeline failures, analyze logs, and implement robust error handling and retry mechanisms.
• Monitor job runs and support operational stability through alerts, runbooks, and timely incident resolution.

Collaboration & Delivery
• Work with cross-functional teams to gather requirements, define data mappings, and deliver datasets aligned to business needs.
• Participate in code reviews, follow engineering best practices, and contribute to continuous improvement of standards and tooling.
• Document pipeline logic, dependencies, and operational procedures for smooth handovers and long-term maintainability.

Technology->Analytics - Packages->Python - Big Data
Technology->Big Data - Data Processing->PySpark, ETL

• Bachelor’s degree in Computer Science, Engineering, Information Systems, or a related field (or equivalent practical experience).
• 2–5 years of hands-on experience building data pipelines using Python and PySpark.
• Strong understanding of ETL concepts, data transformations, and handling large-scale datasets.
• Proficiency in writing clean, maintainable code and debugging production issues.
• Working knowledge of data structures, algorithms, and software development best practices.

Education: Bachelor of Engineering

Preferred skills: Technology->Analytics - Packages->Python - Big Data, Technology->Big Data - Data Processing->PySpark

Benefits and perks

• Learning budget

Required skills

Python

PySpark

ETL

SQL

About Infosys

Headquarters: Hyderabad