
Python, PySpark Developer
About the role
Build and scale data-driven solutions that power smarter decisions. In this role, you’ll design and deliver high-performance data processing pipelines using Python and PySpark, working closely with data engineers, analysts, and product teams to turn raw data into reliable, actionable insights. You’ll contribute to a collaborative environment where clean code, thoughtful design, and continuous improvement are valued. If you enjoy solving complex data challenges, optimizing distributed workloads, and delivering production-ready systems that make a real impact, this is a great opportunity to grow your expertise while helping teams move faster with trustworthy data.
- Design, develop, and maintain scalable batch and streaming data pipelines using Python and PySpark in distributed environments.
- Implement efficient transformations, aggregations, and joins on large datasets while ensuring performance and cost optimization.
- Write optimized SQL for data extraction, validation, and reconciliation across multiple sources.
- Build reusable, testable modules and follow engineering best practices (code reviews, unit testing, documentation).
- Troubleshoot production issues, perform root-cause analysis, and implement long-term fixes and monitoring improvements.
- Collaborate with stakeholders to translate requirements into technical designs, delivery plans, and measurable outcomes.
- Ensure data quality through validation checks, anomaly detection patterns, and consistent schema management.
- Contribute to continuous improvement of development standards, performance benchmarks, and pipeline reliability.
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent practical experience).
- 5–9 years of hands-on experience in software development and/or data engineering roles.
- Strong proficiency in Python with experience building production-grade applications or data workflows.
- Strong proficiency in PySpark, including DataFrame APIs, optimization techniques, and distributed processing concepts.
- Working knowledge of SQL for complex queries, data analysis, and validation.
- Experience delivering reliable solutions with attention to performance, scalability, and maintainability.
Education: Bachelor of Engineering
Preferred skills
- Technology > Analytics > Packages > Python
- Technology > Big Data > Data Processing > PySpark
Benefits and perks
- Learning Budget
Required skills
Python
PySpark
SQL
Data pipelines
About Infosys
HYDERABAD
Headquarters