
Python and PySpark Developer
About the role
We are seeking a motivated and detail‑oriented Python/PySpark Developer to support the development and maintenance of scalable data processing solutions. The ideal candidate has foundational experience in Python and exposure to Apache Spark (PySpark), along with a strong willingness to learn and grow in a distributed data engineering environment.
You will work under the guidance of senior engineers and collaborate with data teams to build reliable data pipelines and contribute to analytics and reporting solutions.
Key Responsibilities
Development & Engineering
- Assist in developing and maintaining data pipelines using Python and PySpark
- Support ETL/ELT workflows for batch data processing
- Write clean, readable, and well‑structured Python code following best practices
- Perform basic data transformations, aggregations, and validations
- Debug and troubleshoot pipeline issues with guidance from senior developers
Data & Platform
- Work with structured and semi‑structured data formats (CSV, JSON, Parquet, etc.)
- Assist in integrating data from databases, APIs, and cloud storage systems
- Help ensure data quality and consistency within pipelines
- Support migration of legacy scripts to modern data platforms
Learning & Collaboration
- Collaborate with team members on development tasks and code reviews
- Participate in knowledge‑sharing and training sessions
- Learn and adopt new tools, frameworks, and best practices
- Assist in documenting data workflows and technical processes
Required Skills & Qualifications
Technical Skills
- Basic to intermediate proficiency in Python
- 4-7 years of experience
- Exposure to Apache Spark / PySpark (internship or project experience is acceptable)
- Understanding of fundamental programming and data structures
- Basic knowledge of SQL and relational databases
- Familiarity with data processing concepts and ETL fundamentals
- Awareness of Linux/Unix command line is a plus
Engineering Fundamentals
- Understanding of coding best practices and version control (Git)
- Basic debugging and problem‑solving skills
- Exposure to unit testing concepts is a plus
Nice to Have (Preferred Skills)
- Exposure to big data tools (Hive, Hadoop ecosystem, or similar)
- Familiarity with cloud platforms (AWS / Azure / GCP)
- Basic knowledge of job orchestration tools (Airflow, etc.)
- Understanding of data pipelines and workflow lifecycle
- Academic or project experience with data engineering or analytics
Ideal Candidate Traits
- Strong willingness to learn and grow in a fast‑paced environment
- Good analytical and problem‑solving skills
- Effective communication and teamwork abilities
- Attention to detail and commitment to quality
Job Family Group: Technology
Job Family: Applications Development
Time Type: Full time
Most Relevant Skills: Please see the requirements listed above.
Other Relevant Skills: For complementary skills, please see above and/or contact the recruiter.
Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law.
If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi.
View Citi’s EEO Policy Statement and the Know Your Rights poster.
Benefits and perks
- Learning Budget
Required skills
Python
PySpark
ETL
Data pipelines
Data validation
About Citigroup
Chennai
Headquarters