Senior Research Engineer, Training Data Infrastructure in Foundation Models
Cupertino, CA · On-site · Full-time · 1mo ago
Benefits & Perks
•Parental leave
•Competitive salary and equity package
•Comprehensive health, dental, and vision insurance
•Generous paid time off and holidays
•Professional development budget
Required Skills
Python
C++
About Us
Working at Apple means doing more than you ever thought possible and having more impact than you ever imagined.
Size: 10000+ employees
Industry: Technology, Information Technology, Software, Consumer Goods & Services
Our team is dedicated to solving the problem of high-quality training data at the scale required to train advanced Foundation Models. We believe that advanced model performance (including reasoning, coding, and agentic planning) fundamentally depends on a data-centric approach to Machine Learning. Our objective is to engineer a large-scale system that acquires, processes, and curates the data required to advance the state of the art in Artificial Intelligence.
We are seeking a Senior Research Engineer who possesses a deep understanding of distributed systems and a strong intuition for Machine Learning. You will join a culture that values engineering craftsmanship, privacy, and rigorous scientific inquiry, utilizing advanced cloud technologies to build the data systems that power our most capable models.
Description
This position operates at the convergence of Software Engineering and Machine Learning Research. Unlike traditional backend roles, this position requires you to design systems where the outcome is the statistical distribution and quality of the data itself. You will work alongside Research Scientists to transform theoretical observations into concrete, scalable engineering solutions. Your core focus will be the architecture of our Data Acquisition, Processing, and Repository Management systems for Large Model training. You will lead technical efforts to enable active, quality-driven data curation, including filtering, deduplication, synthetic data generation, and data mixing, ensuring our models are trained on the highest-quality information available.
Responsibilities
Architect Scalable Ingestion Systems: Design and implement high-throughput distributed systems to ingest petabytes of text and multimodal data from diverse sources, including web crawls and third-party partnerships.
Repository Optimization: Manage the lifecycle of large-scale datasets across data storage and high-performance file systems. Optimize data formats for efficient random access and sequential scanning during model training.
Data Governance & Privacy: Engineer robust data governance and privacy solutions for the training data, in collaboration with compliance and legal teams, to ensure adherence to stringent regulatory standards.
High-Performance Processing Pipelines: Build and maintain distributed data processing workflows using advanced frameworks on cloud infrastructure (e.g., GCP, AWS).
Algorithmic Data Curation: Implement sophisticated data filtering and selection logic to remove low-quality content. Develop semantic deduplication at scale to prevent model memorization and improve training efficiency.
Decontamination: Design automated systems to detect and remove benchmark leakage, ensuring that evaluation datasets remain strictly isolated from training corpora.
Infrastructure for Scaling Laws: Collaborate with researchers to enable data ablations and scaling experiments. Build tools to support systematic data mixture optimization and empirical data studies.
Preferred Qualifications
Research Collaboration: Experience working within or closely with ML research organizations (e.g., as a Research Engineer), with an ability to translate research results into engineering implementations.
Domain Knowledge: Familiarity with the lifecycle of modern LLM training, end-to-end workflows, and underlying system architecture.
Complex Data Types: Experience in processing complex data modalities beyond plain text, such as source code repositories, images, video, and audio.
Minimum Qualifications
Education: Bachelor's degree in Computer Science, Electrical Engineering, or Mathematics.
Technical Expertise: 4+ years of software engineering experience with a specific focus on Data Infrastructure, Distributed Systems, or AI/ML Engineering.
Language Proficiency: Expert fluency in Python, and strong competence in system languages such as C++.
Cloud Architecture: Extensive experience architecting solutions on major public cloud platforms (e.g., GCP) to build scalable data systems (e.g., with Apache Beam, GCS).
Performance Engineering: Deep experience profiling and optimizing high-throughput data systems. Demonstrated ability to debug distributed bottlenecks (e.g., stragglers, I/O saturation), optimize data formats and provide efficient data storage solutions.
Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.
Client-provided location(s): Cupertino, CA
Job ID: apple-200641234-0836_rxr-660
Employment Type: OTHER
Posted: 2026-01-26T19:11:59
About Apple
Apple is a public technology company that designs, manufactures, and markets consumer electronics, personal computers, and software.
Employees: 10,001+
Headquarters: Cupertino
Valuation: $3.5T
Reviews
Overall: 4.0 (10 reviews)
Work Life Balance: 4.0
Compensation: 4.2
Culture: 3.8
Career: 3.5
Management: 3.2
Recommend to a Friend: 75%
Pros
•Great coworkers and people
•Excellent benefits and perks
•Fast-paced and engaging work environment
Cons
•High expectations and pressure
•Management quality varies
•Limited career progression opportunities
Interview Experience
Interviews: 5
Difficulty: 3.4 / 5
Duration: 28-42 weeks
Offer Rate: 20%
Experience: Positive 20%, Neutral 40%, Negative 40%
Interview Process
1. Application Review
2. Recruiter Screen
3. Technical Phone Screen
4. Behavioral Interview
5. Onsite/Virtual Interviews
6. Team Matching
7. Offer
Common Questions
•Coding/Algorithm
•System Design
•Behavioral/STAR
•Technical Knowledge
•Culture Fit