Hiring

ML Safety Research Engineer

Apple

San Francisco, CA

On-site

Full-time

1mo ago

Benefits & Perks

•Remote work flexibility

•Parental leave program

•Wellness benefits

•Flexible PTO policy

•Health, dental, and vision coverage

Required Skills

Apache Spark

PyTorch

Airflow

About the Role

Apple Services Engineering (ASE) powers many AI features across App Store, Music, Video and more. We build deeply personal products with the goal of representing users around the globe authentically. We work continuously to avoid perpetuating systemic biases and maintain safe and trustworthy experiences across our AI tools and models.

Our team, part of Apple Services Engineering, is looking for an ML Research Engineer to lead the design and continuous development of automated safety benchmarking methodologies. In this role, you will investigate how media-related agents behave, develop rigorous evaluation frameworks and techniques, and establish scientific standards for assessing the risks these agents pose and their safety performance. This role supports the development of scalable evaluation techniques that give our engineers the right tools to assess candidate models and product features for responsible and safe performance.

The capabilities you build will support generating benchmark datasets and evaluation methodologies for model and application outputs at scale, so that engineering teams can translate safety insights into actionable engineering and product improvements. This role blends deep technical expertise with strong analytical judgment to develop tools and capabilities for assessing and improving the behavior of advanced AI/ML models. You will work cross-functionally with Engineering and Project Managers, as well as Product and Governance teams, to develop a suite of technologies that ensure AI experiences are reliable, safe, and aligned with human expectations.

The successful candidate will take a proactive approach to working independently and collaboratively on a wide range of projects. In this role, you will work alongside a small but impactful team, collaborating with ML and data scientists, software developers, project managers, and other teams at Apple to understand requirements and translate them into scalable, reliable, and efficient evaluation frameworks.

Responsibilities

Design scientifically grounded benchmarking methodologies covering multiple dimensions of responsibility and safety across several media and application marketplace use cases
Develop automated evaluation pipelines that collect, automatically judge, and analyze model outputs with respect to safety policies, at scale
Create and curate datasets, tasks, and feature usage scenarios that represent realistic and adversarial use cases across multiple languages, markets, and domains
Define and validate new metrics for complex phenomena such as multi-turn agentic interaction patterns
Apply statistical rigor and reproducibility to the objectives above
Work closely with engineering and research teams to translate experimental findings into actionable model improvements and safety mitigations
Publish internal reports and external papers
Monitor evolving industry practices and academic work to ensure benchmarks remain relevant

Minimum Qualifications

Advanced degree (MS or PhD) in Computer Science, Software Engineering, or equivalent research/work experience
1+ years of work experience, either as a postdoc or in industry
Strong research background in empirical evaluation, experimental design, or benchmarking
Strong proficiency in Python (pandas, NumPy, Jupyter, PyTorch, etc.)
Deep familiarity with software engineering workflows and developer tools
Experience working with or evaluating AI/ML models, preferably LLMs or program synthesis systems
Strong analytical and communication skills, including the ability to write clear reports

Technical Skills

Proficiency in Python (pandas, NumPy, Jupyter, PyTorch, etc.)
Experience working with large datasets, annotation tools, and model evaluation pipelines
Familiarity with evaluations specific to responsible AI and safety, hallucination detection, and/or model alignment concerns
Ability to design taxonomies, categorization schemes, and structured labeling frameworks
Ability to interpret unstructured data (text, transcripts, user sessions) and derive meaningful insights
Strong ability to stitch together qualitative and quantitative insights into actionable guidance
Strong ability to communicate complex architectures and systems to a variety of stakeholders

Preferred Qualifications

Publications in AI/ML evaluation or related fields
Experience with automated testing frameworks
Experience constructing human-in-the-loop or multi-turn evaluation setups
Intermediate or advanced proficiency in Swift
Familiarity with RAG systems, reinforcement learning, agentic architectures, and model fine-tuning
Expertise in designing annotation guidelines and validation instruments and techniques
Background in human factors, social science, and/or safety assessment methodologies
Education in Data Science, Linguistics, Cognitive Science, HCI, Psychology, Social Science, or a related field

Equal Opportunity

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.

About Apple

Apple

Public

A technology company that designs, manufactures, and markets consumer electronics, personal computers, and software.

10,001+

Employees

Cupertino

Headquarters

$3.5T

Valuation

Reviews

4.0

10 reviews

Work Life Balance

4.0

Compensation

4.2

Culture

3.8

Career

3.5

Management

3.2

75%

Recommend to a Friend

Pros

Great coworkers and people

Excellent benefits and perks

Fast-paced and engaging work environment

Cons

High expectations and pressure

Management quality varies

Limited career progression opportunities

Interview Experience

5 interviews

Difficulty

3.4

/ 5

Duration

28-42 weeks

Offer Rate

20%

Experience

Positive 20%

Neutral 40%

Negative 40%

Interview Process

Application Review

Recruiter Screen

Technical Phone Screen

Behavioral Interview

Onsite/Virtual Interviews

Team Matching

Offer

Common Questions

Coding/Algorithm

System Design

Behavioral/STAR

Technical Knowledge

Culture Fit
