Benefits & Perks
• Flexible work arrangements
• Professional development budget
• 401(k) matching
• Generous paid time off and holidays
• Flexible hours
• Learning
Required Skills
Node.js
Python
PostgreSQL
The Production Engineering team within the AI and Data Platform (AiDP) organization manages a wide array of real-time, near-real-time, and batch analytical solutions. These platforms are integral to core business functions across Apple, including sales, operations, finance, AppleCare, marketing, and services, and are instrumental in driving critical, data-driven decisions. To build these solutions, we use a combination of proprietary and leading open-source technologies such as Kafka, Spark, Iceberg, and Airflow. A key part of our mission is to enable AI-centric automation that improves the overall efficiency and intelligence of the platform. We are looking for passionate engineers who thrive on solving complex infrastructure challenges at scale, both on-premises and in the cloud. If you are dedicated to optimizing scalable, maintainable, and user-friendly systems, you will find compelling opportunities to make a significant impact at AiDP.
Description
The Service Reliability Engineer (SRE) role within AiDP Production Engineering is a dynamic position that blends strategic architectural design with hands-on technical execution. As an SRE, you will be responsible for configuring, tuning, and ensuring the resilience of complex, multi-tiered systems to achieve optimal application performance, stability, and availability. Our team manages critical data pipelines and applications across both bare-metal and cloud computing platforms, delivering essential data processing for all of Apple's key business functions. We operate at an immense scale, handling exabytes of data, petabytes of memory, and tens of thousands of jobs to enable predictable and performant data analytics that power features and inform decisions across the company. If you are passionate about designing, building, and running data infrastructure that has a direct and significant impact on Apple's global business operations, this is the ideal opportunity for you.
Responsibilities
Understand application requirements (performance, security, scalability, etc.) and select the right services and topology on AWS, bare metal, and Kubernetes.
Build automation to enable self-healing systems.
Build tools to monitor and alert on high-performance, low-latency applications.
Troubleshoot application-specific, core network, system, and performance issues.
Contribute to challenging, fast-paced projects that support Apple's business by delivering innovative solutions.
Partner with engineering teams to prioritize and fix production defects.
Take knowledge transfer from engineering teams for changes being rolled out to production.
Triage incidents by impact, then devise and implement mitigation steps to unblock the business.
Conduct root cause analysis (RCA), log defects, and partner with engineering teams on prioritization.
Support Java-based applications and Spark/Flink jobs on bare metal, AWS, and Kubernetes.
Share on-call rotation with other team members to support apps and services in scope.
Preferred Qualifications
Solid understanding of system design, data structures, and incident management best practices.
Ability to understand complex architectures and comfort working with multiple teams.
Experience with observability tools (e.g., Prometheus, Grafana, CloudWatch).
Ability to conduct performance analysis and troubleshoot large-scale distributed systems.
Highly proactive, with a keen focus on improving the uptime and availability of our mission-critical services.
Strong expertise in troubleshooting complex production issues.
Excellent problem-solving, critical thinking, and communication skills.
Proven ability to resolve incidents, perform root cause analysis, and drive system reliability improvements.
Experience using GenAI or automation tools for issue detection, alerting, or remediation.
Experience with data visualization tools such as Tableau, BusinessObjects, and ThoughtSpot.
Minimum Qualifications
4+ years of experience with cloud-native services, including ETL frameworks such as Apache Spark and Flink.
4+ years of experience with messaging systems (Kafka) and cloud infrastructure and services such as AWS, GCP, and Kubernetes.
4+ years of experience with modern, distributed databases such as Snowflake, Cassandra, SingleStore, and SAP HANA.
4+ years of programming experience in Python or Java.
BS/MS in computer science or equivalent experience.
Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.
About Apple
Apple (Public)
A technology company that designs, manufactures, and markets consumer electronics, personal computers, and software.
Employees: 10,001+
Headquarters: Cupertino
Valuation: $3.5T
Reviews
Overall: 4.0 (10 reviews)
Work Life Balance: 4.0
Compensation: 4.2
Culture: 3.8
Career: 3.5
Management: 3.2
Recommend to a Friend: 75%
Pros
• Great coworkers and people
• Excellent benefits and perks
• Fast-paced and engaging work environment
Cons
• High expectations and pressure
• Management quality varies
• Limited career progression opportunities
Salary Ranges
17,968 data points
L2 · Cybersecurity Analyst L2 (0 reports)
Total: $169,000 / year
Base: $67,600
Stock: $84,500
Bonus: $16,900
Range: $118,300 to $219,700
Interview Experience
5 interviews
Difficulty: 3.4 / 5
Duration: 28-42 weeks
Offer Rate: 20%
Experience: Positive 20%, Neutral 40%, Negative 40%
Interview Process
1. Application Review
2. Recruiter Screen
3. Technical Phone Screen
4. Behavioral Interview
5. Onsite/Virtual Interviews
6. Team Matching
7. Offer
Common Questions
• Coding/Algorithm
• System Design
• Behavioral/STAR
• Technical Knowledge
• Culture Fit