
ML Infrastructure Service Reliability Engineer - Apple Services Engineering
About the role
At Apple, we don’t just build products — we create transformative experiences
that have reshaped entire industries. Our innovation is driven by the diversity of
our people and their ideas, inspiring everything we do. Imagine the impact you
could make. Join Apple and help us leave the world better than we found it.
The ML Infrastructure team manages Apple’s largest ML compute platform and
its multi-cloud storage abstraction and caching platform, which together
support critical machine learning training workloads that power user-facing
features across the Apple ecosystem. Operating across both first-party and
third-party cloud environments brings complex and unique challenges.
As a Site Reliability Engineer (SRE) on the ML Infrastructure team, you’ll be
expected to address these challenges through a strong foundation in cloud
object storage, data analysis, automation, and collaboration, along with
advanced expertise in Kubernetes. Our team oversees the full infrastructure
stack, from low-level nodes to the complete network architecture, ensuring our
platform remains highly available, resilient, and efficient at scale.
Required skills
Site Reliability Engineering
Machine Learning Infrastructure
Cloud Computing
Automation
Distributed Systems
About Apple
Bengaluru
Headquarters