Senior Site Reliability Engineer

SHEIN

San Diego

On-site

Full-time

1mo ago

Required Skills

Kubernetes

Linux

Redis

Kafka

About SHEIN

SHEIN is a global online fashion and lifestyle retailer, offering SHEIN branded apparel and products from a global network of vendors, all at affordable prices. Headquartered in Singapore, with more than 15,000 employees operating from offices around the world, SHEIN is committed to making the beauty of fashion accessible to all, promoting its industry-leading, on-demand production methodology, for a smarter, future-ready industry.

Position Summary

We are seeking a Senior Site Reliability Engineer (Official Title: Senior Site Reliability Engineer I) with deep experience operating and evolving large-scale, mission-critical systems where availability and reliability are non-negotiable. At SHEIN, Site Reliability Engineers are hybrid software and systems engineers responsible for keeping production services always on while enabling the platform to scale rapidly and safely. In this role, you will own and support complex services and infrastructure, ensuring they consistently meet reliability and performance expectations.

The SRE team owns and maintains critical open-source and in-house technologies that underpin the platform and serves as a core contributor to major engineering initiatives. We are accountable for driving platform operability forward by reducing incident frequency, minimizing MTTR, and improving system resilience, efficiency, and resource utilization. You will work closely with global, cross-functional teams to design, build, and evolve observability and operational tooling (metrics, logs, traces, alerting, and automation) that provides deep visibility into system behavior. Through hands-on engineering and operational excellence, you will proactively identify risks and failure modes, help prevent incidents before they occur, and lead fast, effective responses when they do.

To succeed in this role, you will combine strong software engineering skills, solid expertise in Linux, networking, and distributed systems, and a passion for solving problems of scale, complexity, and reliability. Your work will directly contribute to delivering a stable, scalable, and high-performing experience for customers worldwide.

Job Responsibilities

Keep SHEIN’s mission-critical production systems running 24/7/365, participating in on-call rotations and acting decisively during incidents.
Triage and resolve production incidents, driving root cause analysis and contributing to continuous improvements that reduce MTTR and prevent recurrence.
Monitor and manage capacity planning and resource utilization, partnering with cross-functional teams to ensure systems scale safely while remaining cost-effective.
Own and operate core open-source infrastructure such as APISIX, NGINX, Kubernetes, Kafka, Elasticsearch, Redis, Consul, etcd, ZooKeeper, and other large-scale distributed systems.
Design, build, and maintain observability solutions (metrics, logs, traces, alerting) to improve system visibility, reliability, and resiliency.
Automate operational workflows and eliminate manual toil through scripting, tooling, and process improvements.
Develop and maintain technical documentation, including runbooks, architecture diagrams, operational procedures, and on-call playbooks.
Work closely with global engineering teams to improve infrastructure reliability and performance through better system design and operational discipline.

Job Requirements

Bachelor’s degree in Computer Science, Information Systems, or a related technical discipline, or equivalent practical experience.
3+ years of experience owning and operating large-scale, high-traffic, 24/7 production systems, ideally in cloud or cloud-native environments.
Solid foundations in Linux, networking, and distributed systems, with the ability to debug complex production issues end to end.
Hands-on experience with incident response, troubleshooting, and performance optimization in distributed systems.
Strong software engineering skills with experience building automation, tooling, or platforms in languages such as Python or Go.
Experience operating or supporting open-source infrastructure components such as APISIX, NGINX, Kubernetes, Kafka, Elasticsearch, Redis, Consul, etcd, and ZooKeeper.
Experience with observability and monitoring systems (Prometheus, Grafana, Zabbix, etc.) and performance analysis.
Familiarity with Git, CI/CD pipelines, and configuration management tools (e.g., Ansible).
A strong sense of ownership, a systematic approach to problem-solving, and a passion for making systems more reliable.
Strong communication skills and the ability to collaborate effectively with geographically distributed teams.

Nice to Have

Bilingual fluency in Mandarin and English.
Kubernetes Administrator certification or equivalent real-world experience.
Experience operating big data platforms (Hadoop, YARN, HBase, Hive, Spark).
Experience applying AI/LLM-powered tools to reliability engineering, including designing and building automation or internal tools using AI-assisted development platforms (e.g., Claude Code).

Benefits and Perks

Bonus eligible
Healthcare (medical, dental, vision, prescription drugs)
Health Savings Account with Employer Funding
Flexible Spending Accounts (Healthcare and Dependent care)
Company-Paid Basic Life/AD&D insurance
Company-Paid Short-Term and Long-Term Disability
Voluntary Benefit Offerings (Voluntary Life/AD&D, Hospital Indemnity, Critical Illness, and Accident)
Employee Assistance Program
Business Travel Accident Insurance
401(k) Savings Plan with discretionary company match and access to a financial advisor
Vacation, paid holidays, floating holiday and sick days
Employee discounts
Free weekly catered lunch
Dog-friendly office (available at select locations)
Free gym access (available at select locations)
Free swag giveaways
Annual Holiday Party
Invitations to pop-ups and other company events
Complimentary daily office snacks and beverages

Pay Range: $92,400 to $148,800 USD

SHEIN Overview

Series F+

SHEIN is a global e-commerce platform specializing in fast fashion. While the company primarily focuses on women's clothing, it also offers men's apparel, children's wear, accessories, cosmetics, shoes, bags, and other fashion items.

Employees: 10,001+
Headquarters: Singapore
Valuation: $100B

Reviews

4.3 (9 reviews)

Work-life balance: 4.2
Compensation: 4.5
Culture: 4.0
Career: 3.5
Management: 3.8

Recommend to a friend: 75%

Pros

Remote work flexibility
Great pay and benefits
Laid-back work environment

Cons

Team layoffs and overseas hiring
Monotonous daily work
Difficult termination days

Salary Information

67 data points

Mid/L4

Senior/L5

Mid/L4 · Security Engineer II

1 report

$188,500

Total compensation

Base salary

$145,000

Stock

Bonus

$188,500

Interview Experience

1 interview

Difficulty: 3.0 / 5

Duration: 14-28 weeks

Interview Process

Application Review

Recruiter Screen

Technical Interview

AI/ML Technical Deep Dive

Team Matching

Offer

Common Questions

Machine Learning Algorithms

AI Research Experience

Technical Knowledge

Behavioral/STAR

Past Experience
