
Staff Site Reliability Engineer
Required Skills
Kubernetes
Linux
Redis
Kafka
About SHEIN
SHEIN is a global online fashion and lifestyle retailer, offering SHEIN branded apparel and products from a global network of vendors, all at affordable prices. Headquartered in Singapore, with more than 15,000 employees operating from offices around the world, SHEIN is committed to making the beauty of fashion accessible to all, promoting its industry-leading, on-demand production methodology, for a smarter, future-ready industry.
Position Summary
We are seeking a Staff Site Reliability Engineer (Official Title: Staff Site Reliability Engineer I) with deep experience operating and evolving large-scale, mission-critical systems where availability and reliability are non-negotiable. At SHEIN, Site Reliability Engineers are hybrid software and systems engineers responsible for keeping production services always on while enabling the platform to scale rapidly and safely. In this role, you will own and support complex services and infrastructure, ensuring they consistently meet reliability and performance expectations. At the Staff level, you will also provide technical leadership, influencing platform architecture, reliability strategy, and operational standards across the organization. The SRE team owns and maintains critical open-source and in-house technologies that underpin the platform and serves as a core contributor to major engineering initiatives. We are accountable for driving platform operability forward by reducing incident frequency, minimizing MTTR, and improving system resilience, efficiency, and resource utilization.
You will work closely with global, cross-functional teams to design, build, and evolve observability and operational tooling (including metrics, logs, traces, alerting, and automation) that provides deep visibility into system behavior. Through hands-on engineering and operational excellence, you will proactively identify risks and failure modes, help prevent incidents before they occur, and lead fast, effective responses when they do. To succeed in this role, you will combine strong software engineering skills, solid (ideally deep) expertise in Linux, networking, and distributed systems, and a passion for solving problems of scale, complexity, and reliability. Your work will directly contribute to delivering a stable, scalable, and high-performing experience for customers worldwide.
Job Responsibilities
- Keep SHEIN’s mission-critical production systems running 24/7/365, participating in on-call rotations and acting decisively during incidents.
- Triage and resolve production incidents, driving root cause analysis and contributing to continuous improvements that reduce MTTR and prevent recurrence.
- Monitor and manage capacity planning and resource utilization, partnering with cross-functional teams to ensure systems scale safely while remaining cost-effective.
- Own and operate core open-source infrastructure such as APISIX, Nginx, Kubernetes, Kafka, Elasticsearch, Redis, Consul, Etcd, Zookeeper and other large-scale distributed systems.
- Design, build, and maintain observability solutions (metrics, logs, traces, alerting) to improve system visibility, reliability, and resiliency.
- Automate operational workflows and eliminate manual toil through scripting, tooling, and process improvements.
- Develop and maintain technical documentation, including runbooks, architecture diagrams, operational procedures, and on-call playbooks.
- Work closely with global engineering teams to improve infrastructure reliability and performance through better system design and operational discipline.
- Mentor Senior and mid-level SREs, raising the overall technical bar and operational maturity of the team.
- Lead efforts to modernize the platform in alignment with industry best practices and evolving technology standards.
Job Requirements
- Bachelor’s degree in Computer Science, Information Systems, or a related technical discipline, or equivalent practical experience.
- 6+ years of experience owning and operating large-scale, high-traffic, 24/7 production systems, ideally in cloud or cloud-native environments.
- Solid foundations in Linux, networking, and distributed systems, with the ability to debug complex production issues end to end.
- Hands-on experience with incident response, troubleshooting, and performance optimization in distributed systems.
- Strong software engineering skills with experience building automation, tooling, or platforms in languages such as Python or Go.
- Experience operating or supporting open-source infrastructure components such as APISIX, Nginx, Kubernetes, Kafka, Elasticsearch, Redis, Consul, Etcd, Zookeeper, etc.
- Experience with observability and monitoring systems (Prometheus, Grafana, Zabbix, etc.) and performance analysis.
- Familiarity with Git, CI/CD pipelines, and configuration management tools (e.g., Ansible).
- A strong sense of ownership, a systematic approach to problem-solving, and a passion for making systems more reliable.
- Strong communication skills and the ability to collaborate effectively with geographically distributed teams.
Nice to Have
- Bilingual fluency in Mandarin and English.
- Kubernetes Administrator certification or equivalent real-world experience.
- Experience operating big data platforms (Hadoop, Yarn, HBase, Hive, Spark).
- Experience applying AI/LLM-powered tools to reliability engineering, including designing and building automation or internal tools using AI-assisted development platforms (e.g., Claude Code).
Benefits and Perks
- Bonus and RSU eligible
- Healthcare (medical, dental, vision, prescription drugs)
- Health Savings Account with Employer Funding
- Flexible Spending Accounts (Healthcare and Dependent care)
- Company-Paid Basic Life/AD&D insurance
- Company-Paid Short-Term and Long-Term Disability
- Voluntary Benefit Offerings (Voluntary Life/AD&D, Hospital Indemnity, Critical Illness, and Accident)
- Employee Assistance Program
- Business Travel Accident Insurance
- 401(k) Savings Plan with discretionary company match and access to a financial advisor
- Vacation, paid holidays, floating holiday and sick days
- Employee discounts
- Free weekly catered lunch
- Dog-friendly office (available at select locations)
- Free gym access (available at select locations)
- Free swag giveaways
- Annual Holiday Party
- Invitations to pop-ups and other company events
- Complimentary daily office snacks and beverages
Pay Range: $108,000 to $180,000 USD
About SHEIN (Company Profile)
Funding stage: Series F+
Shein is a global e-commerce platform specializing in fast fashion. While the company primarily focuses on women's clothing, it also offers men's apparel, children's wear, accessories, cosmetics, shoes, bags, and other fashion items.
Employees: 10,001+
Headquarters: Singapore
Valuation: $100B
Reviews
Overall rating: 4.3 (8 reviews)
Work-life balance: 4.2
Compensation: 4.0
Company culture: 4.1
Career: 3.5
Management: 3.8
Would recommend to a friend: 75%
Pros
- Great work environment and friendly people
- Work from home flexibility
- Good pay and benefits
Cons
- Team layoffs due to overseas hiring
- Monotonous daily work
- Difficulty during termination periods
Salary Range
67 data points
Levels: Mid/L4, Senior/L5
Mid/L4 · Security Engineer II (1 report)
Total annual compensation: $188,500
Base salary: $145,000
Stock: -
Bonus: -
Interview Reviews
1 review
Difficulty: 3.0 / 5
Duration: 14-28 weeks
Interview Process
1. Application Review
2. HR Screen
3. Technical Interview
4. Hiring Manager Interview
5. Offer
Common Questions
- Technical Knowledge
- AI/ML Concepts
- Research Experience
- Behavioral/STAR
- Past Experience