Hiring
Required Skills
AWS
Docker
Kubernetes
GCP
Azure
Spark
Airflow
As a global leader in cybersecurity, CrowdStrike protects the people, processes and technologies that drive modern organizations. Since 2011, our mission hasn't changed: we're here to stop breaches, and we've redefined modern security with the world's most advanced AI-native platform. We work on large-scale distributed systems, processing almost 3 trillion events per day, and this traffic is growing daily. Our customers span all industries, and they count on CrowdStrike to keep their businesses running, their communities safe and their lives moving forward. We're also a mission-driven company. We cultivate a culture that gives every CrowdStriker both the flexibility and autonomy to own their careers. We're always looking to add talented CrowdStrikers to the team who have limitless passion, a relentless focus on innovation and a fanatical commitment to our customers, our community and each other. Ready to join a mission that matters? The future of cybersecurity starts with you.
About the Role:
We're seeking a Sr. Engineer - ML Platform (Infrastructure & Debugging Specialist) to maintain and optimize CrowdStrike's mission-critical ML infrastructure. You'll diagnose complex distributed systems issues and ensure platform reliability for infrastructure processing billions of events daily.
What You'll Do:
Platform Reliability & Debugging:
- Diagnose and resolve issues across Ray, Spark, Airflow, MLflow, JupyterHub, Kubeflow, and SLURM
- Perform root cause analysis on production incidents affecting training and inference pipelines
- Debug performance bottlenecks, resource contention, memory leaks, and scheduling conflicts
- Develop debugging tools and diagnostic frameworks
System Optimization & Performance:
- Profile and optimize Ray clusters and Spark jobs on K8s and cloud (EMR/Dataproc)
- Troubleshoot JupyterHub spawner issues, kernel crashes, and resource allocation
- Optimize SLURM job scheduling, GPU allocation, and HPC cluster utilization
Infrastructure & Monitoring:
- Build observability solutions and automated health checks
- Develop runbooks, alerting workflows, and incident response procedures
- Maintain platform stability metrics (SLAs, error rates, latency)
Collaboration:
- Partner with ML and ML Platform engineers to resolve workflow issues
- Conduct post-mortems and mentor others on debugging techniques
What You'll Need:
- 12+ years in distributed systems engineering
- 5+ years debugging ML platforms in production
- Deep expertise in 3+ one of: Ray, Spark, JupyterHub, SLURM, K8s
- Performance profiling, optimization, and capacity planning
Technical Skills (Expertise in at least one):
- Distributed ML: Ray, Spark, SLURM, Jupyter ecosystem (debugging failures, performance tuning)
- ML Platforms: Airflow, MLflow, JupyterHub (troubleshooting core components)
- Infrastructure: Kubernetes, Docker, AWS/GCP/Azure/OCI
- Observability: Profiling tools, distributed tracing, Prometheus, Grafana, log aggregation
- Programming: Expert Python debugging, multi-language proficiency, Linux/Unix
What Sets You Apart:
- Open-source ML infrastructure contributions
- Experience with high-throughput inference systems and reducing MTTR
- Published debugging guides or tools
- Chaos engineering and GPU/CUDA debugging experience
- On-call and incident management experience
Benefits of Working at CrowdStrike:
- Market leader in compensation and equity awards
- Comprehensive physical and mental wellness programs
- Competitive vacation and holidays for recharge
- Paid parental and adoption leaves
- Professional development opportunities for all employees regardless of level or role
- Employee Networks, geographic neighborhood groups, and volunteer opportunities to build connections
- Vibrant office culture with world-class amenities
- Great Place to Work Certified™ across the globe
CrowdStrike is proud to be an equal opportunity employer. We are committed to fostering a culture of belonging where everyone is valued for who they are and empowered to succeed. We support veterans and individuals with disabilities through our affirmative action program.
CrowdStrike is committed to providing equal employment opportunity for all employees and applicants for employment. The Company does not discriminate in employment opportunities or practices on the basis of race, color, creed, ethnicity, religion, sex (including pregnancy or pregnancy-related medical conditions), sexual orientation, gender identity, marital or family status, veteran status, age, national origin, ancestry, physical disability (including HIV and AIDS), mental disability, medical condition, genetic information, membership or activity in a local human rights commission, status with regard to public assistance, or any other characteristic protected by law. We base all employment decisions (including recruitment, selection, training, compensation, benefits, discipline, promotions, transfers, lay-offs, return from lay-off, terminations and social/recreational programs) on valid job requirements.
If you need assistance accessing or reviewing the information on this website or need help submitting an application for employment or requesting an accommodation, please contact us at recruiting@crowdstrike.com for further assistance.
About CrowdStrike

CrowdStrike (Public)
CrowdStrike Holdings, Inc. is an American cybersecurity technology company based in Austin, Texas. It provides endpoint security, threat intelligence, and cyberattack response services.
Employees: 5,001-10,000
Headquarters: Austin
Valuation: $50B
Reviews
Rating: 3.9 (10 reviews)
Work-life balance: 2.8
Compensation: 4.2
Company culture: 4.1
Career growth: 3.8
Management: 2.5
Would recommend to a friend: 72%
Pros
- Great team culture and collaborative environment
- Innovative and cutting-edge technology
- Excellent compensation and benefits
Cons
- Fast-paced environment can be stressful/overwhelming
- Work-life balance challenges
- Management lacks direction and listening skills
Salary Range
9 data points
Levels: L2, L3, L4, L5, L6
L2 · Financial Analyst L2 (0 reports)
Total annual compensation: $102,887
Base salary: $41,155
Stock: $51,444
Bonus: $10,289
Range: $72,021 to $133,753
Interview Experience
8 interviews
Difficulty: 3.3 / 5
Duration: 14-28 weeks
Offer rate: 25%
Experience: Positive 13%, Neutral 50%, Negative 37%
Interview Process
1. Application Review
2. Recruiter Screen
3. Online Assessment
4. Technical Interview
5. Hiring Manager Interview
6. Offer
Common Questions
- Coding/Algorithm
- Technical Knowledge
- Behavioral/STAR
- System Design
- Past Experience