Job Posting
Required Skills
Kubernetes
Together AI is building the AI Inference & Model Shaping Platform that brings the most advanced generative AI models to the world. Our platform powers multi-tenant serverless workloads and dedicated endpoints, enabling developers, enterprises, and researchers to harness the latest LLMs, multimodal models, image, audio, video, and reasoning models at scale.
We are looking for an exceptional Engineering Lead to partner closely with our cross-functional engineering, infrastructure, research, and sales teams to ensure excellence of our ML API offerings. Your primary focus will be on delivering world-class inference and fine-tuning in our public APIs and customer deployments by building automation and operations processes.
This role is ideal for a highly motivated and technically adept individual who excels in fast-paced, dynamic environments. You will be in charge of designing and scaling our ML processes & tooling at production scale – optimizing operations to ensure availability and reliability for our services, across differing tenants and user loads, and in a multi-cluster deployment. You will serve as a passionate advocate for internal and external customers, providing feedback to the wider engineering and infrastructure teams to improve our systems and core business metrics. If you thrive in a collaborative, problem-solving environment and are driven to deliver operational excellence, we encourage you to apply for this exciting opportunity.
Key Responsibilities
- Own availability and performance SLAs for production inference and fine-tuning services across serverless and dedicated deployments
- Own & improve testing, deployment, configuration management, and monitoring practices for multi-cluster ML infrastructure – partnering closely with Infra SREs
- Build self-serve tooling and automation to reduce operational toil and enable self-serve offerings
- Define and enforce configuration best practices for inference engines (SGLang, TRT-LLM, vLLM etc.) to prevent runtime issues
- Lead incident response, conduct postmortems, and drive reliability improvements
- Mentor team members and potentially grow into hiring/team building as the organization scales
- Partner with infrastructure and ML engineering teams to improve system reliability and cost efficiency
Required Qualifications
- 5+ years operating production ML inference or training systems at scale
- 2+ years in senior IC or tech lead roles, with demonstrated mentorship and technical leadership experience. Having built or scaled teams is a plus.
- Deep expertise with Kubernetes, multi-cluster orchestration, and ML serving frameworks
- Experience with multi-tenant SaaS platforms
- Proven track record of SLA ownership with specific metrics (99.9% uptime, p99 latency targets)
- Customer escalation and incident communication experience
- Experience with LLM inference serving systems (SGLang, vLLM, TRT-LLM, or similar)
- Ability to influence cross-functional teams and make deployment/architecture decisions
Nice to Have
- Experience building internal developer platforms or self-serve tooling
- Background in cost optimization for GPU infrastructure
- Contributions to open-source ML infrastructure projects
About Together AI
Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers on our journey to build the next-generation AI infrastructure.
Compensation
We offer competitive compensation, startup equity, health insurance and other competitive benefits. The US base salary range for this full-time position is: $250,000 - $300,000 + equity + benefits. Our salary ranges are determined by location, level and role. Individual compensation will be determined by experience, skills, and job-related knowledge.
Equal Opportunity
Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.
Please see our privacy policy at https://www.together.ai/privacy
Company Profile

Together AI
Series B · Data annotation company.
Employees: 51-200
Headquarters: San Francisco
Valuation: $1.25B
Reviews
Overall rating: 3.8 (10 reviews)
Work-life balance: 3.5
Compensation: 2.8
Culture: 4.2
Career growth: 3.0
Management: 3.2
Would recommend to a friend: 65%
Pros
- Great team culture and collaboration
- Flexible work arrangements and remote options
- Good work-life balance
Cons
- Below industry standard compensation
- High workload and overwhelming demands
- Limited career advancement opportunities
Salary Information
0 data points
Levels: Mid/L4, Senior
Mid/L4 · Product Designer (0 reports)
Total compensation: $156,800
Base salary: $156,800
Stock: -
Bonus: -
Range: $133,280 - $180,320
Interview Experience
3 interviews
Difficulty: 3.0 / 5
Duration: 14-28 weeks
Interview Process
1. Application Review
2. Recruiter Screen
3. Technical Phone Screen
4. Coding Rounds
5. System Design Interview
6. Final Interview
Common Questions
- Coding/Algorithm
- System Design
- Technical Knowledge
- Behavioral/STAR
- Infrastructure/SRE