ABOUT BASETEN
Baseten powers mission-critical inference for the world's most dynamic AI companies, like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma, and Writer. By uniting applied AI research, flexible infrastructure, and seamless developer tooling, we enable companies operating at the frontier of AI to bring cutting-edge models into production. We're growing quickly and recently raised our $300M Series E (https://www.baseten.co/blog/announcing-baseten-s-300m-series-e/), backed by investors including BOND, IVP, Spark Capital, Greylock, and Conviction. Join us and help build the platform engineers turn to when shipping AI products.
THE ROLE:
We're building a Customer Engineering team to own the post-sales technical relationship with our most strategic and enterprise customers. As a Sr. Customer Engineer, you'll be the technical front door for accounts running production ML workloads on Baseten — the person customers trust to keep their models healthy, their incidents short, and their roadmap heard.
This role blends deep infrastructure debugging, AI/ML performance expertise, incident command, and proactive account ownership. You'll triage and resolve issues across Kubernetes, GPUs, networking, and model serving, lead war rooms during P0 escalations, and translate recurring pain points into product improvements. You're not just reactive — you'll monitor customer health, drive QBRs, set up proactive alerting, and identify expansion opportunities before customers have to ask.
You'll partner closely with Solutions Architecture, SRE, Infra, Product, and Forward Deployed Engineering, but you own the customer outcome end-to-end: from first response to root-cause analysis to the follow-up that reinforces trust.
RESPONSIBILITIES:
Technical Support & Debugging:
- Serve as the first responder to all post-sales customer issues via ticketing (Pylon) and Slack, triaging and resolving Tier 1 and Tier 2 issues independently.
- Diagnose runtime issues related to latency, memory behavior, GPU utilization, concurrency, and model lifecycle management.
- Debug infrastructure problems across Kubernetes (pods, controllers), networking, observability, and alerting systems.
- Pull logs, read error traces, and correlate signals across Grafana, Loki, and Prometheus to pinpoint root causes, even when the real issue is buried layers deep.
Incident Response & Escalation:
- Lead incident response during outages and escalations, coordinating across Product, SRE, Sales, and Engineering.
- Own customer communication through resolution, even when the fix is handed off to SRE or Infra, and deliver a root-cause analysis after every P0/P1.
- Escalate to SRE or other engineering teams with structured context (customer, affected models, what you've already ruled out, specific ask) so nothing gets lost in translation.
- Drive post-incident alerting reviews: why did the customer find this before we did, and what instrumentation or process change prevents it next time?
Proactive Account Ownership:
- Serve as the technical owner for top enterprise accounts with strict SLAs and high responsiveness expectations.
- Set up and maintain proactive monitoring and alerts for all customer production models within 24 hours of handoff from the Solutions Architect (SA).
- Drive the QBR process and proactive re-engagement for expansion opportunities.
- Track recurring failure patterns across accounts and push for durable fixes, not just incident closure.
- Monitor internal feedback channels and route product-level issues to the right teams.
Cross-Functional Collaboration:
- Own the SA-to-CE handoff for new customers: validate architecture, confirm production-readiness milestones, and establish escalation paths.
- Maintain and improve runbooks, knowledge bases, and diagnostic best practices so the team scales with the customer base.
- Translate user feedback into roadmap signals, documentation improvements, and product enhancements.
- Coordinate end-to-end on projects spanning feature requests, new deployments, and operational debugging, covering scoping, execution, communication, and stakeholder alignment.
REQUIREMENTS:
- Deep Kubernetes troubleshooting expertise, including resource debugging, pod/runtime analysis, and log-based diagnostics with observability tooling (Grafana, Loki, Prometheus).
- Strong infrastructure debugging across container orchestration, networking, and service dependencies, with hands-on production cluster experience.
- Experience managing high-severity incidents with major customers, including SLAs, war rooms, post-incident reviews, and clear executive-level communication throughout.
- Proven project management skills with an ownership mindset: you can run multiple complex, multi-stakeholder initiatives in parallel without dropping threads.
- Ability to translate recurring technical pain points into roadmap-level insights and product improvements.
- Strong communication skills and executive presence during high-visibility situations, ensuring both technical clarity and customer confidence.
- 3+ years of experience in a fast-paced, high-growth, or customer-facing engineering environment.
NICE TO HAVE:
- Familiarity with high-performance AI model serving, including troubleshooting ML pipelines from preprocessing through inference.
- Experience with ticketing and incident-response platforms such as Pylon or Zendesk.
- Hands-on experience with Helm, Flux, CI/CD tooling, or scripting automations for deployment and operational workflows.
- Background in SRE, DevOps, or forward-deployed engineering roles at an infrastructure company.
BENEFITS:
- Competitive compensation, including meaningful equity.
- 100% coverage of medical, dental, and vision insurance for employees and dependents.
- Generous PTO policy, including a company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!).
- Paid parental leave.
- Company-facilitated 401(k).
- Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.
Apply now to embark on a rewarding journey in shaping the future of AI! If you are a motivated individual with a passion for machine learning and a desire to be part of a collaborative and forward-thinking team, we would love to hear from you.
At Baseten, we are committed to fostering a diverse and inclusive workplace. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, or veteran status.
We are an Equal Opportunity Employer and will consider qualified applicants with criminal histories in a manner consistent with applicable law (for example, the requirements of the San Francisco Fair Chance Ordinance, where applicable).
About Baseten

Baseten (Series C)
Baseten provides a platform for deploying and scaling machine learning models in production environments. The company offers infrastructure and tools for ML engineers to build, deploy, and monitor AI applications.
Employees: 51-200
Headquarters: San Francisco
Valuation: $1.0B
Reviews
Overall rating: 4.1 (10 reviews)
Work-life balance: 4.2
Compensation: 2.8
Culture: 4.3
Career: 3.5
Management: 3.2
Would recommend to a friend: 72%
Pros
- Flexible work arrangements and schedules
- Supportive team environment and good colleagues
- Good benefits and health coverage
Cons
- Below industry standard compensation and salary
- Limited career advancement opportunities
- High workload and stressful expectations
Salary range
9 data points
Levels: Junior/L3, L2, L3, L4, L5, L6
Junior/L3 · Recruiter (0 reports)
Total annual compensation: $183,600
Base salary: -
Stock: -
Bonus: -
Range: $156,060 to $211,140
Interview experience
52 interviews
Difficulty: 3.3 / 5
Duration: 14-28 weeks
Offer rate: 42%
Experience: Positive 66%, Neutral 21%, Negative 13%
Interview process
1. Phone Screen
2. Technical Interview
3. Hiring Manager
4. Team Fit
Common questions
- Technical skills
- Past experience
- Team collaboration
- Problem solving