JPMorgan Chase

Global financial services firm

Sr Director of Software Engineering - AI Infrastructure Platform

Function: Engineering
Level: Director
Location: Palo Alto, CA; San Francisco, CA; Seattle
Work mode: On-site
Type: Full-time
Posted: 1 week ago

Your opportunity to make a real impact and shape the future of financial services is waiting for you. Let's push the boundaries of what's possible together.

As a Senior Director of Software Engineering at JPMorgan Chase within the firmwide AI Infrastructure Platform organization, you will lead multiple technical areas and manage the activities of multiple departments responsible for delivering a unified AI infrastructure layer across on‑premises environments, public cloud, and emerging accelerated‑compute vendors. You will collaborate across AI/ML engineering, infrastructure, security and controls, and vendor teams to ensure the firm remains at the forefront of AI platform capabilities, operational excellence, and industry best practices.

In this role, you will own training and experimentation on a Kubernetes‑standardized platform. While a dedicated architecture function exists, you will act as an active design partner, guiding architectural trade‑offs and ensuring designs translate into reliable, secure, and operable systems at enterprise scale.

Job responsibilities

  • Lead multiple technology and platform implementations across departments to deliver firmwide AI infrastructure objectives, with a primary focus on training and experimentation platforms operating at enterprise scale.
  • Own the design, delivery, and evolution of a Kubernetes‑first training and experimentation platform, including Kubernetes‑native support for batch and distributed training jobs, lifecycle management, retry semantics, and failure recovery patterns.
  • Standardize AI developer workflows for experimentation, enabling self‑service job submission, reusable templates and golden paths, reproducibility mechanisms, and consistent runtime behavior across hybrid deployment environments.
  • Build and evolve platform APIs and automation, including Kubernetes controllers and operators where appropriate, to ensure the platform is safe, scalable, and easy to adopt across teams.
  • Drive measurable improvements in GPU availability and utilization through reliability engineering, fleet readiness patterns, and accelerated capacity onboarding.
  • Define and implement governance‑based scheduling and placement strategies, including:
      • Multi‑tenant GPU quotas and guardrails
      • Priority, admission control, and reservation patterns
      • Preemption policies
      • Fragmentation reduction and topology‑aware placement (GPU type, MIG, and topology awareness)

  • Embed enterprise‑grade security, risk, and control requirements into platform defaults, including IAM and RBAC controls, secrets management, audit logging, policy enforcement, network segmentation, and controlled change management.
  • Drive operational excellence by establishing SLIs and SLOs, managing error budgets, leading incident management practices, forecasting capacity, and delivering end‑to‑end platform observability across job lifecycles and GPU telemetry.
  • Act as the primary interface with senior leaders, stakeholders, and executives, driving alignment and consensus across competing priorities and complex initiatives.
  • Lead multiple engineering teams and managers, building a high‑performing organization with strong engineering standards, scalable operating models, and a culture of accountability and continuous improvement.
  • Champion the firm's culture of diversity, opportunity, inclusion, and respect.

Required qualifications, capabilities, and skills

  • 15+ years of engineering experience, including 8+ years of senior engineering leadership experience with responsibility for managing managers.
  • Demonstrated experience delivering platform products (beyond foundational infrastructure) with strong adoption, reliability, and operational maturity.
  • Experience developing and leading large, cross‑functional engineering teams within highly matrixed and complex enterprise environments.
  • Proven track record of leading complex initiatives supporting distributed system design, testing, and operational stability at scale.
  • Deep hands‑on expertise with Kubernetes‑based platforms, including:
      • Multi‑tenancy, RBAC, admission control, and network policy
      • Multi‑cluster operations, upgrades, and cluster lifecycle management
      • Controllers, operators (CRDs), and platform API design patterns

  • Experience supporting AI training and experimentation platforms, including:
      • PyTorch and distributed training concepts such as scaling, orchestration, and failure modes
      • Ray or similar frameworks for distributed experimentation execution
      • Familiarity with Slurm or equivalent HPC or batch schedulers and core concepts such as queues, fair‑share, reservations, and preemption

  • Understanding of modern AI inference stacks (for example, vLLM) and how serving constraints (latency, throughput, batching, KV cache behavior, and GPU memory limits) influence training and experimentation platform design.
  • Strong understanding of GPU infrastructure fundamentals, including NVIDIA ecosystem capabilities, health and telemetry signals, and scheduling and placement constraints.
  • Extensive practical experience with cloud‑native technologies and hybrid infrastructure environments spanning on‑premises and public cloud.
  • Experience hiring, developing, coaching, and retaining high‑performing engineering talent.

Preferred qualifications, capabilities, and skills

  • Experience operating large‑scale GPU fleets, including heterogeneous accelerator environments.
  • Experience delivering hybrid AI platforms across on‑premises infrastructure, public cloud, and specialized accelerated‑compute vendors.
  • Experience working at the code level within large‑scale distributed systems.

This position is subject to Section 19 of the Federal Deposit Insurance Act. As such, an employment offer for this position is contingent on JPMorgan Chase's review of criminal conviction history, including pretrial diversions or program entries.

ABOUT US

JPMorgan Chase, one of the oldest financial institutions, offers innovative financial solutions to millions of consumers, small businesses and many of the world's most prominent corporate, institutional and government clients under the J.P. Morgan and Chase brands. Our history spans over 200 years and today we are a leader in investment banking, consumer and small business banking, commercial banking, financial transaction processing and asset management.

We offer a competitive total rewards package including base salary determined based on the role, experience, skill set and location. Those in eligible roles may receive commission-based pay and/or discretionary incentive compensation, paid in the form of cash and/or forfeitable equity, awarded in recognition of individual achievements and contributions. We also offer a range of benefits and programs to meet employee needs, based on eligibility. These benefits include comprehensive health care coverage, on-site health and wellness centers, a retirement savings plan, backup childcare, tuition reimbursement, mental health support, financial coaching and more. Additional details about total compensation and benefits will be provided during the hiring process.

We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants' and employees' religious practices and beliefs, as well as mental health or physical disability needs. Visit our FAQs for more information about requesting an accommodation.

JPMorgan Chase & Co. is an Equal Opportunity Employer, including Disability/Veterans

ABOUT THE TEAM

Our Global Technology Infrastructure group is a team of innovators who love technology as much as you do. Together, you'll use a disciplined, innovative, and business-focused approach to develop a wide variety of high-quality products and solutions. You'll work in a stable, resilient, and secure operating environment where you, and the products you deliver, will thrive.

High Risk Roles (HRR) are sensitive roles within the technology organization that require high assurance of the integrity of staff by virtue of 1) sensitive cybersecurity and technology functions they perform within systems or 2) information they receive regarding sensitive cybersecurity or technology matters. Users in these roles are subject to enhanced pre-hire screening which includes both criminal and credit background checks (as allowed by law). The enhanced screening will need to be successfully completed prior to commencing employment or assignment.

About JPMorgan Chase

JPMorgan Chase & Co. is an American multinational banking institution headquartered in New York City and incorporated in Delaware. It is the largest bank in the United States, and the world's largest bank by market capitalization as of 2025.

Employees: 300,000+
Headquarters: New York City
Valuation: $500B

Reviews

Overall rating: 3.8 (10 reviews)
Work-life balance: 3.5
Compensation: 4.0
Company culture: 3.8
Career growth: 3.2
Management: 2.8
Would recommend: 68%

Pros

  • Good benefits and compensation
  • Supportive colleagues and environment
  • Flexible work arrangements

Cons

  • Long hours and heavy workload
  • Management issues and lack of direction
  • High stress and expectations

Salary ranges

44 data points

Junior/L3 · Analytics Solutions Associate (1 report)

Total annual compensation: $139,000
Base salary: $107,000
Stock: -
Bonus: -
Range: $139,000 to $139,000

Interview reviews

4 reviews

Difficulty: 3.0 / 5
Duration: 14-28 weeks
Offer rate: 50%
Experience: 25% positive, 75% neutral, 0% negative

Interview process

1. Application Review
2. HR Screen
3. Hiring Manager Interview
4. In-person/Final Interview
5. Offer

Common questions

  • Behavioral/STAR
  • Past Experience
  • Culture Fit
  • Financial Knowledge
  • Case Study