
Distributed Training & Performance Engineer - Vice President
New York, NY, United States · On-site · Full-time · 2mo ago
Required skills
PyTorch
Are you looking for an exciting opportunity to join a dynamic and growing team in a fast-paced and challenging area? This is a unique opportunity to work with the Global Technology Applied Research (GTAR) center at JPMorgan Chase. The goal of GTAR is to design and conduct research across multiple frontier technologies to enable novel discoveries and inventions, and to inform and develop next-generation solutions for the firm’s clients and businesses.
As a senior-level engineer within GTAR, you will design, optimize, and scale large-model pretraining workloads across hyperscale accelerator clusters. This role sits at the intersection of distributed systems, kernel-level performance engineering, and large-scale model training. The ideal candidate can take a fixed hardware budget (accelerator type, node topology, interconnect, and cluster size) and design an efficient, stable, and scalable training strategy spanning parallelism layout, memory strategy, kernel optimization, and end-to-end system performance. This is a hands-on role with direct impact on training throughput, efficiency, and cost at scale.
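To make the "fixed hardware budget" framing concrete, here is a minimal, illustrative sketch of a 3D parallelism layout using PyTorch's device-mesh API. The cluster size (256 GPUs), node size (8), and the split between tensor, pipeline, and data parallelism are assumptions chosen for the example, not details from this posting; it is meant to run under a launcher such as torchrun.

```python
from torch.distributed.device_mesh import init_device_mesh

# Assumed budget for illustration: 256 GPUs, 8 GPUs per node.
# Keep tensor parallelism inside a node's NVLink domain, run
# pipeline stages across nodes, and use the remaining factor
# for data parallelism.
TP = 8                    # tensor-parallel degree (one full node)
PP = 4                    # pipeline-parallel degree (assumed)
DP = 256 // (TP * PP)     # -> 8 data-parallel replicas

mesh = init_device_mesh(
    "cuda",
    mesh_shape=(DP, PP, TP),
    mesh_dim_names=("dp", "pp", "tp"),
)

# Named sub-meshes expose per-dimension process groups that the
# sharding and parallelism machinery consumes downstream.
dp_mesh = mesh["dp"]
tp_mesh = mesh["tp"]
```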
Job responsibilities
- Design and optimize distributed training strategies for large-scale models, including data, tensor, pipeline, and context parallelism.
- Manage end-to-end training performance: from data input pipelines through model execution, communication, and checkpointing.
- Identify and eliminate performance bottlenecks using systematic profiling and performance modeling.
- Develop or optimize high-performance kernels using CUDA, Triton, or equivalent frameworks.
- Design and optimize distributed communication strategies to maximize overlap between computation and inter-node data movement (a minimal sketch of this pattern follows this list).
- Design memory-efficient training configurations (caching, optimizer sharding, checkpoint strategies).
- Evaluate and optimize training on multiple accelerator platforms, including GPUs and non-GPU accelerators.
- Contribute to incorporating performance improvements back into internal pipelines.
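As a concrete illustration of the compute/communication overlap called out above: PyTorch's collectives accept async_op=True, which returns a handle immediately so independent work can proceed while NCCL runs the reduction on its own stream. This is a minimal sketch of the pattern only; step_with_overlap and local_work are hypothetical names, with local_work standing in for whatever computation is genuinely independent of the in-flight gradients.

```python
import torch.distributed as dist

def step_with_overlap(shard_grad, local_work):
    # Launch the inter-node reduction without blocking; NCCL
    # executes it on a dedicated communication stream.
    handle = dist.all_reduce(shard_grad, op=dist.ReduceOp.SUM, async_op=True)

    # Independent computation (e.g., the next bucket's backward or
    # optimizer prep) proceeds while the collective is in flight.
    out = local_work()

    handle.wait()                           # sync only when the values are needed
    shard_grad.div_(dist.get_world_size())  # average after the reduction lands
    return out
```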
Required qualifications, capabilities, and skills
- Master’s degree with 3+ years of industry experience, or Ph.D. with 1+ years of industry experience, in computer science, physics, math, engineering, or related fields.
- Engineering experience at top AI labs, HPC centers, chip vendors, or hyperscale ML infra teams.
- Strong experience designing and operating large-scale distributed training jobs across multinode accelerator clusters.
- Deep understanding of distributed parallelism strategies: data parallelism, tensor/model parallelism, pipeline parallelism, and memory/optimizer sharding.
- Proven ability to profile and optimize training performance using industry-standard tools such as Nsight, the PyTorch profiler, or equivalent (see the profiling sketch after this list).
- Hands-on experience with GPU programming and kernel optimization.
- Strong understanding of accelerator memory hierarchies, bandwidth limitations, and compute-communication tradeoffs.
- Experience with collective communication libraries and patterns (e.g., NCCL-style collectives).
- Proficiency in Python for ML systems development and C++ for performance-critical components.
- Experience with modern ML frameworks such as PyTorch or JAX in large-scale training settings.
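For reference, here is a minimal PyTorch profiler loop of the kind the profiling bullet refers to; the model, shapes, and trace directory are arbitrary placeholders. The schedule skips one step, warms up on one, and records three, and the resulting trace can be inspected in TensorBoard.

```python
import torch
from torch.profiler import ProfilerActivity, profile, schedule

model = torch.nn.Linear(4096, 4096).cuda()
opt = torch.optim.AdamW(model.parameters())
x = torch.randn(64, 4096, device="cuda")

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    schedule=schedule(wait=1, warmup=1, active=3),
    on_trace_ready=torch.profiler.tensorboard_trace_handler("./trace"),
    profile_memory=True,
) as prof:
    for _ in range(5):
        loss = model(x).square().mean()
        loss.backward()
        opt.step()
        opt.zero_grad()
        prof.step()  # advance the wait/warmup/active schedule

# Text summary of the hottest GPU kernels from the recorded steps.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```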
Preferred qualifications, capabilities, and skills
- Experience optimizing training workloads on non-GPU accelerators (e.g., TPUs or wafer-scale architectures).
- Familiarity with compiler-driven ML systems (e.g., XLA, MLIR, Inductor) and graph-level optimizations.
- Experience designing custom fused kernels or novel execution strategies for attention or large matrix operations (a minimal Triton example follows this list).
- Strong understanding of scaling laws governing large-model pretraining dynamics and stability considerations.
- Contributions to open-source ML systems, distributed training frameworks, or performance-critical kernels.
- Prior experience collaborating directly with hardware vendors or accelerator teams.
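To illustrate what "custom fused kernels" means in practice, here is a minimal Triton kernel that fuses an elementwise add with a ReLU into a single pass over global memory. The function names, shapes, and block size are arbitrary choices for the example; real fused attention or GEMM kernels are substantially more involved.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def fused_add_relu_kernel(x_ptr, y_ptr, out_ptr, n_elements,
                          BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block, fusing the
    # add and the ReLU so the intermediate never touches global memory.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, tl.maximum(x + y, 0.0), mask=mask)

def fused_add_relu(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    fused_add_relu_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

# Usage: both inputs must be same-shape CUDA tensors.
# z = fused_add_relu(a, b)
```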
Similar jobs

Vice President, Release Train Engineer
BNY Mellon · New York, NY, United States

Risk Engineering, Vice President, Market Risk Strats, New York
Goldman Sachs · New York, New York, United States

Senior Vice President, Engineering - Hearst Magazines
Hearst · New York, NY, United States

Enterprise Architect, Aladdin Engineering, Vice President
BlackRock · New York, NY

Senior Vice President, OPS Process Engineer
BNY Mellon · New York, NY, United States
About JPMorgan Chase

JPMorgan Chase · Public
JPMorgan Chase & Co. is an American multinational banking institution headquartered in New York City and incorporated in Delaware. It is the largest bank in the United States, and the world's largest bank by market capitalization as of 2025.
Employees: 300,000+
Headquarters: New York City
Valuation: $500B
Reviews
3.8 · 10 reviews
Work-life balance: 3.5
Compensation: 4.0
Company culture: 3.8
Career growth: 3.2
Management: 2.8
Recommend to a friend: 68%
Pros
Good benefits and compensation
Supportive colleagues and environment
Flexible work arrangements
Cons
Long hours and heavy workload
Management issues and lack of direction
High stress and expectations
Salary range
44 data points · levels: Mid/L4, Senior/L5
Mid/L4 · Applied AI ML Associate (2 reports)
Total compensation: $188,500 (range: $182,000 – $195,000)
Base salary: $145,000
Stock: -
Bonus: -
Interview experience
4 interviews
Difficulty: 3.0 / 5
Duration: 14-28 weeks
Offer rate: 50%
Experience: 25% positive, 75% neutral, 0% negative
Interview process
1. Application Review
2. HR Screen
3. Hiring Manager Interview
4. In-person/Final Interview
5. Offer
Common interview questions
Behavioral/STAR
Past Experience
Culture Fit
Financial Knowledge
Case Study
News & Topics
JPMorgan Chase & Co. (NYSE:JPM) Shares Down 1% - Here's Why
MarketBeat · News · 1d ago
JPMorganChase adding 400 jobs in Charlotte as part of consolidation
Queen City News · News · 1d ago
JPMorgan Chase adds Matthews to its growing list of Charlotte-area branches
The Business Journals · News · 2d ago