
Research Engineer (LLM Training and Performance)
Required skills
AWS
Kubernetes
PyTorch
GCP
Azure
At JetBrains, code is our passion. Ever since we started back in 2000, we have been striving to make the strongest, most effective developer tools on earth. By automating routine checks and corrections, our tools speed up production, freeing developers to grow, discover, and create.
We’re looking for a Research Engineer who will own the training stack and model architecture for our Mellum LLM family. Your job is easier said than done: make training faster, cheaper, and more stable at a large scale. You’ll profile, design, and implement changes to the training pipeline – from architecture to custom GPU kernels, as needed.
As part of our team, you will:
- Be responsible for improving end-to-end performance for multi-node LLM pre-training and post-training pipelines.
- Profile hotspots (Nsight Systems/Compute, NVTX) and fix them using compute/comm overlap, kernel fusion, scheduling, etc.
- Design and evaluate architecture choices (depth/width, attention variants including GQA/MQA/MLA/Flash-style, RoPE scaling/NTK, and MoE routing and load-balancing).
- Implement custom ops (Triton and/or CUDA C++), integrate via PyTorch extensions, and upstream when possible.
- Push memory/perf levers: FSDP/ZeRO, activation checkpointing, FP8/TE, tensor/pipeline/sequence/expert parallelism, NCCL tuning.
- Harden large runs by building elastic and fault-tolerant training setups, ensuring robust checkpointing, strengthening reproducibility, and improving resilience to preemption.
- Keep the data path fast using streaming and sharded data loaders and tokenizer pipelines, and improve overall throughput and cache efficiency.
- Define the right metrics, build dashboards, and deliver steady improvements.
- Run both pre-training and post-training (including SFT, RLHF, and GRPO-style methods) efficiently across sizable clusters.
We’ll be happy to bring you on board if you have:
- Strong PyTorch and PyTorch Distributed experience, having run multi-node jobs with tens to hundreds of GPUs.
- Hands-on experience with Megatron-LM/Megatron-Core/NeMo, DeepSpeed, or serious FSDP/ZeRO expertise.
- Real profiling expertise (Nsight Systems/Compute, nvprof) and experience with NVTX-instrumented workflows.
- GPU programming skills with Triton and/or CUDA, and the ability to write, test, and debug kernels.
- A solid understanding of NCCL collectives, as well as topology and fabric effects (IB/RoCE), and how they show up in traces.
Our ideal candidate would have experience with:
- FlashAttention-2 and 3, CUTLASS and CuTe, Transformer Engine and FP8, Inductor, AOTAutograd, and torch.compile.
- MoE at scale (expert parallel, router losses, capacity management) and long-context tricks (ALiBi/YaRN/NTK scaling).
- Kubernetes or SLURM at scale, placement and affinity tuning, as well as AWS, GCP, and Azure GPU fleets.
- Web-scale data plumbing (streaming datasets, Parquet and TFRecord, tokenizer perf), eval harnesses, and benchmarking.
- Safety and post-training methods, such as DPO, ORPO, GRPO, and reward models.
- Inference ecosystems such as vLLM and paged KV.
We process the data provided in your job application in accordance with the Recruitment Privacy Policy.