
AI Performance Optimization Engineer
New York, New York, United States; San Francisco, California, United States · On-site · Full-time · 1mo ago
Required Skills
PyTorch
Machine Learning
Who We Are
Lightning AI is the company behind PyTorch Lightning. Founded in 2019, we build an end-to-end platform for developing, training, and deploying AI systems—designed to take ideas from research to production with less friction.
Through our merger with Voltage Park, a neocloud and AI Factory, Lightning AI combines developer-first software with cost-efficient, large-scale compute. Teams get the tools they need for experimentation, training, and production inference, with security, observability, and control built in.
We serve solo researchers, startups, and large enterprises. Lightning AI operates globally with offices in New York City, San Francisco, Seattle, and London, and is backed by Coatue, Index Ventures, Bain Capital Ventures, and Firstminute.
Our Values
Move Fast: We act with speed and precision, breaking down big challenges into achievable steps.
Focus: We complete one goal at a time with care, collaborating as a team to deliver features with precision.
Balance: Sustained performance comes from rest and recovery. We ensure a healthy work-life balance to keep you at your best.
Craftsmanship: Innovation through excellence. Every detail matters, and we take pride in mastering our craft.
Minimal: Simplicity drives our innovation. We eliminate complexity through discipline and focus on what truly matters.
What We're Looking For
We are seeking a highly skilled AI Performance Optimization Engineer to optimize training and inference workloads on compute accelerators and clusters through the Lightning Thunder compiler and the broader PyTorch Lightning ecosystem. This role sits at the intersection of deep learning research, compiler development, and large-scale system optimization. You'll be shaping technology that pushes the boundaries of model performance and efficiency, creating foundational software that will impact the entire machine learning ecosystem.
You will be joining the Engineering Team and report to our Tech Lead. This is a hybrid role based in either our New York City or San Francisco office with in-office requirements of 2 days per week. The salary range for this role is $120,000-$250,000.
What you’ll do
- Develop performance-oriented model optimizations at multiple levels:
  - Graph-level (e.g., operator fusion, kernel scheduling, memory planning)
  - Kernel-level (CUDA, Triton, custom operators for specialized hardware)
  - System-level (distributed training across GPUs/TPUs, inference serving at scale)
- Advance the Thunder compiler by building optimization passes, graph transformations, and integration hooks to accelerate training and inference workloads.
- Work across the software stack to ensure optimizations are accessible to end users through clean APIs, automated tooling, and seamless integration with PyTorch Lightning.
- Design and implement profiling and debugging tools to analyze model execution, identify bottlenecks, and guide optimization strategies.
- Collaborate with hardware vendors and ecosystem partners to ensure Thunder runs efficiently across diverse backends (NVIDIA, AMD, TPU, specialized accelerators).
- Contribute to open-source projects by developing new features, improving documentation, and supporting community adoption.
- Engage with researchers and engineers in the community, providing guidance on performance tuning and advocating for Thunder as the go-to optimization layer in ML workflows.
- Work cross-functionally with Lightning's product and engineering teams to ensure compiler and optimization improvements align with the broader product vision.
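The graph-level work described above centers on rewriting a program's intermediate representation before execution. As a purely illustrative sketch (this is a toy IR, not Thunder's actual representation or API), here is a minimal fusion pass that merges runs of adjacent elementwise ops into a single node, the way a compiler would reduce kernel launches:

```python
# Toy graph-level fusion pass over a linear op list.
# Illustrative only; Thunder's real IR, ops, and passes differ.
from dataclasses import dataclass, field

ELEMENTWISE = {"add", "mul", "relu"}  # ops safe to fuse pointwise

@dataclass
class Node:
    op: str
    fused: list = field(default_factory=list)  # ops folded into this node

def fuse_elementwise(graph):
    """Greedily fuse consecutive elementwise ops into one 'fused' node."""
    out = []
    for node in graph:
        if (out and node.op in ELEMENTWISE
                and out[-1].op in ELEMENTWISE | {"fused"}):
            prev = out[-1]
            if prev.op != "fused":
                out[-1] = Node("fused", fused=[prev.op])
            out[-1].fused.append(node.op)
        else:
            out.append(node)
    return out

graph = [Node("matmul"), Node("add"), Node("relu"), Node("matmul"), Node("mul")]
fused = fuse_elementwise(graph)
print([n.op for n in fused])  # ['matmul', 'fused', 'matmul', 'mul']
print(fused[1].fused)         # ['add', 'relu']
```

The trailing `mul` stays unfused because the greedy pass only merges runs of two or more elementwise ops; a real compiler pass would also check data dependencies and memory layout before fusing.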
What you’ll need
- Strong expertise with deep learning frameworks such as PyTorch.
- Hands-on experience with model optimization techniques, including graph-level optimizations, quantization, pruning, mixed precision, or memory-efficient training.
- Deep understanding of compiler internals (IR design, operator fusion, scheduling, optimization passes) or proven work in performance-critical software.
- Experience with CUDA, Triton, or other GPU programming models for developing custom kernels.
- Knowledge of distributed systems and parallelism strategies (data/model/pipeline parallelism, checkpointing, elastic scaling).
- Familiarity with software engineering practices: designing APIs, building robust tooling, testing, CI/CD for performance-sensitive systems.
- Proven track record contributing to open-source projects in ML, HPC, or compiler domains.
- Excellent collaboration and communication skills, with the ability to partner across research, engineering, and external contributors.
- Bachelor's degree in Computer Science, Engineering, or a related field; an advanced degree (Master's or PhD) in machine learning, compilers, or systems is highly preferred.
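Several of the techniques named above (quantization, mixed precision) trade numeric precision for speed and memory. As a framework-free, purely illustrative sketch of the idea, here is symmetric int8 quantization with a single per-tensor scale; production frameworks use calibrated per-tensor or per-channel scales:

```python
# Toy symmetric int8 quantization; illustrative only, not a production scheme.

def quantize_int8(values):
    """Map floats to int8 codes in [-127, 127] with one symmetric scale."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0  # guard all-zero input
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from int8 codes."""
    return [v * scale for v in q]

w = [0.5, -1.0, 0.25, 0.75]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
print(q)  # [64, -127, 32, 95]
print(max(abs(a - b) for a, b in zip(w, w_hat)) <= s)  # rounding error bounded
```

The reconstruction error per element is at most half the scale, which is why outlier values (which inflate the scale) motivate per-channel schemes in practice.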
Benefits and Perks
We offer competitive base salaries and equity with a one-year cliff (25% vesting) and monthly vesting thereafter. For our international employees, we work with our EOR (employer of record) to pay you in your local currency and provide equitable benefits across the globe.
In the US, we offer:
- Medical, dental, and vision insurance
- Life and AD&D insurance
- Flexible paid time off, including winter closure
- Paid family leave benefits
- $500 monthly meal reimbursement, including groceries & food delivery services
- $500 one-time home office stipend
- $1,000 annual learning & development stipend
- 100% Citibike membership (NYC only)
- $45/month gym membership
- Additional various medical and mental health services
At Lightning AI, we are committed to fostering an inclusive and diverse workplace. We believe that diverse teams drive innovation and create better products. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity, national origin, age, disability, veteran status, or any other protected characteristic. We are dedicated to building a culture where everyone can thrive and contribute to their fullest potential.
About Lightning AI
Lightning AI (Series A) develops PyTorch Lightning, an open-source deep learning framework that simplifies machine learning model training and deployment. The company provides cloud infrastructure and tools for AI researchers and developers.
Employees: 51-200 · Headquarters: New York · Valuation: $200M
Reviews
3.9 overall (10 reviews) · 72% recommend to a friend
Work-life balance: 4.0 · Compensation: 3.2 · Company culture: 4.1 · Career development: 2.8 · Management: 2.9
Pros: Great work-life balance; supportive and collaborative team; good benefits and flexible hours.
Cons: Management issues and poor communication; limited career advancement opportunities; workload and stress concerns.
News
Lightning AI Showcases Technical Expertise at PyTorch Conference Europe (TipRanks) · 1w ago
Lightning AI – Weekly Recap (TipRanks) · 3w ago