
Research Engineer
About the role
Who We Are
Lightning AI is the company behind PyTorch Lightning. Founded in 2019, we build an end-to-end platform for developing, training, and deploying AI systems, designed to take ideas from research to production with less friction.
Through our merger with Voltage Park, a neocloud and AI Factory, Lightning AI combines developer-first software with cost-efficient, large-scale compute. Teams get the tools they need for experimentation, training, and production inference, with security, observability, and control built in.
We serve solo researchers, startups, and large enterprises. Lightning AI operates globally with offices in New York City, San Francisco, Seattle, and London, and is backed by Coatue, Index Ventures, Bain Capital Ventures, and Firstminute.
Our Values
- Move Fast: We act with speed and precision, breaking down big challenges into achievable steps.
- Focus: We complete one goal at a time with care, collaborating as a team to deliver features with precision.
- Balance: Sustained performance comes from rest and recovery. We ensure a healthy work-life balance to keep you at your best.
- Craftsmanship: Innovation through excellence. Every detail matters, and we take pride in mastering our craft.
- Minimal: Simplicity drives our innovation. We eliminate complexity through discipline and focus on what truly matters.
What We're Looking For
We are seeking a highly skilled Research Engineer to work on optimizing training and inference workloads on compute accelerators and clusters, through the Lightning Thunder compiler and the broader PyTorch Lightning ecosystem. This role sits at the intersection of deep learning research and large-scale system optimization. You’ll be shaping technology that pushes the boundaries of model performance and efficiency, creating foundational software that will impact the entire machine learning ecosystem.
This role is based in one of our hubs (NYC, SF, Seattle, or London), with a minimum of 2 in-office days per week and attendance at occasional team and company offsites.
What You'll Do
- Develop performance-oriented model optimizations at multiple levels:
  - Graph-level (e.g., operator fusion, kernel scheduling, memory planning)
  - Kernel-level (CUDA, Triton, custom operators for specialized hardware)
  - System-level (distributed training across GPUs/TPUs, inference serving at scale)
- Work across the software stack to ensure optimizations are accessible to end users through clean APIs, automated tooling, and seamless integration with our open-source ecosystem.
- Design and implement profiling and debugging tools to analyze model execution, identify bottlenecks, and guide optimization strategies.
- Collaborate with hardware vendors and ecosystem partners to ensure efficiency across diverse backends (NVIDIA, AMD, TPU, specialized accelerators).
- Contribute to open-source projects by developing new features, improving documentation, and supporting community adoption.
- Engage with customers and with researchers and engineers in the community, providing guidance on performance tuning in ML workflows.
- Work cross-functionally with Lightning’s product and engineering teams to ensure optimization improvements align with the broader product vision.
What You'll Need
Required Qualifications
- Strong expertise with deep learning frameworks such as PyTorch
- Hands-on experience with model optimization techniques, including graph-level optimizations, quantization, speculative decoding, pruning, mixed precision, or memory-efficient training
- Knowledge of distributed systems and parallelism strategies (data/model/pipeline parallelism, checkpointing, elastic scaling)
- Familiarity with software engineering practices: designing APIs, building robust tooling, testing, CI/CD for performance-sensitive systems
- Excellent collaboration and communication skills, with the ability to partner across research, engineering, and external contributors
- Bachelor’s degree in Computer Science, Engineering, or a related field
Nice-to-Haves
- Experience with CUDA, Triton, or other GPU programming models for developing custom kernels
- Proven track record contributing to open-source projects in ML, HPC, or compiler domains
- Advanced degree (Master’s or PhD) in AI, machine learning, infrastructure, or systems highly preferred
Compensation, Benefits, and Perks
We are committed to offering competitive compensation that reflects the value each team member brings to our mission. Final offers are based on factors such as experience, skills, geographic location, and role expectations. In addition to base salary, our total rewards package for eligible roles includes a discretionary bonus, a meaningful equity component, and comprehensive benefits.
We offer a comprehensive and competitive benefits package designed to support our employees’ health, well-being, and long-term success. Benefits may vary by location, team, and role.
Benefits include:
- Comprehensive medical, dental and vision coverage (U.S.); Private medical and dental insurance (U.K.)
- Retirement and financial wellness support (U.S.); Pension contribution (U.K.)
- Generous paid time off, plus holidays
- Paid parental leave
- Professional development support
- Wellness and work-from-home stipends
- Flexible work environment
At Lightning AI, we are committed to fostering an inclusive and diverse workplace. We believe that diverse teams drive innovation and create better products. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity, national origin, age, disability, veteran status, or any other protected characteristic. We are dedicated to building a culture where everyone can thrive and contribute to their fullest potential.
The anticipated annual base salary range for this role is:
$120,000–$250,000 USD
Required skills
PyTorch
Machine Learning
About Lightning AI
Headquarters: London