Hiring
Required Skills
Machine Learning
Computer Vision
NLP
Vision-Language Models
Model Fine-tuning
Distributed Training
PyTorch
Research Scientist – VLM Generalist
Location: Remote
About the Role
We’re looking for a Research Scientist with deep expertise in **training and fine-tuning large Vision-Language and Language Models (VLMs / LLMs)** for downstream multimodal tasks. You’ll help advance the frontier of models that reason across vision, language, and 3D, bridging research breakthroughs with scalable engineering.
What You’ll Do
- Design and fine-tune large-scale VLMs / LLMs, including hybrid architectures, for tasks such as visual reasoning, retrieval, 3D understanding, and embodied interaction.
- Build robust, efficient training and evaluation pipelines (data curation, distributed training, mixed precision, scalable fine-tuning).
- Conduct in-depth analysis of model performance: ablations, bias / robustness checks, and generalisation studies.
- Collaborate across research, engineering, and 3D / graphics teams to bring models from prototype to production.
- Publish impactful research and help establish best practices for multimodal model adaptation.
What You Bring
- PhD (or equivalent experience) in Machine Learning, Computer Vision, NLP, Robotics, or Computer Graphics.
- Proven track record in fine-tuning or training large-scale VLMs / LLMs for real-world downstream tasks.
- Strong engineering mindset: you can design, debug, and scale training systems end-to-end.
- Deep understanding of multimodal alignment and representation learning (vision–language fusion, CLIP-style pre-training, retrieval-augmented generation).
- Familiarity with recent trends, including video-language and long-context VLMs, spatio-temporal grounding, agentic multimodal reasoning, and Mixture-of-Experts (MoE) fine-tuning.
- Awareness of 3D-aware multimodal models that use NeRFs, Gaussian splatting, or differentiable renderers for grounded reasoning and 3D scene understanding.
- Hands-on experience with PyTorch / DeepSpeed / Ray and distributed or mixed-precision training.
- Excellent communication skills and a collaborative mindset.
Bonus / Preferred
- Experience integrating 3D and graphics pipelines into training workflows (e.g., mesh or point-cloud encoding, differentiable rendering, 3D VLMs).
- Research or implementation experience with vision-language-action models, world-model-style architectures, or multimodal agents that perceive and act.
- Familiarity with efficient adaptation methods: LoRA, adapters, QLoRA, parameter-efficient fine-tuning, and distillation for edge deployment.
- Knowledge of video and 4D generation trends, latent diffusion / rectified flow methods, or multimodal retrieval and reasoning pipelines.
- Background in GPU optimisation, quantisation, or model compression for real-time inference.
- Open-source or publication track record in top-tier ML / CV / NLP venues.
Equal Employment Opportunity:
We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or other legally protected statuses.
About Stability AI
Stability AI
Series A
Stability AI Ltd is a UK-based artificial intelligence company, best known for its text-to-image model Stable Diffusion.
Employees: 51-200
Headquarters: London
Valuation: $1B
Reviews
Rating: 3.9 (10 reviews)
Work-life balance: 3.2
Compensation: 4.0
Company culture: 4.1
Career development: 3.5
Management: 3.7
Would recommend to a friend: 72%
Pros
Flexible working hours
Supportive team and colleagues
Innovative and cutting-edge projects
Cons
Heavy and unpredictable workload
Long hours and fast-paced environment
Communication issues
Salary Range
2 data points
Junior/L3 · Recruiter (0 reports)
Total annual compensation: $117,600
Base salary: $117,600
Stock: -
Bonus: -
Range: $99,960 to $135,240
Interview Experience
41 interviews
Difficulty: 4.2 / 5
Duration: 21-35 weeks
Offer rate: 27%
Experience: 70% positive, 12% neutral, 18% negative
Interview Process
1. Recruiter Screen
2. ML Coding
3. ML System Design
4. Research Discussion
5. Team Interviews
Common Questions
ML fundamentals
Design an ML system
Research paper discussion
Statistical concepts