Required Skills
Machine Learning
Computer Vision
NLP
Vision-Language Models
Model Fine-tuning
Distributed Training
PyTorch
Multimodal Generative AI Researcher
Location: Remote
About the Role
We’re looking for a Research Scientist with deep expertise in **training and fine-tuning large Vision-Language and Language Models (VLMs / LLMs)** for downstream multimodal tasks. You’ll help push the next frontier of models that reason across vision, language, and 3D, bridging research breakthroughs with scalable engineering.
What You’ll Do
- Design and fine-tune large-scale VLMs / LLMs, as well as hybrid architectures, for tasks such as visual reasoning, retrieval, 3D understanding, and embodied interaction.
- Build robust, efficient training and evaluation pipelines (data curation, distributed training, mixed precision, scalable fine-tuning).
- Conduct in-depth analysis of model performance: ablations, bias / robustness checks, and generalisation studies.
- Collaborate across research, engineering, and 3D / graphics teams to bring models from prototype to production.
- Publish impactful research and help establish best practices for multimodal model adaptation.
What You Bring
- PhD (or equivalent experience) in Machine Learning, Computer Vision, NLP, Robotics, or Computer Graphics.
- Proven track record in fine-tuning or training large-scale VLMs / LLMs for real-world downstream tasks.
- Strong engineering mindset: you can design, debug, and scale training systems end-to-end.
- Deep understanding of multimodal alignment and representation learning (vision–language fusion, CLIP-style pre-training, retrieval-augmented generation).
- Familiarity with recent trends, including video-language and long-context VLMs, spatio-temporal grounding, agentic multimodal reasoning, and Mixture-of-Experts (MoE) fine-tuning.
- Awareness of 3D-aware multimodal models that use NeRFs, Gaussian splatting, or differentiable renderers for grounded reasoning and 3D scene understanding.
- Hands-on experience with PyTorch / DeepSpeed / Ray and with distributed or mixed-precision training.
- Excellent communication skills and a collaborative mindset.
Bonus / Preferred
- Experience integrating 3D and graphics pipelines into training workflows (e.g., mesh or point-cloud encoding, differentiable rendering, 3D VLMs).
- Research or implementation experience with vision-language-action models, world-model-style architectures, or multimodal agents that perceive and act.
- Familiarity with efficient adaptation methods (LoRA, QLoRA, adapters, and other parameter-efficient fine-tuning techniques) and with distillation for edge deployment.
- Knowledge of video and 4D generation trends, latent diffusion / rectified flow methods, or multimodal retrieval and reasoning pipelines.
- Background in GPU optimisation, quantisation, or model compression for real-time inference.
- Open-source or publication track record in top-tier ML / CV / NLP venues.
Equal Employment Opportunity:
We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or other legally protected statuses.
Similar Jobs

AIML - Machine Learning Researcher, Foundation Models
Apple · Cupertino, CA

Applied Researcher II
Capital One · 4 Locations

AI Research Scientist - Multimodal Intelligence
Apple · Sunnyvale, CA

Machine Learning Engineer, Prediction & Planning
Waymo · Mountain View, CA, USA; San Francisco, CA, USA; New York City, NY, USA

AI/ML Engineer, Global Banking & Markets, Investment Banking
Goldman Sachs · Dallas, Texas, United States
About Stability AI

Stability AI
Series A
Stability AI Ltd is a UK-based artificial intelligence company, best known for its text-to-image model Stable Diffusion.
Employees: 51-200
Headquarters: London
Valuation: $1B
Reviews
Overall: 3.9 (10 reviews)
Work-life balance: 3.2
Compensation: 4.0
Culture: 4.1
Career: 3.5
Management: 3.7
72% would recommend to a friend
Pros
Flexible working hours
Supportive team and colleagues
Innovative and cutting-edge projects
Cons
Heavy and unpredictable workload
Long hours and fast-paced environment
Communication issues
Salary Range
2 data points
Junior/L3
Junior/L3 · Recruiter
0 reports
Total annual compensation: $117,600
Base salary: $117,600
Stock: -
Bonus: -
Range: $99,960 to $135,240
Interview Experience
41 interviews
Difficulty: 4.2 / 5
Duration: 21-35 weeks
Offer rate: 27%
Experience: Positive 70%, Neutral 12%, Negative 18%
Interview Process
1. Recruiter Screen
2. ML Coding
3. ML System Design
4. Research Discussion
5. Team Interviews
Common Questions
ML fundamentals
Design an ML system
Research paper discussion
Statistical concepts