
Where the world builds software.
Staff Applied Researcher, AI Quality
About GitHub
GitHub is the world’s leading platform for agentic software development — powered by Copilot to build, scale, and deliver secure software. Over 180 million developers, including more than 90% of the Fortune 100 companies, use GitHub to collaborate, and more than 77,000 organisations have adopted GitHub Copilot.
Locations
This role is remote, based in the United States.
Overview
At GitHub, we’re building the next generation of AI‑powered developer experiences. We’re looking for a Staff Applied Researcher with deep expertise in Large Language Model (LLM) evaluation, LLM agents, strong engineering instincts, and a bias for action to help shape the future of GitHub Copilot and our AI platform.
This is a high‑impact role where you will design evaluation systems that directly influence how millions of developers experience AI every day.
Responsibilities
Lead Model Quality & Evaluation
- Design next‑generation evaluation frameworks for code generation, reasoning, safety, multimodal tasks, and agentic workflows.
- Develop scalable automatic metrics, LLM‑judge systems, reward models, and human‑in‑the‑loop evaluation pipelines.
- Establish high‑signal, repeatable methodologies that influence product decisions across GitHub AI.
Drive Applied Research & Engineering
- Build and optimize evaluation tooling, datasets, benchmarking systems, and experimentation pipelines.
- Create and onboard new benchmarks for the hardest tasks for coding agents.
- Collaborate closely with engineering teams to productionize research, validate improvements, and accelerate model iteration cycles.
- Own end‑to‑end quality insights for the models behind GitHub Copilot and new AI features.
- Work closely with product development, engineering, and design teams to integrate advanced research findings into practical applications, ensuring alignment with product goals and user needs.
Influence, Mentor & Lead
- Shape GitHub’s strategy for model quality, alignment, and evaluation.
- Mentor other researchers and engineers, helping elevate technical standards across the organization.
- Drive clarity in ambiguous problem spaces and champion fast, high‑quality execution.
Qualifications
Required Qualifications
- Bachelor's degree in Data Science, Mathematics, Physics, Statistics, Economics, Operations Research, Computer Science, or related field AND 8+ years' experience in data science (e.g., managing structured and unstructured data, applying statistical techniques) or related field,
- OR master's degree in Data Science, Mathematics, Physics, Statistics, Economics, Operations Research, Computer Science, or related field AND 6+ years' experience in data science (e.g., managing structured and unstructured data, applying statistical techniques) or related field,
- OR doctorate in Data Science, Mathematics, Physics, Statistics, Economics, Operations Research, Computer Science, or related field AND 4+ years' experience in data science (e.g., managing structured and unstructured data, applying statistical techniques) or related field,
- OR equivalent experience.
- 3+ years of strong engineering skills in Python/TypeScript and experience building production-grade evaluation or data/ML pipelines at scale.
- Proven track record shipping research or evaluation systems in production environments.
- Strong cross‑functional communication and influence skills.
Preferred Qualifications
- Experience with LLM judge systems, reward modeling, alignment, or safety evaluations.
- Background in code generation, developer tools, or AI‑assisted programming.
- Experience with large‑scale experimentation and online/offline evaluation strategies.
- Open‑source contributions or experience working with developer communities.
- Experience designing and leading complex research projects from ideation to implementation.
- Ability to define and articulate data-driven strategies that consider cross-functional impacts and align with organizational priorities, particularly in a software development platform context.
Compensation Range
The base salary range for this job is USD $140,400.00 - USD $372,300.00 per year.
These pay ranges are intended to cover roles based across the United States. An individual's base pay depends on various factors, including geographical location and a review of the applicant's experience, knowledge, skills, and abilities. At GitHub, certain roles are eligible for benefits and additional rewards, including annual bonus and stock. These rewards are allocated based on individual impact in role. In addition, certain roles have the opportunity to earn sales incentives based on revenue or utilization, depending on the terms of the plan and the employee's role.
GitHub values
- Customer-obsessed
- Ship to learn
- Growth mindset
- Own the outcome
- Better together
- Diverse and inclusive
Manager fundamentals
- Model
- Coach
- Care
Leadership principles
- Create clarity
- Generate energy
- Deliver success
Who We Are
GitHub is the world’s leading AI-powered developer platform with 150 million developers and counting. We’re also home to the biggest open-source community on earth (and 99% of the world’s software has open-source code in its DNA). Many of the apps and programs you use every day are built on GitHub.
Our teams are dreamers, doers, and pioneers, leading the way in AI, driving humanitarian efforts around the globe, and even sending open source to Mars (and beyond!).
At GitHub, our goal is to create the space you need to do your best work. We’re remote-first and offer competitive pay, generous learning and growth opportunities, and excellent benefits to support you, wherever you are—because we know that people flourish when they can work on their own terms.
Join us, and let’s change the world, together.
EEO Statement
GitHub is made up of people from a wide variety of backgrounds and lifestyles. We embrace diversity and invite applications from people of all walks of life. We don't discriminate against employees or applicants based on gender identity or expression, sexual orientation, race, religion, age, national origin, citizenship, disability, pregnancy status, veteran status, or any other differences. Also, if you have a disability, please let us know if there's any way we can make the interview process better for you; we're happy to accommodate!
About GitHub (company profile)
GitHub
Series B
GitHub is a proprietary developer platform that allows developers to create, store, manage, and share their code.
- Employees: 501-1,000
- Headquarters: San Francisco
- Valuation: $7.5B
Reviews
Overall rating: 4.2 (10 reviews)
- Work-life balance: 3.2
- Compensation: 4.0
- Culture: 4.3
- Career: 3.7
- Management: 4.0
- Would recommend to a friend: 78%
Pros
- Great team culture and teamwork
- Learning and growth opportunities
- Flexible work arrangements and remote options
Cons
- Work-life balance challenges and long hours
- High expectations and overwhelming workload
- Limited career advancement and mentorship
Salary information
22 data points
Junior/L3 · Data Scientist (0 reports)
- Total compensation: $150,000
- Base salary: $100,000
- Stock: $50,000
- Bonus: -
- Range: $127,500 - $172,500
Interview reviews
3 reviews
- Difficulty: 3.3 / 5
- Duration: 14-28 weeks
- Offer rate: 33%
- Experience: positive 33%, neutral 67%, negative 0%
Interview process
1. Application Review
2. Recruiter Screen
3. Technical Phone Screen
4. Onsite/Virtual Interviews
5. Team Matching
6. Offer
Common interview questions
- Coding/Algorithm
- System Design
- Behavioral/STAR
- Technical Knowledge
- Culture Fit