Recruitment

Evaluations Program Leader

Uber

Bangalore, India

On-site

Full-time

4d ago

About the Role

The Evaluations Program Leader will own the end-to-end strategy, design and execution of human evaluations for one or more of Uber's GenAI-powered products at a time. These include conversational AI, voice AI, agent workflows and auto-evaluation systems. This role sits within the Global Digital Experience team, the operational arm of Uber's customer support tech organisation, and is a critical driver of quality, safety, and performance across Uber's next-generation AI solutions.
This leader will build and scale Uber's Manual Evaluation framework: defining methodologies, creating evaluation rubrics, ensuring annotation quality, and generating the insights that shape model tuning, product improvements, and release decisions. They will partner closely with Product, Engineering, Data Science and Product Ops to understand product goals, build the Manual Evaluation pipeline needed to meet them, and translate evaluation outcomes into clear technical and operational actions.

The role includes both strategic leadership and operational execution. They will be responsible for setting the quality bar for evaluations, ensuring consistent delivery at scale, and driving continuous improvement of the evaluation pipeline.

The ideal candidate brings strong technical literacy in GenAI systems, exceptional program design and operational skills, and the ability to lead high-impact cross-functional initiatives. They are comfortable navigating ambiguity, building strong partnerships across Uber, and influencing product direction through rigorous evaluation insights. This is a rare opportunity to play a leading role in one of Uber's most transformative technology programs and help shape the future of Uber's AI-driven experiences.

What the Candidate Will Do:

1. Own the end-to-end strategy, design, and execution of Manual Evaluations for one or more of Uber's GenAI-powered products (chatbots, voice AI, automated workflows, auto-evaluation systems).
2. Develop and continuously improve evaluation methodologies, including rubrics, taxonomies, annotation guidelines, quality standards, and success metrics.
3. Partner with Product, Engineering, Data Science, and Product Operations to understand product goals and ensure human evaluations directly inform model tuning, safety improvements, product design changes, and release decisions; partner with scaled operations teams to deliver evaluations on time, at short notice, and to a high quality standard.
4. Package insights into clear, actionable narratives and present them to cross-functional leaders, influencing product and operational strategy.
5. Lead evaluation projects across multiple AI products simultaneously, ensuring timeline, quality, and delivery expectations are met.
6. Establish processes and tools that scale, including workflow optimization, evaluator training, QA systems, and feedback loops.
7. Oversee a global manual evaluation operation, including indirect leadership of evaluators at multiple business sites and ongoing assessment of internal versus external resources to deliver the best evaluation outcomes.
8. Serve as a subject-matter expert in human evaluation for GenAI, staying current with best practices in safety testing, multimodal evaluation, and human-in-the-loop systems.
---- Basic Qualifications ----
1. Bachelor's degree in engineering.
2. 5+ years of experience in program management, product operations, quality operations, research operations, or technical program leadership in an AI-related environment.
3. First-hand experience with GenAI systems, LLM evaluation, model safety, failure pattern analysis, prompt evaluation, or AI product quality.
4. Experience designing or running structured evaluation or quality frameworks, such as human labeling, annotation, audit workflows, or manual review processes.
5. Familiarity with evaluation methodologies (rubric design, taxonomies, annotation guidelines, reliability scoring, inter-rater agreement, etc.).
6. Strong project management abilities, with experience running multiple complex programs simultaneously.
7. Proven experience managing outsourced teams to execute high-quality manual evaluation processes.

---- Preferred Qualifications ----

1. Demonstrated ability to work cross-functionally with Product, Engineering, Data Science, and Operations teams.
2. Knowledge of automated evaluation systems, LLM-as-judge frameworks, or hybrid human+machine evaluation pipelines.
3. Background in service design, conversational AI, voice UX, or agent workflows.
4. Strong analytical and problem-solving skills, with experience turning ambiguous data into clear insights.
5. Excellent written and verbal communication skills, capable of translating technical evaluation outputs into business-relevant insights.
6. Experience in global operations, including scaling teams, training processes, and quality management across regions.

Uber's mission is to reimagine the way the world moves for the better. Here, bold ideas create real-world impact, challenges drive growth, and speed fuels progress. What moves us, moves the world - let's move it forward, together.

Offices continue to be central to collaboration and Uber's cultural identity. Unless formally approved to work fully remotely, Uber expects employees to spend at least half of their work time in their assigned office. For certain roles, such as those based at green-light hubs, employees are expected to be in-office for 100% of their time. Please speak with your recruiter to better understand in-office expectations for this role.

Accommodations may be available based on religious and/or medical conditions, or as required by applicable law. To request an accommodation, please reach out to accommodations@uber.com.


Similar Jobs

Sr. Product Manager - LATAM Seller Experience, LATAM SX Affordability & Compliance

Amazon · Sao Paulo, SP, BRA

Product Manager - Vault Registrations

Veeva Systems · Pennsylvania - Radnor

Senior Analyst, Product Delivery

Mastercard · Bogota, Colombia

Product Manager, TV Native Ad Experiences

Samsung · 645 Clyde Avenue, Mountain View, CA, USA

Product Manager - Uganda

Visa · Kampala, Uganda

About Uber

Uber

Uber develops, markets, and operates a ride-sharing mobile application that allows consumers to submit a trip request.

Employees: 10,001+

Headquarters: San Francisco

Valuation: $120B

Reviews: 3.1 (10 reviews)

Work Life Balance: 4.2

Compensation: 2.3

Culture: 3.5

Career: 2.0

Management: 2.5

Recommend to a Friend: 45%

Pros

Flexible hours and schedule

Meeting different people and cultures

Make your own hours

Cons

Inconsistent and low pay

Safety concerns with passengers

Traffic and difficult drivers

Salary Ranges

23,534 data points

Junior/L3

Mid/L4

Principal/L7

Senior/L5

Staff/L6

Director

Junior/L3 · Associate Product Manager

0 reports

$153,422 total / year

Base

Stock

Bonus

$130,409

$176,435

Interview Experience

5 interviews

Difficulty: 3.0 / 5

Duration: 14-28 weeks

Offer Rate: 40%

Experience: Positive 80%, Neutral 20%, Negative 0%

Interview Process

Application Review

Online Assessment

Recruiter Screen

Technical Phone Screen

Case Study/Analytics Test

Final Loop/Panel Interview

Offer

Common Questions

Coding/Algorithm

System Design

Behavioral/STAR

Case Study

Technical Knowledge

News & Buzz

Uber Shares Slip 2% Ahead Of Q4 Earnings As Robotaxi Ties Draw Focus - Eudaimonia and Co

Source: Eudaimonia and Co

News

6w ago

Uber Eats Ordered to Pay $3.5 Million Over NYC Delivery Worker Pay - The Wall Street Journal

Source: The Wall Street Journal

News

6w ago

Mayor Mamdani Announces $5 Million Settlement, Reinstatement of as Many as 10,000 Wrongfully Deactivated Food Delivery Workers - NYC.gov

Source: NYC.gov

News

6w ago

TSD Mobility teams up with Uber for Business to bring on-demand rides directly into the dealership workflow - CBT News

Source: CBT News

News

6w ago