
Leading company in the artificial intelligence industry
Applied AI, Evaluation Engineer
Benefits
• Unlimited vacation
• Health insurance
• Commuting allowance
• Free meals
• Gym subsidy
• Parental leave
Required skills
Python
Machine Learning
LLM evaluation
Benchmarking
About Mistral
At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life.
We democratize AI through high-performance, optimized, open-source and cutting-edge models, products and solutions. Our comprehensive AI platform is designed to meet enterprise needs, whether on-premises or in cloud environments. Our offerings include le Chat, the AI assistant for life and work.
We are a dynamic, collaborative team passionate about AI and its potential to transform society.
Our diverse workforce thrives in competitive environments and is committed to driving innovation. Our teams are distributed between France, USA, UK, Germany and Singapore. We are creative, low-ego and team-spirited.
Join us to be part of a pioneering company shaping the future of AI. Together, we can make a meaningful impact. See more about our culture on https://mistral.ai/careers.
About The Job:
The Applied AI team is Mistral's customer-facing technical organization. We work directly with enterprise clients from pre-sales through implementation to deploy cutting-edge AI solutions that deliver measurable business impact. Our team combines deep ML expertise with strong customer engagement skills, operating like startup CTOs who own end-to-end project execution.
However, the AI graveyard is full of great ideas nobody could measure and prototypes that never made it to production. As our first Evaluation Engineer, you'll design the methodology, build the infrastructure, and define what "ready for production" means across verticals and use cases.
You will design and implement evaluation systems that help our customers understand model performance across their specific use cases, build robust evaluation infrastructure, and work closely with both research and customer-facing teams.
Research builds evals for frontier capabilities, but customers don't care about MMLU scores. In Applied AI, we need evals and frameworks built for customer reality: domain-specific, risk-aware, production-grade. The kind that tell you whether your medical summarization model will hallucinate drug interactions, or whether your legal assistant will invent case citations.
This role sits at the intersection of research, engineering, and solutions: you will play a critical cross-functional role in measuring, understanding, and improving the capabilities of our models for our enterprise customers.
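The "invented case citations" failure mode above can be made concrete with a small sketch: a check that flags case names in a legal assistant's answer that are absent from a trusted index. Everything here is illustrative — the regex, the `KNOWN_CITATIONS` set, and the function names are assumptions for this example, not part of any real evaluation suite.

```python
import re

# Hypothetical index of verified case citations; a production system
# would query a legal database rather than a hard-coded set.
KNOWN_CITATIONS = {"Marbury v. Madison", "Roe v. Wade"}

def extract_citations(answer: str) -> list[str]:
    """Pull 'X v. Y' style case names out of a model answer."""
    return re.findall(r"[A-Z][A-Za-z]+ v\. [A-Z][A-Za-z]+", answer)

def citation_hallucination_rate(answer: str) -> float:
    """Fraction of cited cases that are not in the trusted index."""
    cites = extract_citations(answer)
    if not cites:
        return 0.0
    invented = [c for c in cites if c not in KNOWN_CITATIONS]
    return len(invented) / len(cites)
```

A metric like this turns "will it invent citations?" into a number you can track per model version and per customer domain.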
What you will do
- Design and implement comprehensive evaluation frameworks to measure LLM capabilities across diverse customer use cases, including text generation, reasoning, code, and domain-specific applications
- Build scalable evaluation infrastructure and pipelines that enable rapid, reproducible assessment of model performance
- Develop novel evaluation methodologies to assess emerging capabilities and verticalized use cases (cybersecurity, finance, healthcare, etc.), and enable the Solutions teams (Deployment Strategists and Applied AI) on these topics
- Create custom evaluation suites tailored to enterprise customers' specific needs, working closely with them to understand their requirements and success criteria
- Collaborate with research teams to translate evaluation insights into model improvements and training decisions
- Partner with product teams to continuously improve our evaluation tooling based on customer feedback
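As a rough illustration of the infrastructure described above, here is a minimal, self-contained eval-harness sketch. The interfaces (`EvalCase`, `run_suite`, a model as a plain prompt-to-answer callable) are assumptions made for this example, not Mistral's actual tooling.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    """One evaluation item: a prompt and a grading function."""
    prompt: str
    grade: Callable[[str], bool]  # returns True if the answer passes

def run_suite(model: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Run every case against the model and return the pass rate."""
    passed = sum(1 for case in cases if case.grade(model(case.prompt)))
    return passed / len(cases)

# Usage with a trivial stand-in model:
cases = [
    EvalCase("What is 2+2?", grade=lambda a: "4" in a),
    EvalCase("Capital of France?", grade=lambda a: "Paris" in a),
]

def fake_model(prompt: str) -> str:
    return "4" if "2+2" in prompt else "Paris"

print(run_suite(fake_model, cases))  # 1.0
```

Real suites add the pieces the job description calls out — per-domain grading, reproducible configs, and pipelines that scale — but the core loop is this simple.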
How We Work in Applied AI:
- We care about people and outputs.
- What matters is what you ship, not the time you spend on it.
- Bureaucracy is where urgency goes to vanish. You talk to whoever you need to talk to. The best idea wins, whether it comes from a principal engineer or someone in their first week.
- Always ask why. The best solutions come from deep understanding, not from copying what worked before.
- We say what we mean. Feedback is direct, timely, and given because we care.
- No politics. Low ego, high standards.
- We embrace an unstructured environment and find joy in it.
About you
- You are fluent in English
- 3+ years of experience in ML evaluation or benchmarking for LLMs or agentic systems
- You have proven experience in AI or machine learning product implementation with APIs and back-end systems
- You have a deep understanding of the concepts and algorithms underlying machine learning and LLMs
- You have strong technical coding skills in Python
- You have strong communication skills, with an ability to explain complex technical concepts in simple terms to both technical and non-technical audiences
Ideally you have:
- Contributions to open-source evaluation frameworks (e.g., LM Eval Harness, OpenAI Evals) or published research on LLM evaluation
- Experience as a Customer Engineer, Forward Deployed Engineer, Sales Engineer, Solutions Architect or Technical Product Manager
- Experience with ML frameworks (PyTorch, Hugging Face Transformers)
Benefits:
🏝️ PTO: The CDI contract will be a "Forfait 218 jours", corresponding to 25 days of holidays plus, on average, 8 to 10 RTT days, and complete autonomy over working hours
⚕️ Health: Full health insurance coverage for you and your family
🚗 Transportation: We offer a €600 annual mobility allowance. This package covers 50% of your public transportation costs and includes the Sustainable Mobility Allowance (FMD), encouraging eco-friendly travel options such as cycling or carpooling.
🥕 Food: Swile meal vouchers of €10.83 per working day, 60% of which is covered by the company
🏀 Sport: Gymlib membership, with Mistral sponsoring a significant part of the monthly fee (depending on the program you choose)
🐤 Parental policy: 4 additional weeks for parents on top of what is offered by the French state.
About Mistral AI

Mistral AI
Series B
Mistral AI SAS is a French artificial intelligence (AI) company headquartered in Paris. Founded in 2023, it develops open-weight large language models (LLMs), offering both open-source and proprietary AI models. As of 2025 the company has a valuation of more than US$14 billion.
51-200
Employees
Paris
Headquarters
$6.0B
Valuation
Reviews: 3.8 (10 reviews)
Work-life balance: 2.8
Compensation: 4.0
Company culture: 4.2
Career: 3.5
Management: 2.5
72% would recommend to a friend
Pros
Supportive team and collaborative environment
Good learning opportunities and mentorship programs
Excellent benefits and competitive compensation
Cons
Poor management and lack of leadership direction
Work-life balance issues and heavy workload
High stress environment and burnout
Salary range (36 data points)
Mid/L4 · AI Scientist (8 reports): $300,000 total compensation (base $300,000; stock -; bonus -)
Senior/L5: $390,000
Staff/L6: $429,000
Interview reviews (1 review)
Difficulty: 3.0 / 5
Duration: 21-35 weeks
Interview process
1. Application Review
2. Recruiter Screen
3. Technical Interview
4. Research Discussion
5. Team Matching
6. Offer
Common interview topics
Machine Learning/AI Concepts
Research Experience
Technical Knowledge
Coding/Algorithm
Behavioral/STAR
Latest news
- Mistral AI Introduces Workflows for Orchestrating Enterprise AI Processes (infoq.com, 1w ago)
- Mistral AI takes on enterprise AI orchestration with Workflows (the-decoder.com, 1w ago)
- ASML-backed Mistral AI Picks Singtel to Power Singapore's AI Ambitions (TipRanks, 1w ago)
- Singtel's RE:AI, Mistral AI team up for sovereign pitch (telecomtv.com, 1w ago)