AI/ML Engineer - MLOps/SRE

KLA

Chennai, India · On-site · Full-time · 1w ago

Benefits & Perks

Healthcare

401(k)

Equity

Paid Time Off

Parental Leave

Learning Budget

Mental Health

Required Skills

Python

PyTorch

TensorFlow

scikit-learn

CI/CD

Kubernetes

DevOps

Cloud platforms

Monitoring

Incident management

Company Overview

KLA is a global leader in diversified electronics for the semiconductor manufacturing ecosystem. Virtually every electronic device in the world is produced using our technologies. No laptop, smartphone, wearable device, voice-controlled gadget, flexible screen, VR device or smart car would have made it into your hands without us. KLA invents systems and solutions for the manufacturing of wafers and reticles, integrated circuits, packaging, printed circuit boards and flat panel displays. The innovative ideas and devices that are advancing humanity all begin with inspiration, research and development. KLA places an above-average focus on innovation, investing 15% of sales back into R&D. Our expert teams of physicists, engineers, data scientists and problem-solvers work together with the world’s leading technology providers to accelerate the delivery of tomorrow’s electronic devices. Life here is exciting and our teams thrive on tackling really hard problems. There is never a dull moment with us.

Group/Division

The Information Technology (IT) group at KLA is involved in every aspect of the global business. IT’s mission is to enable business growth and productivity by connecting people, process, and technology. It focuses not only on enhancing the technology that enables our business to thrive but also on how employees use and are empowered by technology. This integrated approach to customer service, creativity and technological excellence enables employee productivity, business analytics, and process excellence.

Job Description/Preferred Qualifications

We are seeking a hands-on AI/ML Engineer specializing in MLOps and Site Reliability Engineering (SRE) to build, operate, and continuously improve production-grade machine learning systems. In this role, you will partner with data scientists, data engineers, and software teams to standardize the ML lifecycle, improve reliability and performance, and enable rapid, safe delivery of models and AI services at scale.

Key Responsibilities

Production ML Platform & Tooling

  • Design and implement reusable MLOps platform capabilities for training, deployment, and monitoring of ML/LLM systems.

  • Build standardized pipelines for data validation, feature generation, training, evaluation, model packaging, and release.

  • Own model registry, artifact storage, and metadata lineage to ensure reproducibility and auditability.

Deployment Engineering & Release Safety

  • Deploy models and AI services using containers and orchestration (e.g., Kubernetes) with robust rollout strategies (blue/green, canary, A/B).

  • Create CI/CD workflows for ML code and pipelines, including automated tests, quality gates, and approval controls.

  • Harden inference services for low latency and high throughput using caching, batching, autoscaling, and efficient model serving patterns.

Reliability, Observability & Incident Response (SRE)

  • Define and track service-level indicators (SLIs) and service-level objectives (SLOs) for ML services, pipelines, and data dependencies.

  • Implement end-to-end observability: structured logging, metrics, tracing, dashboards, and alerting for both infrastructure and model behavior.

  • Lead incident response and post-incident reviews; drive systemic fixes through runbooks, automation, and reliability engineering practices.

Model & Data Monitoring

  • Implement monitoring for model quality and data health: drift, bias, performance degradation, and data pipeline anomalies.

  • Build automated feedback loops to trigger investigations, retraining workflows, and safe rollback when quality thresholds are breached.

Security, Compliance & Governance

  • Integrate security best practices: secrets management, least-privilege access (RBAC), network controls, and vulnerability scanning.

  • Support compliance and governance requirements for model usage, data access, retention, and responsible AI practices.

Collaboration & Enablement

  • Partner with data science and engineering teams to translate business requirements into reliable, scalable ML solutions.

  • Create developer-friendly documentation, templates, and internal best practices; mentor teams on MLOps and reliability standards.

Required Qualifications

  • Bachelor's degree in Computer Science, Engineering, Data Science, or a related field with 5+ years of relevant experience; OR a Master's/PhD with 3+ years of relevant experience.

  • Proven experience deploying and operating ML models or AI services in production environments.

  • Strong programming skills in Python and experience with common ML libraries and frameworks (e.g., PyTorch, TensorFlow, scikit-learn).

  • Hands-on DevOps/SRE experience: CI/CD, infrastructure as code, containerization, and operational excellence.

  • Experience with cloud platforms and managed services (Azure, AWS, or GCP) and building scalable, secure systems.

  • Experience with Kubernetes and modern model serving patterns (REST/gRPC, async workers, batch/stream inference).

  • Strong understanding of monitoring and observability (metrics, logs, traces) and incident management processes.

  • Ability to communicate clearly with both technical and non-technical stakeholders, and to operate effectively in cross-functional teams.

Preferred Qualifications

  • Experience with ML platform tools such as MLflow, Kubeflow, Airflow, SageMaker, Vertex AI, or Azure Machine Learning.

  • Experience with feature stores, data quality frameworks, and dataset/versioning tools (e.g., Feast, Great Expectations, DVC).

  • Experience with distributed systems performance tuning (autoscaling, queueing, caching, load shedding).

  • Experience implementing model monitoring for drift, bias, and quality (e.g., Evidently, whylogs, custom statistical checks).

  • Knowledge of security and compliance patterns for enterprise AI (data classification, encryption, audit logging).

  • Contributions to open-source projects, publications, or demonstrated technical leadership through talks/blogs.

What Success Looks Like (First 6-12 Months)

  • Standardized CI/CD and deployment patterns for ML services that reduce time-to-production while improving safety and reliability.

  • Clear SLOs, dashboards, and alerts for critical AI services with measurable improvements in uptime, latency, and incident response.

  • Automated monitoring and quality checks that detect drift and data issues early, with repeatable remediation workflows.

  • Improved reproducibility and governance through consistent artifact tracking, lineage, and documentation.

Note: Technology choices may vary by team needs; candidates should be comfortable learning and adapting to new tools.

Minimum Qualifications

Doctorate (academic) degree with 0 years of related work experience; Master's degree with 3 years of related work experience; or Bachelor's degree with 5 years of related work experience.

We offer a competitive, family-friendly total rewards package. We design our programs to reflect our commitment to an inclusive environment while providing benefits that meet the diverse needs of our employees.

KLA is proud to be an equal opportunity employer.

Be aware of potentially fraudulent job postings or suspicious recruiting activity by people posing as KLA employees. KLA never asks for any financial compensation to be considered for an interview, to become an employee, or for equipment. Further, KLA does not work with any recruiters or third parties who charge such fees, either directly or on behalf of KLA. Please ensure that you have searched KLA’s Careers website for legitimate job postings. KLA follows a recruiting process that involves multiple interviews with our hiring managers, either in person or by video conference. If you are concerned that a communication, an interview, an offer of employment, or an employee is not legitimate, please send an email to talent.acquisition@kla.com to confirm that the person you are communicating with is an employee. We take your privacy very seriously and handle your information confidentially.

About KLA

KLA

Public

KLA Corporation is an American company based in Milpitas, California that makes wafer fab equipment. It supplies process control and yield management systems for the semiconductor industry and other related nanoelectronics industries.

Employees: 10,001+

Headquarters: Milpitas

Reviews

Overall: 3.6 (26 reviews)

Work Life Balance: 3.2

Compensation: 3.8

Culture: 3.6

Career: 3.1

Management: 3.4

Recommend to a Friend: 65%

Pros

Good management and supportive leadership

Competitive compensation and benefits package

Strong engineering department and talented peers

Cons

Limited career growth and promotion opportunities

Work-life balance issues with long hours

Chaotic work environment with minimal direction

Salary Ranges

0 data points

L4 · Data Scientist (0 reports)

$147,667 total / year ($125,532 - $169,802)

Base: -

Stock: -

Bonus: -

Interview Experience

14 interviews

Difficulty: 2.9 / 5

Duration: 14-28 weeks

Offer Rate: 57%

Experience: Positive 36%, Neutral 36%, Negative 28%

Interview Process

1. Application Review

2. Recruiter Screen

3. Technical Phone Screen

4. Technical Interview Rounds

5. Final Round/Onsite

6. Offer

Common Questions

Coding/Algorithm

Technical Knowledge

System Design

Behavioral/STAR

Past Experience