Staff SRE, Agentic AI

Netskope

Bengaluru, Karnataka, India

On-site

Full-time

1w ago

Required Skills

Python

Bash

Kubernetes

Docker

PyTorch

Hugging Face Transformers

Terraform

Prometheus

Grafana

Git

GitHub

About Netskope

Today, there's more data and users outside the enterprise than inside, causing the network perimeter as we know it to dissolve. We realized a new perimeter was needed, one that is built in the cloud and follows and protects data wherever it goes, so we started Netskope to redefine Cloud, Network and Data Security.

Since 2012, we have built the market-leading cloud security company and an award-winning culture powered by hundreds of employees spread across offices in Santa Clara, St. Louis, Bangalore, London, Paris, Melbourne, Taipei, and Tokyo. Our core values are openness, honesty, and transparency, and we purposely developed our open desk layouts and large meeting spaces to support and promote partnerships, collaboration, and teamwork. From catered lunches and office celebrations to employee recognition events and social professional groups such as the Awesome Women of Netskope (AWON), we strive to keep work fun, supportive, and interactive. Visit us at Netskope Careers. Please follow us on LinkedIn and Twitter @Netskope.

About the role:

Please note, this team is hiring across all levels and candidates are individually assessed and appropriately leveled based upon their skills and experience.

As an SRE focused on MLOps, you will be critical to deploying and managing cutting-edge infrastructure for AI/ML operations, and you will collaborate with AI/ML engineers and researchers to develop a robust CI/CD pipeline that supports safe and reproducible experiments. Your expertise will also extend to setting up and maintaining monitoring, logging, and alerting systems to oversee extensive training runs and client-facing APIs. You will ensure that training environments are available and efficiently managed across multiple clusters, and you will enhance our containerization and orchestration systems with tools like Docker and Kubernetes.

What’s in it for you

You will be critical to deploying and managing cutting-edge infrastructure for AI/ML operations. This means you won't just maintain existing systems; you will be building the foundational technology that powers our next generation of intelligent products.

Your role is crucial to bridging the gap between research and production. If you thrive on solving complex distributed systems challenges and maximizing the efficiency of high-stakes AI workloads, this is the environment for you.

What you will be doing

Work closely with AI/ML engineers and researchers on the design and architecture of AI/ML applications for scale and reliability.
Design and deploy a CI/CD pipeline that ensures safe and reproducible experiments.
Participate in production troubleshooting of AI/ML application code as well as infrastructure configurations.
Set up and manage monitoring, logging, and alerting systems for extensive training runs and client-facing APIs.
Ensure training environments are consistently available and prepared across multiple clusters.
Develop and manage containerization and orchestration systems utilizing tools such as Docker and Kubernetes.
Operate and oversee large Kubernetes clusters with GPU workloads.
Improve the reliability, quality, and time-to-market of our suite of software solutions.
Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement.
Provide primary operational support and engineering for multiple large-scale distributed software applications.

Required skills and experience

8+ years of professional experience building core infrastructure systems.
Hands-on experience with core model training principles and major frameworks like PyTorch and Hugging Face Transformers.
Familiarity with LLM development, deployment, and optimization techniques (e.g., TensorRT).
Familiarity with high-performance, large-scale ML systems and their unique infrastructure needs.
Experience with major cloud providers (Google Cloud, AWS, or Azure).
Proficiency with Infrastructure as Code (IaC) tools like Terraform.
Strong scripting skills using languages like Python or Bash, and experience with Git and GitHub workflows.
Expert experience operating orchestration systems such as Kubernetes at scale.
Experience setting up and using monitoring tools such as Prometheus, Grafana, or similar for comprehensive tracing and monitoring.
Proven track record of building and operating scalable, reliable, and secure systems.
A natural knack for troubleshooting complex systems and solving challenging technical problems.
Proactive attitude in identifying problems, performance bottlenecks, and areas for improvement.
Comfortable working with ambiguity and rapid change in a dynamic environment.

Education

BSCS or equivalent required; MSCS or equivalent strongly preferred.

Netskope is committed to implementing equal employment opportunities for all employees and applicants for employment. Netskope does not discriminate in employment opportunities or practices based on religion, race, color, sex, marital or veteran status, age, national origin, ancestry, physical or mental disability, medical condition, sexual orientation, gender identity/expression, genetic information, pregnancy (including childbirth, lactation and related medical conditions), or any other characteristic protected by the laws or regulations of any jurisdiction in which we operate.

Netskope respects your privacy and is committed to protecting the personal information you share with us. Please refer to Netskope's Privacy Policy for more details.

The application window for this position is expected to close within 50 days. You may apply by filling out the information below or by visiting our Netskope Careers site.

About Netskope

Netskope

Series F+

Held company.

1,001-5,000

Employees

Denver

Headquarters

$7.5B

Valuation

Reviews

3.6

1 review

Work Life Balance

2.0

Compensation

3.0

Culture

2.5

Career

3.0

Management

2.5

20%

Recommend to a Friend

Cons

Poor work-life balance expectations

Required to work beyond standard 40 hours

Expected to work until 8pm local time

Salary Ranges

35 data points

Junior/L3

Junior/L3 · ANALYTICS SPECIALIST I

1 report

$59,824

total / year

Base

$52,021

Stock

Bonus

$59,824

Interview Experience

47 interviews

Difficulty

3.2

/ 5

Duration

14-28 weeks

Offer Rate

40%

Experience

Positive 62%

Neutral 26%

Negative 12%

Interview Process

Phone Screen

Technical Interview

Hiring Manager

Team Fit

Common Questions

Technical skills

Past experience

Team collaboration

Problem solving
