PCAI And AI Factory Expert

Juniper Networks

Bangalore, Karnataka, India

On-site

Full-time

3w ago

Required skills

Kubernetes

PCAI And AI Factory Expert:

This role has been designed as ‘Hybrid’ with an expectation that you will work on average 2 days per week from an HPE office.

Who We Are:

Hewlett Packard Enterprise is the global edge-to-cloud company advancing the way people live and work. We help companies connect, protect, analyze, and act on their data and applications wherever they live, from edge to cloud, so they can turn insights into outcomes at the speed required to thrive in today’s complex world. Our culture thrives on finding new and better ways to accelerate what’s next. We know varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good. If you are looking to stretch and grow your career our culture will embrace you. Open up opportunities with HPE.

Job Description:

HPE Operations is our innovative IT services organization. It provides the expertise to advise, integrate, and accelerate our customers’ outcomes from their digital transformation. Our teams collaborate to transform insight into innovation. In today’s fast paced, hybrid IT world, being at business speed means overcoming IT complexity to match the speed of actions to the speed of opportunities. Deploy the right technology to respond quickly to market possibilities. Join us and redefine what’s next for you.

What you’ll do:

We are seeking a Subject Matter Expert (SME) – Admin, Operate & Manage (HPE PCAI & AI Factory Solutions) to manage and optimize HPE’s next-generation AI infrastructure platforms. The ideal candidate will have deep hands-on expertise in AI, HPC, and GPU-accelerated environments, with strong knowledge of HPE Ezmeral, NVIDIA AI Enterprise, Containerized workloads, and Automation frameworks. This role focuses on the operational stability, lifecycle management, and continuous improvement of large-scale Private Cloud for AI (PCAI) and AI Factory deployments.

Key Responsibilities:

1. Platform Administration

Administer and maintain HPE PCAI and AI Factory environments, ensuring optimal uptime and performance.

Manage compute nodes (HPE DL380a, DL325, Cray XD670), GPU clusters (NVIDIA L40S/H100/H200), and InfiniBand NDR networks.
Administer virtualization and container platforms such as vSphere, RHEL/RHOS, Ezmeral Runtime Enterprise, Kubernetes, and Rancher Harvester.
Perform configuration, patching, version upgrades, and firmware updates across hardware and software layers.

2. Operational Monitoring & Incident Management

Proactively monitor system health using DCGM, NetQ, Grafana, and Exivity dashboards.
Handle alerts, performance anomalies, and incidents across GPU, network, and storage layers.
Lead root cause analysis (RCA) and corrective action plans to prevent recurring issues.
Maintain operational documentation, runbooks, and incident logs.

3. Lifecycle & Configuration Management

Manage cluster lifecycle through Ansible, AWX, HPE Performance Cluster Manager (HPCM), and SLURM.
Oversee automation for provisioning, scaling, and patch management of Compute and Containerized workloads.
Manage configuration changes, infrastructure templates, and version baselines in production and staging environments.

4. AI Platform & Software Operations

Operate HPE Ezmeral Unified Analytics, Data Fabric, and AI Essentials platforms.
Support NVIDIA AI Enterprise (NVAIE) components including NIMs, NeMo frameworks, and RAPIDS runtime.
Manage and monitor AI/ML workloads (LLM, NLP, Computer Vision, Chatbots) on containerized clusters.
Ensure smooth operation of development tools like Jupyter, Spark, Airflow, MLflow, Kubeflow, and Ray.

5. Storage & Data Operations

Administer VAST, WEKA, and Alletra MP storage solutions for file, object, and distributed storage.
Monitor storage performance, replication, and capacity utilization.
Coordinate with storage engineering teams for performance optimization and capacity planning.

6. Security, IAM & Compliance

Implement and maintain Keycloak for authentication and role-based access control.
Ensure adherence to compliance, audit, and governance standards for AI workloads.
Support user and service account provisioning, credential management, and access reviews.

7. Continuous Improvement & Knowledge Enablement

Optimize automation workflows to reduce manual intervention and improve service response time.
Drive service health reviews, operational dashboards, and SLA compliance reporting.
Conduct enablement sessions for L1/L2 teams and act as the final escalation point for operational issues.
Collaborate with HPE Engineering for patch validation, release readiness, and operational feedback.

Required Skills & Technical Expertise:

Core Infrastructure Skills:

Administration of HPE DL380a, DL325, Cray XD670, and GPU-based Compute environments.
Strong knowledge of NVIDIA GPU stack, InfiniBand NDR, and Spectrum-X switches.
Experience in managing VAST, WEKA, or Alletra MP storage systems.

Software & Platform Operations:

Virtualization: vSphere, RHEL, Ezmeral Runtime Enterprise
Containers: Kubernetes, Rancher Harvester, KubeSphere, Morpheus
Automation: Ansible, AWX, NetBox, HPCM, SLURM
Observability: Grafana, NetQ, Exivity, DCGM
Security: Keycloak, IAM integrations

AI/ML Platform Administration:

Experience in HPE Ezmeral Unified Analytics and Data Fabric operations
Familiarity with NVIDIA AI Enterprise, NIMs, NeMo, and Triton Inference Server
Working knowledge of TensorFlow, PyTorch, Spark, Kubeflow, MLflow, and Jupyter

Preferred Certifications:

HPE ASE / Master ASE (Compute, Storage, or Ezmeral)
NVIDIA Certified Professional / NVAIE Certification
RHCE / Kubernetes Administrator (CKA) / VMware VCP

Soft Skills:

Strong analytical and troubleshooting capabilities.
Excellent communication and collaboration skills across global teams.
Ability to lead operations improvement initiatives and mentor support engineers.
Focused on reliability, scalability, and service excellence.

For Internal Job Movement:

Approval of the employee's current manager is required.
Employees are expected to notify their manager prior to an interview.
Employees in Performance Improvement Plan are not eligible to apply.
Minimum level should be EXP if applying as part of Internal Job Posting.

Why Join Us:

Work on next-generation AI infrastructure operations and automation

Be part of a global team managing HPE’s AI Factory and PCAI platforms supporting large-scale AI workloads.
Opportunity to contribute to service innovation and continuous improvement initiatives in AI infrastructure management

What you need to bring:

Bachelor’s / Master’s Degree in Computer Science, IT, or equivalent field.

8+ years of IT infrastructure administration experience, including 3+ years in AI/HPC or GPU-based environments.
Proven experience in platform operations, monitoring, and lifecycle management of enterprise-grade AI and HPC environments.
Hands-on experience in automation and orchestration across bare metal and containerized infrastructure.

Additional Skills:

Accountability, Action Planning, Active Learning, Active Listening, Bias, Business Growth, Business Planning, Coaching, Commercial Acumen, Creativity, Critical Thinking, Cross-Functional Teamwork, Customer Experience Strategy, Customer Solutions, Data Analysis Management, Data Collection Management (Inactive), Data Controls, Design Thinking, Empathy, Follow-Through, Growth Mindset, Intellectual Curiosity (Inactive), Long Term Planning, Managing Ambiguity {+ 5 more}

What We Can Offer You:

Health & Wellbeing

We strive to provide our team members and their loved ones with a comprehensive suite of benefits that supports their physical, financial and emotional wellbeing.

Personal & Professional Development

We also invest in your career because the better you are, the better we all are. We have specific programs catered to helping you reach any career goals you have — whether you want to become a knowledge expert in your field or apply your skills to another division.

Unconditional Inclusion

We are unconditionally inclusive in the way we work and celebrate individual uniqueness. We know varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good.

Let's Stay Connected:

Follow @HPECareers on Instagram to see the latest on people, culture and tech at HPE.

#india

#operations

Job:

Services

Job Level:

Expert

HPE is an Equal Employment Opportunity/ Veterans/Disabled/LGBT employer. We do not discriminate on the basis of race, gender, or any other protected category, and all decisions we make are made on the basis of qualifications, merit, and business need. Our goal is to be one global team that is representative of our customers, in an inclusive environment where we can continue to innovate and grow together. Please click here: Equal Employment Opportunity.

Hewlett Packard Enterprise is EEO Protected Veteran/ Individual with Disabilities.

HPE will comply with all applicable laws related to employer use of arrest and conviction records, including laws requiring employers to consider for employment qualified applicants with criminal histories.

No Fees Notice & Recruitment Fraud Disclaimer

It has come to HPE’s attention that there has been an increase in recruitment fraud whereby scammers impersonate HPE or HPE-authorized recruiting agencies and offer fake employment opportunities to candidates. These scammers often seek to obtain personal information or money from candidates.

Please note that Hewlett Packard Enterprise (HPE), its direct and indirect subsidiaries and affiliated companies, and its authorized recruitment agencies/vendors **will never charge any candidate a registration fee, hiring fee, or any other fee in connection with its recruitment and hiring process.** The credentials of any hiring agency that claims to be working with HPE for recruitment of talent should be verified by candidates and candidates shall be solely responsible to conduct such verification. Any candidate/individual who relies on the erroneous representations made by fraudulent employment agencies does so at their own risk, and HPE disclaims liability for any damages or claims that may result from any such communication.

Similar jobs

Advisor, Machine Learning

Dell · Bangalore, India

Applied AI Engineer

Celonis · Bangalore, India

Applied AI Solution Engineer

Celonis · Bangalore, India

AI Search Specialist

Thermo Fisher · Bangalore, India

Machine Learning Engineer 4

Adobe · Bangalore

About Juniper Networks

Juniper Networks

Public

Juniper Networks, Inc., was an American multinational corporation headquartered in Sunnyvale, California. The company developed and marketed networking products, including routers, switches, network management software, network security products, and software-defined networking technology.

Employees: 10,001+

Headquarters: Sunnyvale

Company valuation: $7.5B

Reviews

4.1 (10 reviews)

Work-life balance: 3.8

Compensation: 4.2

Culture: 4.3

Career: 3.5

Management: 4.0

78% would recommend to a friend

Pros

Flexible work schedules and remote options

Supportive and approachable management

Collaborative environment and team spirit

Cons

Fast-paced environment and overwhelming workload

Communication issues between teams

Limited career advancement opportunities

Salary information

46 data points

Junior/L3

Junior/L3 · Data Scientist 1

0 reports

Total compensation: $100,000

Base salary

Stock

Bonus

$85,000

$115,000

Interview experience

5 interviews

Difficulty: 3.0 / 5

Duration: 14-28 weeks

Interview process

Application Review

Recruiter Screen

Technical Phone Screen

Onsite/Virtual Interviews

Offer

Frequently asked questions

Coding/Algorithm

Technical Knowledge

Behavioral/STAR

Past Experience

News & Buzz

Juniper Networks Inc stock (US48203R1041): Why Google Discover changes matter more now - AD HOC NEWS

AD HOC NEWS

News

1d ago

Hewlett Packard Enterprise Strengthens AI Infrastructure Positioning - Let's Data Science

Let's Data Science

News

5d ago

Juniper Networks Patches Dozens of Junos OS Vulnerabilities - SecurityWeek

SecurityWeek

News

1w ago

HPE CEO squares up to Cisco and Huawei as Juniper deal pays off - Light Reading

Light Reading

News

5w ago