Required skills: Kubernetes
PCAI and AI Factory Expert:
This role has been designed as ‘Hybrid’ with an expectation that you will work on average 2 days per week from an HPE office.
Who We Are:
Hewlett Packard Enterprise is the global edge-to-cloud company advancing the way people live and work. We help companies connect, protect, analyze, and act on their data and applications wherever they live, from edge to cloud, so they can turn insights into outcomes at the speed required to thrive in today’s complex world. Our culture thrives on finding new and better ways to accelerate what’s next. We know varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good. If you are looking to stretch and grow your career, our culture will embrace you. Open up opportunities with HPE.
Job Description:
HPE Operations is our innovative IT services organization. It provides the expertise to advise, integrate, and accelerate our customers’ outcomes from their digital transformation. Our teams collaborate to transform insight into innovation. In today’s fast-paced, hybrid IT world, being at business speed means overcoming IT complexity to match the speed of actions to the speed of opportunities. Deploy the right technology to respond quickly to market possibilities. Join us and redefine what’s next for you.
What you’ll do:
We are seeking a Subject Matter Expert (SME) – Admin, Operate & Manage (HPE PCAI & AI Factory Solutions) to manage and optimize HPE’s next-generation AI infrastructure platforms. The ideal candidate will have deep hands-on expertise in AI, HPC, and GPU-accelerated environments, with strong knowledge of HPE Ezmeral, NVIDIA AI Enterprise, containerized workloads, and automation frameworks. This role focuses on the operational stability, lifecycle management, and continuous improvement of large-scale Private Cloud for AI (PCAI) and AI Factory deployments.
Key Responsibilities:
1. Platform Administration
- Administer and maintain HPE PCAI and AI Factory environments, ensuring optimal uptime and performance.
- Manage compute nodes (HPE DL380a, DL325, Cray XD670), GPU clusters (NVIDIA L40S/H100/H200), and InfiniBand NDR networks.
- Administer virtualization and container platforms such as vSphere, RHEL/RHOS, Ezmeral Runtime Enterprise, Kubernetes, and Rancher Harvester.
- Perform configuration, patching, version upgrades, and firmware updates across hardware and software layers.
2. Operational Monitoring & Incident Management
- Proactively monitor system health using DCGM, NetQ, Grafana, and Exivity dashboards.
- Handle alerts, performance anomalies, and incidents across GPU, network, and storage layers.
- Lead root cause analysis (RCA) and corrective action plans to prevent recurring issues.
- Maintain operational documentation, runbooks, and incident logs.
3. Lifecycle & Configuration Management
- Manage cluster lifecycle through Ansible, AWX, HPE Performance Cluster Manager (HPCM), and SLURM.
- Oversee automation for provisioning, scaling, and patch management of compute and containerized workloads.
- Manage configuration changes, infrastructure templates, and version baselines in production and staging environments.
4. AI Platform & Software Operations
- Operate HPE Ezmeral Unified Analytics, Data Fabric, and AI Essentials platforms.
- Support NVIDIA AI Enterprise (NVAIE) components including NIMs, NeMo frameworks, and RAPIDS runtime.
- Manage and monitor AI/ML workloads (LLM, NLP, Computer Vision, Chatbots) on containerized clusters.
- Ensure smooth operation of development tools like Jupyter, Spark, Airflow, MLflow, Kubeflow, and Ray.
5. Storage & Data Operations
- Administer VAST, WEKA, and Alletra MP storage solutions for file, object, and distributed storage.
- Monitor storage performance, replication, and capacity utilization.
- Coordinate with storage engineering teams for performance optimization and capacity planning.
6. Security, IAM & Compliance
- Implement and maintain Keycloak for authentication and role-based access control.
- Ensure adherence to compliance, audit, and governance standards for AI workloads.
- Support user and service account provisioning, credential management, and access reviews.
7. Continuous Improvement & Knowledge Enablement
- Optimize automation workflows to reduce manual intervention and improve service response time.
- Drive service health reviews, operational dashboards, and SLA compliance reporting.
- Conduct enablement sessions for L1/L2 teams and act as the final escalation point for operational issues.
- Collaborate with HPE Engineering for patch validation, release readiness, and operational feedback.
Required Skills & Technical Expertise:
Core Infrastructure Skills:
- Administration of HPE DL380a, DL325, Cray XD670, and GPU-based compute environments.
- Strong knowledge of NVIDIA GPU stack, InfiniBand NDR, and Spectrum-X switches.
- Experience in managing VAST, WEKA, or Alletra MP storage systems.
Software & Platform Operations:
- Virtualization: vSphere, RHEL, Ezmeral Runtime Enterprise
- Containers: Kubernetes, Rancher Harvester, KubeSphere, Morpheus
- Automation: Ansible, AWX, NetBox, HPCM, SLURM
- Observability: Grafana, NetQ, Exivity, DCGM
- Security: Keycloak, IAM integrations
AI/ML Platform Administration:
- Experience in HPE Ezmeral Unified Analytics and Data Fabric operations
- Familiarity with NVIDIA AI Enterprise, NIMs, NeMo, and Triton Inference Server
- Working knowledge of TensorFlow, PyTorch, Spark, Kubeflow, MLflow, and Jupyter
Preferred Certifications:
- HPE ASE / Master ASE (Compute, Storage, or Ezmeral)
- NVIDIA Certified Professional / NVAIE Certification
- RHCE / Kubernetes Administrator (CKA) / VMware VCP
Soft Skills:
- Strong analytical and troubleshooting capabilities.
- Excellent communication and collaboration skills across global teams.
- Ability to lead operations improvement initiatives and mentor support engineers.
- Focused on reliability, scalability, and service excellence.
For Internal Job Movement:
- Approval of the employee's current manager is required.
- Employees are expected to notify their manager prior to an interview.
- Employees on a Performance Improvement Plan are not eligible to apply.
- Minimum level should be EXP if applying as part of Internal Job Posting.
Why Join Us:
- Work on next-generation AI infrastructure operations and automation.
- Be part of a global team managing HPE’s AI Factory and PCAI platforms supporting large-scale AI workloads.
- Opportunity to contribute to service innovation and continuous improvement initiatives in AI infrastructure management.
What you need to bring:
- Bachelor’s / Master’s Degree in Computer Science, IT, or equivalent field.
- 8+ years of IT infrastructure administration experience, including 3+ years in AI/HPC or GPU-based environments.
- Proven experience in platform operations, monitoring, and lifecycle management of enterprise-grade AI and HPC environments.
- Hands-on experience in automation and orchestration across bare metal and containerized infrastructure.
Additional Skills:
Accountability, Action Planning, Active Learning, Active Listening, Bias, Business Growth, Business Planning, Coaching, Commercial Acumen, Creativity, Critical Thinking, Cross-Functional Teamwork, Customer Experience Strategy, Customer Solutions, Data Analysis Management, Data Collection Management (Inactive), Data Controls, Design Thinking, Empathy, Follow-Through, Growth Mindset, Intellectual Curiosity (Inactive), Long Term Planning, Managing Ambiguity {+ 5 more}
What We Can Offer You:
Health & Wellbeing
We strive to provide our team members and their loved ones with a comprehensive suite of benefits that supports their physical, financial and emotional wellbeing.
Personal & Professional Development
We also invest in your career because the better you are, the better we all are. We have specific programs catered to helping you reach any career goals you have — whether you want to become a knowledge expert in your field or apply your skills to another division.
Unconditional Inclusion
We are unconditionally inclusive in the way we work and celebrate individual uniqueness. We know varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good.
Let's Stay Connected:
Follow @HPECareers on Instagram to see the latest on people, culture and tech at HPE.
#india
#operations
Job:
Services
Job Level:
Expert
HPE is an Equal Employment Opportunity/ Veterans/Disabled/LGBT employer. We do not discriminate on the basis of race, gender, or any other protected category, and all decisions we make are made on the basis of qualifications, merit, and business need. Our goal is to be one global team that is representative of our customers, in an inclusive environment where we can continue to innovate and grow together. Please click here: Equal Employment Opportunity.
Hewlett Packard Enterprise is EEO Protected Veteran/ Individual with Disabilities.
HPE will comply with all applicable laws related to employer use of arrest and conviction records, including laws requiring employers to consider for employment qualified applicants with criminal histories.
No Fees Notice & Recruitment Fraud Disclaimer
It has come to HPE’s attention that there has been an increase in recruitment fraud whereby scammers impersonate HPE or HPE-authorized recruiting agencies and offer fake employment opportunities to candidates. These scammers often seek to obtain personal information or money from candidates.
Please note that Hewlett Packard Enterprise (HPE), its direct and indirect subsidiaries and affiliated companies, and its authorized recruitment agencies/vendors **will never charge any candidate a registration fee, hiring fee, or any other fee in connection with its recruitment and hiring process.** Candidates should verify the credentials of any hiring agency that claims to be working with HPE for recruitment of talent, and candidates are solely responsible for conducting such verification. Any candidate/individual who relies on the erroneous representations made by fraudulent employment agencies does so at their own risk, and HPE disclaims liability for any damages or claims that may result from any such communication.