
AI Operations Engineer
About the role
Who You'll Work With:
We are the Cloud Operations team within Cisco IT, driving the development and management of Infrastructure capabilities that support Cisco’s Engineering and business functions worldwide. Our mission is to build scalable, efficient, and cutting-edge infrastructure powering the next generation of AI solutions. By using automation, advanced hardware, and AI-optimized frameworks, we ensure seamless integration, reliable performance, and future-ready services through continuous innovation and emerging technologies. The team culture is dynamic and collaborative, where creative problem-solving, exploring new ideas, and pushing boundaries are celebrated.
Who You Are:
You are an innovative and skilled AI Engineer ready to join our Cloud Operations team. This role involves applying artificial intelligence and machine learning techniques to optimize cloud infrastructure, automate routine operations, enhance performance monitoring, and improve system resilience. The ideal candidate has experience in cloud platforms (AWS, Azure, or GCP, as well as OpenStack and VMware), DevOps practices, and AI/ML development. You are an excellent collaborator who can partner, lead, guide, and communicate advanced technical concepts, and a hardworking, passionate engineer comfortable working in high-pressure, large-scale enterprise environments.
What You'll Do:
- Design and implement AI Agents to optimize cloud resource allocation, auto-scaling, and performance tuning.
- Develop predictive models for failure detection, incident management, and system health monitoring.
- Automate operational workflows using machine learning and intelligent scripting.
- Integrate AI-driven insights with existing cloud monitoring tools.
- Collaborate with DevOps and SRE teams to deploy, monitor, and improve ML models in production environments.
- Conduct anomaly detection for security, cost optimization, and performance analytics.
- Continuously evaluate emerging AI technologies and tools for operational improvements.
- Maintain documentation and best practices for AI/ML integration in cloud systems.
Our Minimum Requirements include:
- Bachelor's degree with a minimum of 4 years of experience (or equivalent experience), or a Master’s degree in Computer Science, Data Science, or a related technical field with 2 years of experience.
- Proven ability building and deploying ML models and Agentic AI platform solutions, including MCP server development and the A2A protocol, with at least 2 years focused on infrastructure or cloud operations.
- Experience with Python, Jupyter, and ML libraries such as PyTorch, TensorFlow, or scikit-learn, or similar technologies in Go.
- Familiarity with cloud-native monitoring, logging, and automation tools (e.g., Terraform, Ansible, Prometheus, Splunk, AppDynamics) and basic knowledge of cloud and virtualization.
- Experience with Agile and DevOps operating models, including project tracking tools (e.g., Jira), Git (or any other version control system), and CI/CD systems (e.g., GitLab, GitHub Actions, Jenkins).
Preferred Qualifications:
- Experience developing and deploying Agentic AI solutions and ML models.
- Experience deploying models on cloud platforms (AWS, Azure, or Google Cloud) and familiarity with Docker, Kubernetes, and CI/CD tools to build robust AI pipelines.
- Established record of leading technical initiatives and delivering results, and a commitment to fostering a supportive work environment.
Why Cisco?
At Cisco, we’re revolutionizing how data and infrastructure connect and protect organizations in the AI era – and beyond. We’ve been innovating fearlessly for 40 years to create solutions that power how humans and technology work together across the physical and digital worlds. These solutions provide customers with unparalleled security, visibility, and insights across the entire digital footprint.
Fueled by the depth and breadth of our technology, we experiment and create meaningful solutions. Add to that our worldwide network of doers and experts, and you’ll see that the opportunities to grow and build are limitless. We work as a team, collaborating with empathy to make really big things happen on a global scale. Because our solutions are everywhere, our impact is everywhere.
We are Cisco, and our power starts with you.
Benefits and perks
- Learning Budget
Required skills
AI/ML
Cloud operations
Automation
Monitoring
Infrastructure management
About Cisco
Heredia
Headquarters