
Application delivery and security
Senior Site Reliability Engineer, AI Inference at F5 Networks
About the role
At F5, we strive to bring a better digital world to life. Our teams empower organizations across the globe to create, secure, and run applications that enhance how we experience our evolving digital world. We are passionate about cybersecurity, from protecting consumers from fraud to enabling companies to focus on innovation.
Everything we do centers around people. That means we obsess over how to make the lives of our customers, and their customers, better. And it means we prioritize a diverse F5 community where each individual can thrive.
Job Title: AI Inference Engineer
Role Objective
The AI Inference Engineer plays a critical role in the AI lifecycle by bridging the gap between high-performance model development and optimized deployment environments. This position focuses on optimizing Large Language Models (LLMs) for inference, serving diverse environments—from GPU-rich data centers to resource-constrained edge devices—with a strong emphasis on maximizing throughput, minimizing latency, and maintaining model accuracy.
This role is pivotal in advancing F5’s AI capabilities, ensuring enterprise-grade reliability by leveraging hardware acceleration, designing scalable infrastructure, and monitoring system performance.
Key Responsibilities
High-Performance AI Serving
- Build and maintain robust inference engines using tools like vLLM, TGI (Text Generation Inference), and NVIDIA Triton, ensuring high performance at scale.
- Handle deployment optimizations to deliver low-latency AI serving solutions for multiple business applications.
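Engines such as vLLM get much of their throughput from continuous batching, where new requests join the running batch and finished ones leave it at every decode step rather than waiting for a full batch to drain. The following is a minimal, self-contained sketch of that scheduling idea only; the model forward pass is a stand-in and the names are illustrative, not vLLM's actual API:

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    """One in-flight generation request."""
    prompt: str
    max_new_tokens: int
    generated: list = field(default_factory=list)

def fake_forward(batch):
    """Stand-in for a batched model decode step: emit one token per request."""
    return [f"tok{len(r.generated)}" for r in batch]

def continuous_batching(requests, max_batch_size=4):
    """Admit new requests and retire finished ones every decode step,
    instead of waiting for the whole batch to complete (static batching)."""
    waiting = deque(requests)
    running, finished = [], []
    while waiting or running:
        # Admit waiting requests into any free batch slots.
        while waiting and len(running) < max_batch_size:
            running.append(waiting.popleft())
        # One decode step for the entire running batch.
        for req, tok in zip(running, fake_forward(running)):
            req.generated.append(tok)
        # Retire completed requests immediately, freeing their slots.
        finished += [r for r in running if len(r.generated) >= r.max_new_tokens]
        running = [r for r in running if len(r.generated) < r.max_new_tokens]
    return finished

reqs = [Request(f"p{i}", max_new_tokens=2 + i % 3) for i in range(6)]
done = continuous_batching(reqs)
```

Because slots are refilled as soon as a short request finishes, the accelerator stays busy even when sequence lengths vary widely, which is the main win over static batching.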
Hardware Acceleration and Optimization
- Profile and optimize models for specialized hardware backends, including NVIDIA GPUs (CUDA/TensorRT), Apple Silicon (Core ML), and AI accelerators like TPUs and LPUs.
- Collaborate with hardware teams to maximize utilization and performance across various computational environments.
Inference Orchestration and Scalability
- Design and implement auto-scaling architectures for online (real-time) and batch inference pipelines, leveraging Kubernetes for inference routing and orchestration.
- Ensure software solutions are optimized for peak performance during traffic spikes, maintaining reliability and scalability.
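As a rough sketch of the proportional scaling rule behind Kubernetes' Horizontal Pod Autoscaler, applied to inference replicas: compare an observed load signal against a per-replica target and clamp the result. The queue-depth signal, function names, and thresholds here are illustrative assumptions, not a real Kubernetes API:

```python
import math

def desired_replicas(current_replicas, queue_depth, target_per_replica,
                     min_replicas=1, max_replicas=32):
    """HPA-style rule: scale proportionally to observed load vs. target,
    clamped to configured bounds (mirrors the desired-replica formula
    desired = ceil(current * observed / target))."""
    if current_replicas == 0:
        return min_replicas
    ratio = queue_depth / (current_replicas * target_per_replica)
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas targeting 10 queued requests each, faced with a burst of 120 queued requests, would scale to 12 replicas; an extreme spike is capped at `max_replicas` so the cluster is not overrun.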
Performance Monitoring and Observability
- Establish robust observability frameworks to monitor Time to First Token (TTFT), tokens per second, and memory bandwidth utilization against service-level agreements (SLAs).
- Build and execute performance and load testing suites to identify bottlenecks and ensure consistent reliability at scale.
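As a minimal illustration of the two headline metrics, a load-testing harness might measure TTFT and tokens per second from a streaming response like this. The token generator below is a stand-in for a real streaming inference endpoint:

```python
import time

def measure_stream(token_stream):
    """Consume a token iterator and report TTFT and decode throughput."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in token_stream:
        if ttft is None:
            ttft = time.perf_counter() - start  # Time to First Token
        count += 1
    elapsed = time.perf_counter() - start
    tps = count / elapsed if elapsed > 0 else 0.0  # tokens per second
    return {"ttft_s": ttft, "tokens": count, "tokens_per_s": tps}

def fake_stream(n=20, delay=0.001):
    """Stand-in for a streaming inference response."""
    for i in range(n):
        time.sleep(delay)
        yield f"tok{i}"

stats = measure_stream(fake_stream())
```

In a real suite the same probe would run at many concurrency levels, with TTFT and throughput percentiles compared against the SLA rather than single samples.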
Technical Requirements
Required Skills:
- Programming Languages:
Proficiency in programming languages such as Python, C++, Rust, or Golang, specifically for high-performance AI workflows.
- Inference Tools:
Proven hands-on experience with tools like vLLM, TensorRT, Llama.cpp, and Ollama for inference development and optimization.
- Infrastructure Expertise:
Strong familiarity with infrastructure technologies, including Docker, Kubernetes, and cloud platforms such as AWS, GCP, and Azure.
- Hardware Optimization Expertise:
Comprehensive understanding of GPU and AI hardware, including techniques for profiling and optimizing performance for accelerators like NVIDIA GPUs and TPUs.
Preferred Experience:
- Prior experience deploying Large Language Models (LLMs) with advanced techniques like Speculative Decoding or Paged Attention.
- Contributions to open-source inference libraries or hardware-level kernel development (e.g., CUDA, Triton kernels).
- Background in MLOps or SRE roles focused on high-performance AI endpoints and reliability during demand surges.
- Proficiency in designing scalable solutions for high-throughput inference environments optimized for traffic bursts.
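For context on one of these techniques, here is a toy, greedy speculative-decoding loop: a cheap draft model proposes k tokens, the target model verifies them, and the longest agreeing prefix is accepted plus one corrected token, so the output matches pure target decoding while amortizing the expensive model over several tokens. Both "models" below are deterministic stand-ins, not real LLMs:

```python
def target_next(context):
    """Expensive 'target' model (stand-in): next token = last + 1."""
    return context[-1] + 1

def draft_next(context):
    """Cheap 'draft' model (stand-in): usually agrees with the target,
    but guesses wrong whenever the last token is a multiple of 5."""
    return context[-1] + (2 if context[-1] % 5 == 0 else 1)

def speculative_decode(context, num_tokens, k=4):
    """Greedy speculative decoding: accept draft tokens while the target
    agrees; on the first disagreement, keep the target's token instead."""
    out = list(context)
    while len(out) - len(context) < num_tokens:
        # Draft proposes k tokens autoregressively (cheap).
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # Target verifies the proposals (in practice: one batched forward pass).
        for t in proposal:
            expect = target_next(out)
            if t == expect:
                out.append(t)        # accepted draft token
            else:
                out.append(expect)   # rejected: take the target's token
                break
            if len(out) - len(context) >= num_tokens:
                break
    return out[len(context):]

tokens = speculative_decode([0], num_tokens=12)
```

The key property, preserved even in this toy version, is that the output is identical to decoding with the target model alone; the draft only changes how many target evaluations are spent per emitted token.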
Success Metrics (KPIs):
- Latency Reduction:
Continuously improve inference latency metrics, ensuring minimal Time to First Token (TTFT) and maximum tokens per second.
- Cost Efficiency:
Achieve lower "Cost per 1K Tokens" through better resource utilization and hardware optimization.
- Scalability:
Maintain system stability and reliability during traffic spikes, ensuring performance consistency across environments.
- Throughput Maximization:
Deploy models optimized for peak hardware utilization and maximum throughput.
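As a back-of-envelope illustration of how the "Cost per 1K Tokens" metric ties hardware cost to sustained throughput (all numbers below are hypothetical):

```python
def cost_per_1k_tokens(gpu_hourly_usd, tokens_per_second, utilization=1.0):
    """Back-of-envelope serving cost: dollars per 1,000 generated tokens
    for one accelerator at a given sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_hourly_usd / tokens_per_hour * 1000

# Illustrative only: a $2/hr GPU sustaining 1,000 tokens/s end to end.
example = cost_per_1k_tokens(2.0, 1000)
```

The formula makes the levers explicit: doubling sustained throughput, or halving idle time via better batching, halves the cost per 1K tokens for the same hardware bill.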
Why Join F5?
F5 empowers you to push boundaries in AI optimization and high-performance engineering. Joining our team means:
- Working hands-on with cutting-edge technologies and hardware solutions to support real-time AI applications.
- Advancing your career in a fast-paced, multidisciplinary environment focused on innovation, scalability, and problem-solving.
- Driving transformative projects that deliver real-time AI reliability to global customers while maintaining cost and efficiency standards.
- Building advanced MLOps solutions that seamlessly scale enterprise AI systems and shape the future of intelligent deployment.
What Success Looks Like:
As an AI Inference Engineer at F5, success is measured by your ability to:
- Combine technical expertise and problem-solving skills to deliver low-latency, scalable, and high-performing AI prediction systems.
- Collaborate effectively across cross-functional teams, contributing to knowledge sharing and system refinement.
- Demonstrate initiative by driving optimizations across hardware, tools, and orchestration processes, balancing immediate solutions with long-term architectural goals.
- Translate complex AI and inference workflows into practical solutions that align with F5's strategic objectives.
The Job Description is intended to be a general representation of the responsibilities and requirements of the job. However, the description may not be all-inclusive, and responsibilities and requirements are subject to change.
Please note that F5 only contacts candidates through an F5 email address (ending with @f5.com) or automated email notifications from Workday (ending with @f5.com or @myworkday.com).
Equal Employment Opportunity
It is the policy of F5 to provide equal employment opportunities to all employees and employment applicants without regard to unlawful considerations of race, religion, color, national origin, sex, sexual orientation, gender identity or expression, age, sensory, physical, or mental disability, marital status, veteran or military status, genetic information, or any other classification protected by applicable local, state, or federal laws. This policy applies to all aspects of employment, including, but not limited to, hiring, job assignment, compensation, promotion, benefits, training, discipline, and termination. F5 offers a variety of reasonable accommodations for candidates. Requesting an accommodation is completely voluntary. F5 will assess the need for accommodations in the application process separately from those that may be needed to perform the job. Request by contacting accommodations@f5.com.
Required skills
SRE
LLM inference
Performance tuning
Scalable infrastructure
Monitoring
Reliability engineering
About F5 Networks
F5 Networks is a public, multi-cloud application services and security company that specializes in application security, performance, and delivery, with 5,001-10,000 employees, headquarters in Seattle, and a $2.8B valuation.