Oracle

Cloud applications and platform services.

Principal Site Reliability Engineer

Function: DevOps
Level: Staff+
Location: Mexico, United States
Work mode: On-site
Type: Full-time
Posted: 1 month ago

As a Principal member of the Site Reliability Engineering (SRE) team, you'll take ownership of highly available systems, influence service design, and work across teams to drive resiliency, automation, and operational excellence. This is a hands-on engineering role where deep infrastructure knowledge meets software engineering expertise, ideal for experienced SREs ready to take the lead.

This is a hybrid role, not a fully remote one. It requires working in the office in Guadalajara at least 3 days a week.

  • Career Level: IC4

What You’ll Do:

  • Lead the design, automation, and support of OCI services with a focus on resiliency, security, scalability, and performance.
  • Own and improve the end-to-end reliability metrics (SLOs, SLAs, KPIs) for your services.
  • Design and implement high-availability architectures and standards for large-scale distributed systems.
  • Serve as the ultimate escalation point for complex operational issues, using a deep understanding of service topologies and interdependencies.
  • Architect and build automation and orchestration tools that reduce manual work and prevent problem recurrence.
  • Collaborate with development teams to improve service designs, optimize deployments, and implement best practices for operational efficiency.
  • Guide technical decision-making and mentor junior SREs and developers across teams.
  • Participate in and lead postmortems, root cause analysis, and preventative design changes.
  • Contribute to capacity planning, demand forecasting, and long-term service scalability strategies.
  • Participate in a rotational on-call schedule to ensure the health and availability of production services.

What We’re Looking For:

  • Advanced experience with Linux systems administration
  • Strong programming skills in Python (with automation libraries)
  • Advanced Bash/Shell scripting
  • Deep understanding of distributed systems, networking, and service architecture
  • Solid knowledge of databases and how they behave in production (SQL or NoSQL)
  • Strong understanding of CI/CD pipelines, Agile methodologies, and DevOps best practices
  • Experience writing and maintaining unit tests and production-grade software
  • Proven ability to lead cross-functional efforts and technical problem-solving in live environments

Nice to Have:

  • Hands-on experience with monitoring and observability tools (Grafana, Prometheus, New Relic, etc.)
  • Familiarity with Oracle Cloud Infrastructure (OCI) or other cloud platforms (AWS, Azure, GCP)
  • Experience with Infrastructure-as-Code (Terraform, Ansible) and container orchestration (Kubernetes)

About Oracle

Oracle (Public)

Cloud applications and platform services.

Employees: 140,000+
Headquarters: Austin
Valuation: $300B

Reviews

Rating: 3.5 (10 reviews)
Work-life balance: 2.8
Compensation: 4.0
Company culture: 3.2
Career development: 2.5
Management: 2.3
Would recommend: 62%

Pros

Good compensation and benefits

Supportive team culture and colleagues

Flexible work arrangements

Cons

Poor management and leadership

Work-life balance challenges

Limited career advancement opportunities

Salary range (31,728 data points)

Principal/L7 · Senior Principal Consultant (1,776 reports)

Total annual compensation: $205,852
Base salary: $181,648
Stock: -
Bonus: $24,204
Range: $157,007 to $275,085

Interview reviews (8 reviews)

Difficulty: 3.1 / 5
Duration: 14-28 weeks
Experience: positive 0%, neutral 75%, negative 25%

Interview process

  1. Application Review
  2. Recruiter Screen
  3. Technical Phone Screen
  4. Final Interview
  5. Offer Decision

Common questions

Coding/Algorithm

Technical Knowledge

Behavioral/STAR

Past Experience