JPMorgan Chase

Global financial services firm

Senior Lead Site Reliability Engineer

Function: DevOps
Level: Lead
Location: Plano, TX, United States
Work arrangement: On-site
Type: Full-time
Posted: 2 months ago

Required skills

Python

Java

Terraform

Guide and shape the future of technology at a globally recognized firm, driven by pride in ownership.

As a Senior Lead Site Reliability Engineer at JPMorgan Chase within the Infrastructure & Production Management sector of Consumer & Community Banking, you are the non-functional requirements owner and champion for the applications in your remit. You are a key influencer in your team’s strategic planning, driving continual improvement in customer experience, resiliency, security, scalability, monitoring, instrumentation, and automation of the software in your area. You act in a blameless, data-driven manner and navigate difficult situations with composure and tact.

Job responsibilities

  • Demonstrates expertise in site reliability principles and understands the fine balance between features, efficiency, and stability
  • Effectively negotiates with peers and executive partners to ensure optimal outcomes for all
  • Drives the adoption of site reliability practices throughout the organization
  • Ensures your teams follow site reliability best practices and can demonstrate this empirically through stability and reliability metrics
  • Drives a culture of continual improvement and solicits real-time feedback to improve the customer’s experience
  • Ensures your team collaborates with other teams within your group’s specialization and avoids duplication of work where possible
  • Follows blameless, data-driven, post-mortem strategies and conducts regular team debriefs to enable learning from both successes and mistakes
  • Provides personalized coaching for entry- to mid-level team members
  • Ensures your team documents and shares their knowledge and innovations via internal forums, communities of practice, guilds, and conferences

Required qualifications, capabilities, and skills

  • Formal training or certification in software engineering concepts and 5+ years of applied experience; plus 2+ years leading technologists to manage and solve complex technical items within your domain.
  • Advanced proficiency in SRE culture and principles, with a track record of implementing SRE practices across application and platform teams while avoiding common pitfalls.
  • Strong observability fundamentals: define and measure SLIs, set and manage SLOs and error budgets, build actionable alerting and dashboards; hands-on experience with Dynatrace and Splunk.
  • Proven resiliency engineering: capacity planning, failure mode analysis, fault-tolerant design (circuit breakers, retries, bulkheads), disaster recovery strategies, and running game days.
  • Proficiency in at least one programming language (e.g., Python, Java Spring Boot, .NET) to build production-grade automation and tooling; deeper coding skills are a plus but not a hard requirement.
  • Proficiency in CI/CD and Infrastructure as Code (e.g., Jenkins, GitLab, Terraform), including pipeline design, environment promotion, and secrets/artifact management.
  • Experience with containers and orchestration (e.g., Docker, Kubernetes, ECS), including image hardening, Helm, and operational runbooks.
  • Ability to troubleshoot common networking technologies and issues (TCP/IP, DNS, HTTP, proxies, load balancers, TLS, routing, VPCs/subnets, firewalls).
  • Demonstrated proficiency operating cloud-scale, distributed systems within a technical discipline (e.g., cloud platforms), with experience at firmwide or similarly large scale.
  • Ability to influence team culture by championing innovation and change; experience mentoring and leading technologists (including hiring, developing, and recognizing talent) as an individual contributor.
  • Automation mindset focused on reducing toil (target ~25% of time), building self-service capabilities, and turning operational procedures into code.

Preferred qualifications, capabilities, and skills

  • Experience in banking/financial services and familiarity with risk and control expectations in regulated environments.
  • AWS experience; AWS Certified Solutions Architect (Associate or Professional) preferred.
  • Advanced observability ecosystem knowledge beyond Dynatrace/Splunk (e.g., OpenTelemetry, Prometheus, Grafana, ELK).
  • Experience scaling SRE practices across multiple teams/platforms, including playbooks, SRE onboarding, and maturity assessments.
  • Exposure to payments concepts and platforms (e.g., ISO 20022, SWIFT, real-time payments) with willingness to learn; not required for the role.
  • Experience with chaos engineering tools (e.g., Gremlin, Litmus, Chaos Mesh) and integrating resilience tests into CI/CD pipelines.
  • Proven cloud cost/performance optimization in production (autoscaling, caching, capacity management, and efficiency tuning).

About JPMorgan Chase

JPMorgan Chase & Co. is an American multinational banking institution headquartered in New York City and incorporated in Delaware. It is the largest bank in the United States, and the world's largest bank by market capitalization as of 2025.

Employees: 300,000+
Headquarters: New York City
Valuation: $500B

Reviews

Overall rating: 3.8 (10 reviews)

  • Work-life balance: 3.5
  • Compensation: 4.0
  • Culture: 3.8
  • Career growth: 3.2
  • Management: 2.8

Would recommend: 68%

Pros

Good benefits and compensation

Supportive colleagues and environment

Flexible work arrangements

Cons

Long hours and heavy workload

Management issues and lack of direction

High stress and expectations

Salary range

44 data points

Junior/L3 · Analytics Solutions Associate (1 report)

  • Total annual compensation: $139,000
  • Base salary: $107,000
  • Stock: not reported
  • Bonus: not reported

Interview reviews

4 reviews

  • Difficulty: 3.0 / 5
  • Duration: 14-28 weeks
  • Offer rate: 50%
  • Experience: 25% positive, 75% neutral, 0% negative

Interview process

  1. Application Review
  2. HR Screen
  3. Hiring Manager Interview
  4. In-person/Final Interview
  5. Offer

Common questions

  • Behavioral/STAR
  • Past Experience
  • Culture Fit
  • Financial Knowledge
  • Case Study