Hiring

Lead, Site Reliability Engineering

Mastercard

Dublin, Ireland

On-site

Full-time

2w ago

Our Purpose

Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we’re helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships and networks combine to deliver a unique set of products and services that help people, businesses and governments realize their greatest potential.

Title and Summary

Lead, Site Reliability Engineering

Site Reliability Engineer (SRE) – Generalist

Role Summary:

The Site Reliability Engineer (SRE) – Generalist is a senior-level engineer and cross-stack reliability expert who proactively ensures system stability, performance, and operational resilience by deeply understanding application behavior and how it manifests across infrastructure.
This role emphasizes anticipation over reaction. While the SRE Generalist participates in incident response, their primary value is in converting operational signals, incidents, and patterns into preventative actions: improving observability, reducing risk, and eliminating classes of failure before they impact customers. They partner closely with application, platform, and infrastructure teams to continuously reduce mean time to detect (MTTD), mean time to resolve (MTTR), and overall incident frequency through data-driven insight, automation, and engineering rigor.

Key Responsibilities:

Proactive Reliability Engineering

Anticipate reliability risks by analyzing application behavior, system signals, and historical incidents to identify failure patterns and systemic weaknesses before they result in outages.
Translate deep application knowledge into reliability requirements, architectural guidance, and infrastructure improvements that prevent incidents rather than simply respond to them.
Continuously assess system health, resiliency gaps, and operational debt, driving improvements that increase service robustness over time.

Incident Response as an Input to Prevention

Participate in and lead troubleshooting efforts during high-severity and cross-domain incidents, applying structured, data-driven investigation techniques.
Use incidents as learning opportunities, performing root cause analysis that focuses on why systems allowed failure, not just what broke.
Ensure incident outcomes result in concrete, measurable improvements such as better instrumentation, safer defaults, automation, or architectural changes.

Observability, Monitoring & Signal Quality

Proactively design and evolve observability strategies by onboarding new data sources and improving signal quality across logs, metrics, traces, and events.
Build dashboards, alerts, and monitors that surface early indicators of degradation, not just failure states.
Apply analytical techniques to detect emerging trends, weak signals, and anomalous behavior before customers are impacted.
Communicate insights through clear data storytelling that enables engineering teams and leaders to act decisively and early.

Automation & Continuous Improvement

Lead automation efforts that reduce manual intervention, shorten feedback loops, and eliminate repetitive operational work.
Convert operational learnings into reusable tools, standards, documentation, and patterns that raise the reliability baseline across teams.
Actively reduce operational toil and risk by improving system defaults, guardrails, and self-healing capabilities.

Collaboration, Influence & Mentorship

Partner across application, infrastructure, and platform teams to drive shared ownership of reliability outcomes and proactive operational thinking.
Influence design and delivery decisions by representing the reliability perspective early in the development lifecycle.
Mentor engineers by modeling proactive troubleshooting, systems thinking, and data-driven decision making.

Knowledge, Skills & Abilities

Strong ability to reason about systems end to end, connecting application behavior to infrastructure performance and failure modes.
Expertise in observability, monitoring, and troubleshooting tools, with a focus on signal quality and actionable insight.
Proficiency in scripting and automation to operationalize reliability improvements and accelerate learning.
Broad infrastructure knowledge (networking, Linux, databases, containers, storage), with depth in at least one domain.
Strong data analysis and storytelling skills, enabling proactive identification of risks and clear communication of technical insights.
Working knowledge of machine learning concepts and their application to predictive and proactive operational problem-solving.
Curiosity, ownership, and a mindset oriented toward preventing tomorrow’s incidents, not just fixing today’s.

What Defines Success in This Role:

A successful SRE Generalist:

Sees incidents as signals, not endpoints.
Uses observability and data to shift reliability work left and upstream.
Reduces incident frequency and impact over time—not just MTTR.
Acts as a connective force across teams, turning complexity into clarity and prevention.

Corporate Security Responsibility

All activities involving access to Mastercard assets, information, and networks come with an inherent risk to the organization. Every person working for, or on behalf of, Mastercard is therefore responsible for information security and must:

Abide by Mastercard’s security policies and practices;
Ensure the confidentiality and integrity of the information being accessed;
Report any suspected information security violation or breach; and
Complete all periodic mandatory security trainings in accordance with Mastercard’s guidelines.

About Mastercard

Mastercard

Public

A financial network that processes payments between banks and cardholders

Employees: 10,001+
Headquarters: Purchase
Valuation: $360B

Reviews

3.6 (10 reviews)

Work-life balance: 4.1
Compensation: 3.4
Company culture: 4.0
Career: 2.3
Management: 3.2
Would recommend to a friend: 65%

Pros

Good benefits and compensation
Collaborative environment and great colleagues
Supportive work-life balance

Areas for improvement

Limited career advancement opportunities
Management and leadership issues
Heavy workload and stress

Salary range

51 data points

Junior/L3

Director

Junior/L3 · Data Engineer

5 reports

$137,800

Total annual compensation

Base salary

$106,000

Stock

Bonus

$107,900

$166,918

Interview experience

7 interviews

Difficulty: 3.3 / 5
Duration: 14-28 weeks
Offer rate: 29%

Experience

Positive: 0%
Neutral: 86%
Negative: 14%

Interview process

Application Review
Recruiter Screen
Technical Interview
Behavioral Interview
Final Round/Super Day
Offer Decision

Common questions

Coding/Algorithm
Technical Knowledge
Behavioral/STAR
System Design
Past Experience
