Jobs
Required Skills
Java
SQL
AWS
PostgreSQL
MongoDB
GCP
Azure
Job Overview
We are seeking an experienced Sr. Principal Engineer, Site Reliability (SRE) to drive technical excellence within our global Site Reliability Engineering organization.
This role is essential to maintaining and improving the reliability, scalability, and performance of our multi-cloud SaaS platform serving thousands of customers worldwide.
The successful candidate will provide hands-on technical expertise and strategic technical direction in incident response, system optimization, and reliability engineering practices across our complex technology stack.
Off-hours support as needed.

About Us
When you join iCIMS, you join the team helping global companies transform business and the world through the power of talent.
Our customers do amazing things: design rocket ships, create vaccines, deliver consumer goods globally, overnight, with a smile.
As the Talent Cloud company, we empower these organizations to attract, engage, hire, and advance the right talent.
We’re passionate about helping companies build a diverse, winning workforce and about building our home team.
We're dedicated to fostering an inclusive, purpose-driven, and innovative work environment where everyone belongs.

Responsibilities

Technical Leadership
Provide strategic technical direction for a team of 5+ SRE engineers across one or more geographic regions (US, Ireland, or India)
Provide technical mentorship and guidance for team members
Drive technical decision-making for complex reliability and performance challenges
Conduct architecture reviews and drive system design decisions for reliability
Lead post-incident reviews and drive implementation of preventive measures

Incident Management & Response
Participate in enterprise-wide incident management, ensuring rapid prevention, detection, response, and resolution
Develop and maintain runbooks and emergency response procedures
Lead root cause analysis and ensure comprehensive documentation
Participate in 24/7 on-call rotation and escalation procedures across global teams
Interface with Engineering teams and Incident Manager during critical incident resolution

Platform Reliability & Performance
Monitor and optimize multi-cloud infrastructure (AWS primary, Azure, GCP)
Ensure reliability of core services: AWS resources, Auth0/Okta authentication, databases (SQL Server, PostgreSQL, MongoDB), and legacy Java applications
Implement and maintain SLIs, SLOs, and error budgets for assigned services
Drive capacity planning and performance optimization initiatives

Automation & Tooling
Design automation solutions to reduce manual operational overhead
Develop monitoring strategies using New Relic, Grafana, and Sumo Logic
Create infrastructure-as-code for reliable deployments
Build self-healing systems and automated remediation workflows

Qualifications

Technical Experience
8+ years in SRE, DevOps, or Infrastructure Engineering roles with 4+ years in senior positions
Deep hands-on experience with multi-cloud environments (AWS required, Azure preferred)
Strong Linux system administration and troubleshooting
Experience with containerization (Docker) and orchestration (Kubernetes, ECS)
Proficiency with monitoring tools (New Relic, Grafana, Prometheus)

Leadership & Communication
Proven track record mentoring technical teams and driving technical direction
Experience serving as senior technical leader during critical incidents
Strong communication skills with engineering teams and stakeholders
Cross-functional collaboration in agile environments

SRE & Operations
Demonstrated success implementing SRE principles in large-scale production environments
Experience with ITIL frameworks and tools
Background in establishing and maintaining SLAs for enterprise SaaS products

Preferred
Authentication and identity management systems knowledge
Infrastructure-as-code tools (Terraform, CloudFormation)

EEO Statement
iCIMS is a place where everyone belongs.
We celebrate diversity and are committed to creating an inclusive environment for all employees.
Our approach helps us to build a winning team that represents a variety of backgrounds, perspectives, and abilities.
So, regardless of how your diversity expresses itself, you can find a home here at iCIMS.
We prohibit discrimination and harassment of any kind based on race, color, religion, national origin, sex (including pregnancy), sexual orientation, gender identity, gender expression, age, veteran status, genetic information, disability, or other applicable legally protected characteristics.
If you’d like to request an accommodation due to a disability, please contact us at careers@icims.com.

Compensation and Benefits
Competitive health and wellness benefits include medical insurance (employee and dependent family members), personal accident and group term life insurance, bonding and parental leave, lifestyle spending account reimbursements, wellness services offerings, sick and casual/emergency days, paid holidays, tuition reimbursement, retirals (PF - employer contribution) and gratuity.
Benefits and eligibility may vary by location, role, and tenure.
Learn more here:
https://careers.icims.com/benefits
About iCIMS

iCIMS
Series F+
iCIMS, Inc. is a New Jersey-based cloud human resources and recruiting software company. The company name is an acronym for Internet Collaborative Information Management Systems.
Employees: 501-1,000
Headquarters: Holmdel
Reviews
Overall rating: 4.1 (31 reviews)
Work-life balance: 3.7
Compensation: 4.5
Culture: 4.4
Career: 4.1
Management: 3.8
Would recommend to a friend: 84%
Pros
Strong engineering culture with focus on code quality
Competitive compensation packages with equity
Flexible remote work options and good work-life balance
Cons
Work-life balance can be challenging during product launches
Fast-paced environment with tight deadlines
Organizational changes and restructuring can be disruptive
Salary Information
0 data points
Junior/L3
Intern
Junior/L3 · Technical Account Manager
0 reports
Total compensation: $68,340
Base salary: -
Stock: -
Bonus: -
$58,089
$78,591
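Interview Experience
7 interviews
Difficulty: 3.1 / 5
Duration: 14-28 weeks
Offer rate: 43%
Experience
Positive 43%
Neutral 28%
Negative 29%
Interview Process
1. Application
2. Screening Call
3. Interview
4. Assessment
Frequently Asked Questions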
Case Study
Technical Assessment
ATS Screening
News & Buzz
Healthcare Provider Triples Talent Pipeline and Boosts Applicants Per Opening 17% with ICIMS Candidate Experience Management
PR Newswire · News · 2w ago
ICIMS Launches Purpose-Built Hiring Solution for Recruiting Frontline Workers
Supply & Demand Chain Executive · News · 4w ago
Breaking News: What ICIMS March Workforce Report reveals that BLS doesn’t | ep117
collegerecruiter.com · News · 4w ago
ICIMS’ spring release focuses on frontline hiring ‘experience layer’
AIM Group · News · 4w ago