
Organizing the world's information and making it universally accessible.
Software Engineer, Data Center Infrastructure Management Lifecycle at Google
Compensation
$141,000 - $202,000
About the role
The DCIM Lifecycle team operates one of the largest-scale monitoring systems at Google, reading telemetry from millions of devices in every Google datacenter. Our challenges include managing the rapid growth and diversification of the Google fleet and hardware, supporting new use cases for critical monitoring of third-party facilities, and retiring technical debt.
Google is bringing tape libraries back to our data centers to support several critical requirements, including a new cold storage tier, better total cost of ownership (TCO), and contingency for HDD/SSD shortages caused by unprecedented AI/ML capacity demand. In this role, you will design and deliver Tape Health at Google scale to ensure reliability.
In this role, you will work with your teammates to design, code, and put into production very large-scale distributed monitoring systems, and work with your team and partner teams to enable new use cases for large-scale telemetry gathering. You will also create system monitoring dashboards, define service level objectives (SLOs), and write documentation and playbooks. You will have the opportunity to take onsite trips to one or more of Google's datacenters each year to work with new systems and data center technical staff in person.
The US base salary range for this full-time position is $141,000-$202,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.
Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.
Responsibilities
- Design, develop, and maintain software services for collecting and analyzing telemetry data from tape libraries, drives, and robotic components.
- Implement algorithms and rules to detect, diagnose, and predict hardware failures.
- Integrate tape health systems with Google's data center health monitoring infrastructure (e.g., system health, network doctor) and automated repair workflows (e.g., surgeon, silk roads).
- Collaborate with hardware engineers and vendors to understand failure modes and improve diagnostic capabilities.
- Develop dashboards and tools to provide visibility into the health and status of the tape hardware fleet.
- Participate in the full software development lifecycle, including requirements gathering, design, coding, testing, deployment, and operation.
Minimum qualifications
- Bachelor’s degree or equivalent practical experience.
- 2 years of experience with coding in C++.
- 1 year of experience with distributed computing.
- 1 year of experience with debugging, troubleshooting and monitoring systems.
Preferred qualifications
- Master's degree or PhD in Computer Science, or a related technical field.
- 2 years of experience in unit testing, integration testing, and continuous deployment.
- 2 years of experience in SQL.
Benefits and perks
• Equity
• Flexible Hours
• Parental Leave
• Healthcare
• Learning Budget
Required skills
Node.js
Python
JavaScript
More open roles at Google

Senior Account Manager, Technology/Consumer Electronics, LCS

Senior Software Engineer, AI/ML, Google Cloud AI

Content Researcher, Events and Experiences

Senior Business Intelligence Developer, Google Cloud (English)

Technical Program Manager III, Finance Engineering, Corp Eng
Similar jobs

Associate Director, DT Portfolio Architect - Production (Remote)
Collins Aerospace (RTX) · US-CT-REMOTE

Enterprise Classified Cloud Sr. Manager
Collins Aerospace (RTX) · US-TX-RICHARDSON-C17 ~ 1717 Cityline Dr ~ CITYLINE C17

Senior Principal Engineer, Infrastructure Platform Architect (Onsite)
Collins Aerospace (RTX) · US-TX-PLANO-465 ~ 465 Independence Pkwy ~ INDEPENDENCE

CDS Platform Services
RTX (Raytheon) · US-CO-AURORA-S78 ~ 16201 E Centretech Pkwy ~ BLDG S78

Facilities Engineer (Onsite)
RTX (Raytheon) · US-MD-ANNAPOLIS-906 ~ 2551 Riva Rd ~ BLDG 906
About Google

Google specializes in internet-related services and products, including search, advertising, and software.
Employees: 10,001+
Headquarters: Mountain View
Valuation: $1,700B
Reviews
Overall: 4.5 (10 reviews)
Work-life balance: 3.2
Compensation: 4.3
Culture: 4.1
Career: 4.2
Management: 3.8
Recommend to a friend: 82%
Pros
Great benefits and perks
Innovative and interesting work
Career development and learning opportunities
Cons
High pressure and expectations
Long hours and heavy workload
Fast-paced and overwhelming environment
Salary Ranges
57,503 data points
Mid/L4 · Accessibility Analyst (1 report)
Total per year: $214,500
Base: $165,000
Stock: -
Bonus: -
Interview experience (9 interviews)
Difficulty: 3.4 / 5
Duration: 14-28 weeks
Offer rate: 44%
Experience: Positive 0%, Neutral 56%, Negative 44%
Interview process
1. Application Review
2. Online Assessment/Technical Screen
3. Phone Screen
4. Onsite/Virtual Interviews
5. Team Matching
6. Offer
Common questions
Coding/Algorithm
System Design
Behavioral/STAR
Technical Knowledge
Product Sense
Latest updates
- Our eighth generation TPUs: two chips for the agentic era (blog.google, 2w ago)
- Google Maps on Android Auto now shows bigger labels on streets along your route [Gallery] (9to5Google, 2w ago)
- Google to invest up to $40 billion in AI rival Anthropic (Reuters, 2w ago)
- Google to invest up to $40B in Anthropic in cash and compute (TechCrunch, 2w ago)