Senior Software Engineer, Site Reliability Engineering
About the role
The challenge
The Site Reliability team is responsible for a broad set of technologies and systems and is expected to collaborate across the business. We develop and enhance existing capabilities while ensuring the scalability, reliability, and resiliency of infrastructure and software. You’ll work with engineering teams ranging from product development and developer experience to backend infrastructure to collaboratively build Thumbtack’s ecosystem of platform services that have the right impact at the right time. Thumbtack values its cross-functional, collaborative culture, and you’d be positioned to contribute to the future direction and success of the engineering platform that serves as the engine of our applications.
What you’ll do
- Design, create, and maintain software and systems to improve the availability, scalability, and efficiency of Thumbtack's services
- Set the architectural direction of infrastructure and platform services while supporting the engineering organization
- Design and implement tools and processes used for deployment, change, service, and infrastructure management
- Troubleshoot and debug critical systems throughout the SDLC
- Contribute to the evolution and performance of capabilities we provide to engineering as a platform organization
- Plan capacity and forecast demand, anticipating performance bottlenecks
- Participate in rotating on-call duties
In order to be successful, you must bring
- Extensive fluency in AWS and Linux
- Ability to effectively read, write, and debug code in programming languages such as, but not limited to, Python, Go, PHP, and JavaScript
- Expertise in designing, analyzing, and troubleshooting large-scale distributed systems across web technologies such as DNS, TLS, HTTP/S, and TCP/IP
- Ability to decompose complex problems while understanding the tradeoffs necessary to deliver impact
- 5 years of experience managing infrastructure and systems
- Demonstrable knowledge of instrumenting, operating, and observing a distributed system of microservices in a production cloud environment
- Ability to communicate clearly and effectively with cross-functional partners of various technical levels
- Passion for reducing toil and improving developer experience
Expected salary ranges
- For candidates living in Ontario and British Columbia, the expected salary range for the role is currently $180,200.00 - $233,200.00
About the Site Reliability Engineering Team
Thumbtack's Site Reliability Engineering team focuses on creating and maintaining a reliable, secure, and scalable platform vital for a seamless user experience. As a key contributor, you will design and support resilient systems, prioritizing high performance, availability, and throughput, with a focus on minimizing service disruptions, downtime, and latency. SRE impacts Thumbtack’s ecosystem across the entire stack, from Linux systems to the applications that drive the customer experience. Our work is high leverage, shaping how Engineering, Applied Science, and many other teams deliver, run, and observe systems.
Required skills
AWS
Linux
Python
Go
Distributed systems
SRE
Troubleshooting
About Thumbtack
Headquarters: Virtual