
AI processors and RISC-V computing
Sr. Software Engineer, Observability and Telemetry
Required skills
SQL
Tenstorrent is leading the industry in cutting-edge AI technology, revolutionizing performance expectations, ease of use, and cost efficiency. With AI redefining the computing paradigm, solutions must evolve to unify innovations in software models, compilers, platforms, networking, and semiconductors. Our diverse team of technologists has developed a high-performance RISC-V CPU from scratch and shares a passion for AI and a deep desire to build the best AI platform possible. We value collaboration, curiosity, and a commitment to solving hard problems. We are growing our team and looking for contributors of all seniorities.
Tenstorrent is building the world’s fastest, most efficient AI compute clusters. Our modular RISC-V and AI processors can snap together into a single, massively parallel distributed supercomputer consisting of thousands of compute nodes. As we scale, the volume and complexity of operational data grow by orders of magnitude. Observability and telemetry are key to ensuring our customers can resolve problems in minutes rather than hours. The telemetry team owns our proprietary telemetry infrastructure, which spans from the device level to the infrastructure needed to drive dashboards, monitoring systems, and orchestration.
This role is hybrid, based out of Santa Clara, CA; Austin, TX; or Toronto, ON.
We welcome candidates at various experience levels for this role. During the interview process, candidates will be assessed for the appropriate level, and offers will align with that level, which may differ from the one in this posting.
Who You Are
- Strong C++ engineer, comfortable working in both low-level environments and distributed systems design.
- Experience building atop observability platforms such as Prometheus, OpenTelemetry, Grafana, ClickHouse, or similar technologies.
- Solid understanding of data structures for manipulating large volumes of data.
- Familiarity with SQL databases, with time-series databases a plus.
- Curious about networking and communication across large clusters and comfortable reasoning from first principles while challenging industry conventions.
What We Need
- Architect, implement, and maintain TT-Telemetry, our C++-based service for collecting and exporting device-level metrics.
- Interface with internal engineering teams to build a deep understanding of Tenstorrent’s architecture and to identify and surface useful metrics.
- Design efficient built-in web GUIs for observing device- and cluster-level state, diagnosing problems, and monitoring utilization.
- Design ingestion pipelines for industry-standard telemetry systems (e.g., Prometheus).
- Help define the long-term architecture of Tenstorrent’s distributed telemetry stack.
What You Will Learn
- How large-scale AI clusters are architected from the networking layer up.
- The performance characteristics of custom AI hardware and RISC-V processors at scale.
- How telemetry and observability considerations impact the design of next-gen AI accelerators.
- How to design and architect a world-class telemetry and observability platform from the ground up.
Compensation for all engineers at Tenstorrent ranges from $100k to $500k, including base and variable compensation targets. Experience, skills, education, background, and location all impact the actual offer made.
Tenstorrent offers a highly competitive compensation package and benefits, and we are an equal opportunity employer.
This offer of employment is contingent upon the applicant being eligible to access U.S. export-controlled technology. Due to U.S. export laws, including those codified in the U.S. Export Administration Regulations (EAR), the Company is required to ensure compliance with these laws when transferring technology to nationals of certain countries (such as EAR Country Groups D:1, E1, and E2). These requirements apply to persons located in the U.S. and all countries outside the U.S. As the position offered will have direct and/or indirect access to information, systems, or technologies subject to these laws, the offer may be contingent upon your citizenship/permanent residency status or ability to obtain prior license approval from the U.S. Commerce Department or applicable federal agency. If employment is not possible due to U.S. export laws, any offer of employment will be rescinded.
About Tenstorrent
Series C
Tenstorrent is a semiconductor company that develops AI accelerator chips and software for machine learning workloads. The company focuses on creating scalable processor architectures for data centers and edge computing applications.
Employees: 201-500
Headquarters: Toronto
Valuation: $2.6B