Server Lab Engineer, ML-IL at Amazon

Role: Engineering
Level: Mid Level
Location: Tel Aviv, ISR
Work: On-site
Type: Full-time
Posted: 1 week ago

About the role

Machine Learning Israel (MLIL), as part of Annapurna Labs / Amazon, is hiring a Lab Engineer to own and operate the labs that power the bring-up and validation of our next-generation ML training and inference racks. In this role you will build, maintain, and continuously evolve the lab infrastructure, from bench setups to server racks, used daily by HW, FW, and SW engineers. You will be the go-to person for delivering working, instrumented setups that the R&D teams can pick up and run with.

Key job responsibilities

  • Own the MLIL hardware lab in the Tel-Aviv office: physical layout, power and cooling budget, network topology, cabling, asset tracking, and day-to-day operations.
  • Build, configure, and connect new lab setups for HW, FW, and SW engineers (including servers, GPU sleds, PCIe switches, retimers, NICs, and DRAM modules) and deliver them ready for R&D use.
  • Administer and maintain Linux-based servers and systems, including installation, configuration, and optimization.
  • Manage and configure network services such as DHCP, PXE, and other critical infrastructure components.
  • Run sanity tests on every delivered setup — boot, PCIe enumeration, basic DRAM check, network reachability — so R&D teams pick up a known-good baseline and can focus on their work.
  • Write and maintain automation scripts (Python / Bash) for repetitive lab tasks — power cycling, log collection, provisioning, imaging, test-harness setup.
  • Procure, inventory, and manage lab equipment: bench PSUs, scopes, protocol analyzers, thermal chambers, JTAG debuggers, cables, and fixtures.
  • Triage lab-level issues (power, network, cabling, imaging) to unblock R&D fast; escalate deep HW / FW / SW debug (e.g., RDMA / GPU / EFA internals) to the relevant specialist teams.

Basic Qualifications

  • 3+ years of experience as a system administrator or lab engineer, or in a similar role.
  • Knowledge of Linux operating systems and server administration
  • Solid understanding of networking fundamentals — Ethernet, TCP/IP, link-layer debug, switch / NIC configuration.

Preferred Qualifications

  • Proven hands-on experience with lab instrumentation: scopes, logic analyzers, protocol analyzers, bench PSUs, JTAG / BMC debug.
  • B.Sc. in Electrical / Electronics / Computer Engineering, or a Practical Engineer diploma (הנדסאי) with hands-on experience.
  • Solid understanding of PCIe — enumeration, link training, lane configuration, error reporting (AER), and common debug flows.
  • Experience with BMC / BIOS / UEFI debug, IPMI, Redfish.
  • Experience with high-speed serial debug: SerDes, equalization, eye diagrams, BER testing.
  • Proficient in Python / Bash automation and willing to write production-grade lab tooling.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

Required skills

  • lab operations
  • hardware validation
  • test setups
  • infrastructure maintenance
  • debugging
  • system bring-up
