High Performance Compute (HPC) Site Reliability Engineer (Senior Level Role) – Leading AI Company

Remote
£120,000 - £140,000

The client: Our client is a start-up that offers an innovative form of GPU computing infrastructure and cloud-native orchestration solutions to technology and AI firms worldwide. 🧑🏿‍💻👨🏼‍💻👩🏾‍💻

What they need: They are looking for a senior HPC SRE to play a key role in guaranteeing the reliability of their HPC environments. You will have the opportunity to design and build HPC infrastructure and to set up clusters from scratch. Key skills include HPC architecture, networking technologies (e.g. TCP, UDP, IPv4 or MP-BGP), and network protocols (e.g. RoCE or RDMA).

In this position you will:

Set up HPC clusters from scratch.
Take ownership of investigations into high-priority incidents, identify solutions, and prepare Root Cause Analyses (RCAs).
Work in the AI space, gaining exposure to the latest Machine Learning technology.

Please apply to find out more.

Contact:
Harry Kemp
