Software Engineer - Infrastructure

xAI
Full-time
Palo Alto, CA
$180,000 - $440,000
Posted a month ago

Job Description

xAI is seeking exceptional infrastructure engineers to build and maintain cutting-edge distributed systems for large-scale AI infrastructure. This role involves scaling compute and data platforms, ensuring reliability, and optimizing performance. The initial phase, with 200,000 GPUs, is just the beginning of xAI's infrastructure roadmap.

Responsibilities

  • Scale compute infrastructure on Kubernetes
  • Design and maintain traffic-shaping and load-balancing deployments using Envoy
  • Scale data platforms and observability systems
  • Drive reliability, standardization, and performance
  • Manage and optimize large-scale storage systems
  • Contribute to quality-of-life improvements for developers

Requirements

  • 2+ years of industry experience with large-scale distributed systems
  • Proficiency in Golang, Rust, Python, or similar languages
  • Familiarity with modern developer tools (Bazel, Buildkite, Argo, Kubernetes) is a plus
  • Experience with large-scale storage systems is a plus
  • Passion for reliability, performance optimization, and scalability

Benefits

  • No benefits