Member of Technical Staff, Inference

xAI
Full-time
Palo Alto, CA; San Francisco, CA
$180,000 - $440,000
Posted a month ago

Job Description

xAI is seeking a highly motivated engineer to optimize model inference, build production serving systems, and accelerate research on scaling test-time compute. The role involves hands-on contributions to open-source projects and requires strong communication and prioritization skills.

Responsibilities

  • Optimizing latency and throughput of model inference
  • Building reliable and performant production serving systems
  • Accelerating research on scaling test-time compute
  • Co-designing models and hardware for next-generation architectures
  • Optimizing serving systems (batching, caching, load balancing)
  • Implementing low-level inference optimizations (GPU kernels, code generation)
  • Applying algorithmic inference optimizations (quantization, distillation)
  • Working on large-scale inference engines

Requirements

  • Experience with Python/Rust, PyTorch/JAX, CUDA/CUTLASS/Triton/NCCL, Kubernetes
  • Experience with system optimizations for model serving
  • Experience with low-level optimizations for inference
  • Experience with algorithmic optimizations for inference
  • Experience with large-scale inference engines or reinforcement learning frameworks
  • Experience with large-scale, high-concurrency production serving
  • Experience with testing, benchmarking, and reliability of inference services
  • Strong communication skills

Benefits

  • No benefits