Member of Technical Staff, Inference

xAI
Full-time
Palo Alto, CA; San Francisco, CA
$180,000 - $440,000
Posted a month ago

Job Description

xAI is seeking a highly motivated engineer to optimize model inference, build production serving systems, and accelerate research on scaling test-time compute. The role involves hands-on contributions to open-source projects and requires strong communication and prioritization skills.

Responsibilities

  • Optimizing latency and throughput of model inference
  • Building reliable and performant production serving systems
  • Accelerating research on scaling test-time compute
  • Co-designing models and hardware for next-generation architectures
  • Optimizing serving systems (batching, caching, load balancing)
  • Implementing low-level inference optimizations (GPU kernels, code generation)
  • Applying algorithmic inference optimizations (quantization, distillation)
  • Working on large-scale inference engines

Requirements

  • Experience with Python/Rust, PyTorch/JAX, CUDA/CUTLASS/Triton/NCCL, Kubernetes
  • Experience with system optimizations for model serving
  • Experience with low-level optimizations for inference
  • Experience with algorithmic optimizations for inference
  • Experience with large-scale inference engines or reinforcement learning frameworks
  • Experience with large-scale, high-concurrency production serving
  • Experience with testing, benchmarking, and reliability of inference services
  • Strong communication skills

Benefits

  • No benefits