Principal Site Reliability Engineer - Federal Team

Saviynt LABS
Full-time
Atlanta
Posted 25 days ago

Job Description

Saviynt is seeking a Principal Site Reliability Engineer to design, build, and run its Enterprise Identity solutions. This role focuses on ensuring high availability and performance of the platform, collaborating with engineering and operations teams, and implementing strategies to enhance system uptime and reliability. The ideal candidate will have extensive experience in monitoring, alerting, and cloud development.

Responsibilities

  • Implement monitoring and alerting systems
  • Collaborate with engineering and operations teams
  • Design and implement strategies for system uptime
  • Evaluate and recommend infrastructure improvements
  • Align platform with customer needs
  • Run the production environment
  • Build software for platform monitoring
  • Improve reliability, quality, and time-to-market
  • Measure and optimize system performance
  • Provide operational support for distributed applications
  • Gather and analyze metrics for performance tuning

Requirements

  • U.S. Citizenship
  • Master’s Degree in Engineering or equivalent experience
  • 10+ years of experience in monitoring and alerting roles on major cloud platforms
  • 4+ years of experience in cloud development and observability
  • Experience with AWS cloud environments
  • 3+ years of experience in software development with Python, Node.js, or Java
  • Hands-on experience with Kubernetes
  • Experience with Prometheus, Grafana, Datadog, AWS CloudWatch, Azure Monitor, and Log Analytics
  • Proven experience in implementing observability practices

Benefits

  • No benefits