Site Reliability Engineer

LightEdge Solutions
Full-time
Ashburn, VA
Posted a month ago

Job Description

The Site Reliability Engineer will be responsible for the reliable operation of the organization’s systems and services, focusing on monitoring strategy, accuracy, and improvement across multiple products. This role involves designing and implementing monitoring solutions, establishing metrics, using AIOps for automation, and integrating monitoring with IT service management platforms.

Responsibilities

  • Design and implement monitoring solutions
  • Establish metrics (SLAs, SLOs, SLIs)
  • Use AIOps for incident management
  • Integrate monitoring with IT service management platforms
  • Perform systems design, implementation, and integration
  • Develop detailed designs and troubleshoot solutions

Requirements

  • 5+ years of experience with enterprise monitoring solutions
  • Knowledge of network switches, server hardware, storage, and virtualization technologies
  • Understanding of VMware Infrastructure
  • Experience with Zabbix, vRealize Operations Manager, Nagios, and ScienceLogic
  • Experience with ServiceNow or similar IT service management platforms
  • Experience managing automation within a monitoring environment
  • Excellent communication skills

Benefits

  • No benefits