Being an On-Call Engineer: A Google SRE Perspective

Abstract

Being on-call is a critical duty that many operations and engineering teams must undertake to keep their services reliable and available. However, the organization of on-call rotations and responsibilities has several pitfalls that, if not avoided, can lead to serious consequences for both the services and the teams. We present the primary tenets of the approach to on-call that Google’s Site Reliability Engineers have developed over the years, and explain how that approach has led to reliable services and a sustainable workload.