
SRE WEEKLY – scalability, availability, incident response, automation
About SRE Weekly
SRE Weekly is a newsletter devoted to everything related to keeping a site or service available as consistently as possible. SRE (Site/Service Reliability Engineering) isn’t just about automated failover or fault-tolerant architectures — although of course those are important. It’s about a holistic view of reliability that takes into account everything from servers to human factors to processes to automation and more.
Did “human error” cause that outage? What caused the human to make the error? Can we use automation to make that kind of error impossible?