The Rogers Outage of 2022: 3 Crucial Takeaways for SREs
Millions of Canadians offline. For SREs, the Rogers outage is a lesson in the importance of testing updates, building redundant infrastructure and having a crisis communications plan.
Monitoring Your Platform From Multiple Locations
SREs face multiple challenges as their platform becomes available in more locations around the globe. One step toward overcoming them is building a solid monitoring system.
Why More Incidents Are Better
Preventing every incident is not only unrealistic; in some respects, it’s actually undesirable.
5 Tips If You’re the 1st SRE Hire by Instacart's First SRE
Best practices for “SRE pioneers” – meaning engineers who are the very first SREs hired at an organization.
What SREs Can Learn from the Atlassian Nightmare Outage of 2022
A look at the Atlassian outage of April 2022 and what it can teach Site Reliability Engineers.
Podcast: Break Things on Purpose with Gremlin | Building Rootly with JJ Tang
Our co-founder JJ reflects on building the fastest-growing incident management platform and the surprising lessons learned along the way.
The Pros and Cons of Embedded SREs
A comparison of the two main SRE team models: Embedded SREs vs. standalone SRE teams.
SRE vs. Platform Engineering: The Key Differences, Explained
An overview of the similarities and differences between Site Reliability Engineering and Platform Engineering, including from a career perspective.
What Does AIOps Mean for SREs? It’s Complicated.
AIOps can bring some value to SREs, but it’s important to maintain a healthy perspective on its limitations.