Blog

Incident management insights, guides, and product updates from Rootly

How to Choose the Best On-Call Management Software for Your Team

Discover the top features to look for in on-call management software and learn how to choose the best one for your team.

JJ Tang

July 22, 2024
10 mins
Top 3 on-call scheduling strategies every SRE should know

Discover the best on-call scheduling strategies for SREs in 2024

Iryna Iurchenko

July 16, 2024
7 mins
Round Robin escalation policies: do's and don'ts

Minimize alert fatigue by distributing incoming alerts evenly across responders with a Round Robin schedule. This strategy comes in two variations and can benefit some teams more than others.

Ashley Sawatsky

July 9, 2024
7 mins
Measuring developer productivity IRL: practical tips for platform engineers

What should you measure, and how? Industry experts weigh in, sharing insights from their experience leading engineering organizations at scale.

Jorge Lainfiesta

July 5, 2024
5 mins
How Meta and Google use AI to improve incident response

Discover how Google is optimizing for accuracy in its AI strategy, while Meta strives to expand its response capabilities through machine learning.

JJ Tang

July 2, 2024
6 mins
The Top Resources for Site Reliability Engineers in 2024

We recently spoke to Google's Reliability Advocate, Steve McGhee, in our Humans of Reliability interview series. In addition to his interesting anecdotes on the early days of SRE at Google, and his journey to becoming a Reliability Advocate, he also shared a handful of his favorite SRE resources, which we compiled here into a list.

Jorge Lainfiesta

June 21, 2024
5 mins
How Wealthsimple uses Rootly to create a culture of wellness and psychological safety

"Our goal is to make it easy for employees to come in and run an incident without needing deep technical knowledge about the system. Rootly has made this easier by allowing us to automate a lot of the “hand-holding” someone needs when they’re first navigating an incident."

Rootly & Wealthsimple

June 11, 2024
5 mins
What is ‘Incident Overhead’ and why does it matter?

Not all incidents are created equal. Trying to fit every input an incident declaration might need into a single form can slow down responders and hurt your data quality.

Jorge Lainfiesta

June 5, 2024
4 mins
What we can learn from Google’s UniSuper incident comms

Earlier this month, an inadvertent misconfiguration in an internal tool used by Google Cloud resulted in the deletion of a user’s GCVE Private Cloud. The user in question? UniSuper Australia — a $125 billion Australian pension fund with over 600,000 users. In this post, Ashley reflects on the communications shared and what we can learn from them.

Ashley Sawatsky

May 30, 2024
11 mins