When More Incident Commanders Are Better
When incidents reach a heightened level of complexity and scale, Strong argues that companies ought to consider having multiple lead roles present, rather than a single Commander overseeing the entire response. In this post, he breaks down when and how he recommends you consider bringing additional command roles in.
Status Pages 101: How to Create a Status Page You and Your Customers Will Actually Want to Use
Status pages are a simple yet underutilized element of incident communication. Done well, they’re a low-lift way to keep your customers and stakeholders informed when incidents impact them. But without a solid approach, updating status pages can easily become a tedious and often neglected task during incidents. In this post, we’ll cover some tips to get your status page right.
Working Effectively With Executives During an Incident
For many people, the first and only time they interact with Executives is during an incident. It can be an intimidating first introduction! While Execs are first and foremost just people too, they tend to require some specific care when it comes to communication, especially when it involves issues that critically impact your business and customers. In this post, we’ll cover the best practices for communicating effectively with Executives during incidents.
Top 5 Resiliency Trends of 2023
In this guest post, Rohit Ghumare explores the most crucial trends for resiliency in 2023 – from automated incident management and real-time analysis to cloud-native services and human factors driving secure, collaborative workflows. By incorporating these cutting-edge approaches into your software development processes, you'll position your organization for long-term success.
Celebrating Our Nine New G2 Awards
We’re proud to share that we've been recognized as a High Performer and Enterprise Leader in Incident Management for the sixth consecutive quarter in the G2 Summer 2023 Report! In total, Rootly received nine G2 awards in the Summer Report.
We Need to Talk About the Hero Pattern Among SREs
Hans Chung refers to the tendency for SREs to independently zoom in on one task or problem at a time, and the consequences that come with it, as the “solo hero pattern”. In this post, he explores some of the reasons it happens, and what SRE leaders can do about it.
But It’s Not Our Fault! When Third-party Incidents Affect Your Service
Between cloud service providers, payment processors, content delivery networks, and more, chances are you rely on external systems to keep your product working. So what do you do when someone else's incident becomes your problem? It’s probably not realistic to completely eliminate third-party dependencies, but there are things you can do to enhance your resilience against third-party failures and maintain trust with your customers when outages outside your control impact them.
Rootly Raises $12 Million from Renegade Partners, Google Gradient Ventures, & XYZ Ventures
Rootly has already helped companies manage 60,000+ incidents and we are just getting started! We are on a mission to make reliability every company’s superpower.
Kubernetes Incident Management Best Practices
In this post, Rajesh Tilwani (Co-Founder of Humalect) covers a variety of strategies for preventing and managing incidents with Kubernetes.
Improve Visibility and Capture More Data with Triage Incidents
As new incidents emerge, there are often many unknowns about the size, severity, and cause of the problem. Sometimes it’s not clear if the problem is an incident at all. That’s where introducing a triage stage to your incident management process can help. In this post, we’ll look at the benefits of adding a triage layer to your incident management, and how Rootly’s Triage feature allows you to seamlessly transition from triage to a real incident (or a false alarm).