The Rootly Philosophy

What we've learned from powering incident response at hundreds of leading companies, including:

Lattice
LinkedIn
Cisco
Elastic
Replit
Grammarly
Nvidia
Shell
Canva
Tripadvisor

How we build

We’re on a mission to help every organization become more reliable, continuously improve, and inspire confidence in those who rely on them. It’s not just what we build in service of this mission that matters; it’s how we build it. These principles guide our teams to focus on the right things and drive the highest impact.

Simple but powerful

Great tools are simple but powerful. We design every experience within the platform to be intuitive and ready to use out of the box with little to no configuration. A user interface should show you the info you need for the task you’re accomplishing, without distracting extraneous information.

Purpose built

Many tools used in the incident response space are general-purpose automation tools, or add-ons from tools that serve another part of the stack. We are different. We are a purpose-built incident management platform. We don’t bloat the platform with features that don’t directly drive our mission.

Opinionated defaults

You don’t need to be an incident response expert to use Rootly effectively. We don’t A/B test features and implement the most popular version—we’ve helped users manage thousands of incidents, and we know what works. We infuse this knowledge throughout the platform as default settings (we refer to these as “smart defaults”) so users are armed with best practices from the moment they sign up.

Great tools get out of the way

Tooling should eliminate toil and friction, not contribute to it. We don’t want users to think of Rootly as something they go out of their way to use. Interaction with the tool should be seamless with the work they’re already doing.

Keep garden walls low

To keep implementation simple, it’s crucial to think beyond our own tool. We put a big focus on compatibility and integration with other parts of the DevOps stack, like observability and task management tools, which allows us to remain laser-focused on incident management.

Reliability is a feature

Velocity cannot come at the expense of reliability and performance. In addition to our new feature roadmap, we ship quality-of-life improvements in every development cycle. Because of the business-critical nature of our product, especially when it comes to alerting, we invest heavily in multi-cloud, redundant infrastructure.

Context matters

We’re acutely aware that our customers use Rootly on their worst days. When we test features, we ask ourselves questions like “What if I were in the midst of a SEV0? Would this be fast enough?” and “Would this add friction or reduce it?” In incidents, every second counts and every move carries extra impact.

Talk to customers

A deep understanding of the customer experience is essential to all functions, and just looking at data points on a dashboard isn’t going to get us there. Everyone—from engineers to marketers to executives—regularly speaks with our customers directly, and not just when things go wrong. We’re proactively engaging with our users to gain a real understanding of their journey and challenges. We make this easy by creating a shared Slack channel for every organization that uses Rootly, and even giving them the ability to page our team for urgent issues.

Be your own power users

No matter their role at the company, one of the first things everyone learns is how to demo the product. Not only does this ensure everyone is comfortable navigating the product in customer conversations, it also gives us consistent insight into the experience of a brand-new user.

Managers are operators

We do not hire managers who aren’t experts in their craft. Without domain expertise, managers become unreliable conduits for direction from senior leadership. So that they can properly advocate for their teams and convey strategic direction effectively, we expect managers to have a deep understanding of, and involvement in, the day-to-day work of their ICs.

Know when a customer isn’t the right fit

While our product works at all scales, from startup to enterprise and in between, it’s not the right solution for everyone, and that’s okay. Every new customer relationship starts with a discovery process to make sure that new users understand the intent of the product and the work it supports. We don’t waste time trying to force-fit the platform into something it isn’t.

What we believe

Rootly’s Modern Incident Response Philosophy

Incidents build great companies. Imagine a person who had never had anything go wrong in their entire life. They had never failed, never made a mistake, never felt embarrassed. Would that be the type of person you look up to and aspire to be? Probably not.

The people we admire are those who have experienced real challenges and overcome them to become a better, more resilient version of themselves. We believe that companies are no different. Incidents are an unavoidable—even necessary—part of running a business.

But it isn’t enough to simply have incidents. How we handle them, talk about them, and learn from them matters, and it’s changing as fast as the technology industry itself. As active members in the incident response and reliability community, and stewards of a platform that has seen over 150,000 incidents, we have a front-row seat to understand and influence change as it happens in our industry.

These are the ideas that we believe best represent the modern era of incident response. Together, they make up our modern incident response philosophy—the “new world” of incident response we, together with our customers and community, are driving forward.

Modern Incident Response Philosophy

Preparedness
Incident response isn’t solely reactive.

Incident response isn’t something to be ignored or forgotten during “peacetime”; it’s a muscle to be trained consistently so that when an incident hits, it’s in peak condition, not atrophied. This means running game days, risk identification exercises, and chaos engineering.

Doing so not only enhances your readiness for incidents, but improves psychological safety by providing a safe space for responders to experiment and practice their skills outside of real incidents.

Preparedness
There’s no such thing as “done”.

An incident response program is a living thing that evolves and changes based on your environment and business needs. It’s not something you file away in a playbook to dust off when s#!t hits the fan.

Don’t fall into the trap of thinking that because something worked once, it will work every time.

Preparedness
Language matters.

Whether we realize it or not, the lexicon we use influences the way we think and feel. Incident response comes with its own set of jargon, and it often skews towards unsavory themes like violence and death. Think about it—postmortem, war room, etc.

It’s likely that this is due to the roots of incident response in military, health care, and other crisis response work. We think those terms should stay where they belong. When we’re talking about things like software, there’s no need to conflate incident management with real life-or-death situations.

Preparedness
New responders need more than a training manual.

Incident response is a craft in itself and takes dedicated focus to perform well.

Keep this in mind when you onboard new responders and set them up for success by allowing them to shadow experienced responders and practice in safe-to-fail incident simulations until they have the confidence to jump into the real thing.

Response
One size doesn’t fit all.

Every organization has unique structures, systems, values, audiences, and more that should help shape their incident response program.

While established frameworks serve as a solid foundation, incident response programs should be tailored over time to suit each organization’s unique context.

Response
Declare fast, declare often.

Debates around whether or not something is truly an incident, or whether it’s a SEV2 or SEV3, are an anti-pattern. If a problem warrants an immediate response, assume it is an incident and triage as such.

Stuck between two severities? Go with the more severe and downgrade if needed. You can always categorize a false alarm retroactively, but you can’t get back the time you spent categorizing an incident instead of resolving it.

Response
Incident response isn’t just for engineers.

Gone are the days when incident response was solely for technical problems solved by engineers.

Customer Support, Communications, Legal teams, and more bring crucial skills and points of view to the incident response team, and should be empowered as equal members of the response.

Response
Commanders must be empowered to command.

Anyone can be an incident commander, regardless of their official job title, but whoever is in the role must be empowered and supported to be the final decision-maker in the incident.

This means that if anyone — including an executive — wants to override the commander’s decision or leadership, they should take command of the incident and step in as Incident Commander.

Post-incident learning
Incident data should be open (within an organization) by default.

There’s a time and place for private incidents, but we believe that incident details should be accessible internally and that the lessons learned in incidents are applicable beyond those directly involved.

The choice to make an incident private should be made deliberately and with reason, not as the default option.

Post-incident learning
Root cause is a fallacy.

In modern complex systems, chasing a single root cause for an incident is, more often than not, the wrong approach.

We advocate for a more holistic retrospective process that considers the full breadth of contributing factors.

Post-incident learning
Blameless doesn’t mean lack of accountability.

The “blameless” retrospective process marked a huge shift in incident response culture, but it’s important to understand what it means and what it doesn’t. Blameless doesn’t mean human errors cannot be a contributing factor, or that people aren’t responsible for learning from and fixing their mistakes.

It means that all of this is possible, and in fact easier, when people can openly share their mistakes without fear of being reprimanded or losing the respect of their team.

Post-incident learning
It’s okay to skip the retro (sometimes).

Post-incident learning is not a “one size fits all” process. While retrospectives have gotten much more efficient thanks to automation and AI tools that take care of the tedious admin work, they do still require time and energy.

For low-severity incidents, or cases where resolution was reached quickly and easily, there are times when you might feel like the juice isn’t worth the squeeze. As long as retros are being skipped for the right reasons, teams should feel empowered to right-size the retro process to the incident.

View our 21 principles