Blog

Classifying Error Logs with AI: Can DeepSeek R1 Outperform GPT-4o and Llama 3?

Can a smaller AI model outperform a larger one? A distilled version of DeepSeek R1 (70B) outperformed Llama and nearly matched GPT-4o in classifying error logs. These results suggest that model efficiency, not just size, is key to AI performance in incident management.

Sylvain Kalache

February 19, 2025

6 mins

Incident Communications in 2025: Strategies from Industry Leaders

Are you buried under tickets and dubious SEV scales? Industry leaders are challenging the basics of how teams should communicate during incidents.

Ryan McDonald

February 17, 2025

9 mins

RescueOps - Ep. 8: Psychological Support & Stress Management

Flash floods demand calmness, but what happens after the crisis? Processing stress is key to long-term resilience, whether you’re a responder or an outdoor rescuer.

Claire Leverne

February 12, 2025

4 mins

RescueOps - Ep. 7: Rapid Assessment and Triage

What SREs can learn from avalanche rescue: speed, strategy, and coordination are everything when the clock is against you.

Claire Leverne

February 6, 2025

6 mins

From MTTR to SLOs: a shift towards proactive reliability

MTTR isn’t the silver bullet for reliability—it’s a trap. Learn why traditional incident metrics fall short, how SLOs provide a better approach, and how gamedays can help you test and improve system resilience.

Jacob Plicque III

February 5, 2025

6 mins

RescueOps - Ep. 6: Collaboration and Coordination Across Multiple Teams

Check out these red flags to watch for in both SAR and incident response when coordinating cross-functional teams.

Claire Leverne

January 29, 2025

9 mins

RescueOps - Ep. 5: Scalability and Flexibility

From hiking gear to SRE playbooks, scaling requires thoughtful preparation at every level. Learn why robust foundations, adaptable tools, and tested protocols are your best defense—whether facing a blizzard or a system outage.

Claire Leverne

January 22, 2025

7 mins

SRE Report 2025 - Key Takeaways

Missed the 58-page SRE Report 2025? I’ve summarized the essentials: growing demand for SLOs, rising toil levels, and why post-incident stress is higher than you might think. This quick read will catch you up in no time.

JJ Tang

January 20, 2025

4 mins

RescueOps - Ep. 4: Situation Awareness and Real-Time Tracking

Whether scaling a mountain or troubleshooting an outage, situational awareness and real-time tracking can help your team build resilience and minimize costly delays.

Claire Leverne

January 9, 2025

7 mins

Google SREs are changing the game again: a breakdown of their new approach

Google SREs are redefining reliability practices with STAMP, addressing the limitations of traditional models as systems scale. Their approach highlights the need for system-wide hazard analysis.