How Many SREs Does Your Company Need? Here’s How to Decide
Tips for deciding how many SREs your company should hire.
December 20, 2024
Communication can be a lifesaver—whether on a snow-covered volcano or during a system outage. This post shares how lessons from Search and Rescue operations can enhance incident response in tech, ensuring that teamwork and trust keep chaos at bay.
The first post in this series introduced the structural parallels between Incident Command System (ICS) operations in search and rescue and tech ops.
Coordinating a complex rescue mission in challenging environments—or managing unexpected outages in a distributed infrastructure—is only possible with effective communication. The ability to communicate clearly and efficiently can mean the difference between success and failure.
In this post, I’ll share why communication is critical in Search and Rescue (SAR) operations and how those best practices have proven invaluable in incident management throughout my career.
One of my core backcountry memories is my first ascent of Mount Adams. Adams, one of Washington’s volcanic peaks, is covered in snow year-round and is a great introductory climb for an aspiring mountaineer—meaning, if one starts their climb completely unprepared, they might just walk back down the mountain, their only trauma being some blisters and a change of heart. In my case, it was my first experience with how badly an expedition can go awry when there is no leader, no plan, and no communication channels established.
At dawn, we were all optimists. I was kitted out with gaiters, crampons, and an ice axe. Boy, did I feel like the real deal. Step aside, Meriwether Lewis. Our ragtag group of hikers had been united by a mutual enthusiasm for snagging an epic summit, most of us having only met the night before.
By midday, our gaps in physical stamina started a process of natural selection, widening the literal gaps between us on the snowy slope. By 1 PM, many of my teammates were specks on the mountainside. By 2 PM, it became apparent that several of us were struggling with altitude and needed to descend. So began a four-hour fiasco of trying to communicate a new plan of action, collect team members scattered across a 12,000-foot volcano, and race against the setting sun, all while operating at a heightened risk of getting lost or suffering from altitude sickness, dehydration, or hypothermia. By some miracle, every person was accounted for when we returned to the parking lot after dark, our feet full of blisters and that change of heart right on schedule.
Much of the structure present in SAR operations is meant to avoid the exact conditions we experienced on the Mount Adams trip: we were not a team, we underestimated the enormity of the task, and we lacked both the communication channels and protocols needed to adapt to a dynamic environment. Effective communication is the lifeline of SAR missions, and communication habits are built through endless hours of training. Good communication is the product of good training, but the focus of good training is even more fundamental: teamwork.
SAR teams are typically volunteer-based, so when a call comes through, it can be somewhat random who has the availability to respond. However, core to every mission is the structure of role assignment. Before any team members have responded, there is already a roster of predefined roles to fill, ordered by their criticality (Team Leader, Medic, Comms, Navigation, etc.). As each member responds, their name is assigned to the highest-criticality open role that they are qualified to fill.
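The assignment rule described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `Roster` class, its method names, and the responder names in the usage note are hypothetical, and the role order simply follows the example list in the paragraph.

```python
from dataclasses import dataclass, field

# Roles ordered from most to least critical, following the post's example list.
ROLES_BY_CRITICALITY = ["Team Leader", "Medic", "Comms", "Navigation"]


@dataclass
class Roster:
    """Hypothetical mission roster mapping each filled role to a responder."""

    assignments: dict = field(default_factory=dict)  # role -> responder name

    def assign(self, name, qualifications):
        """Give a responder the most critical open role they are qualified for.

        Returns the role name, or None if no open role matches.
        """
        for role in ROLES_BY_CRITICALITY:
            if role not in self.assignments and role in qualifications:
                self.assignments[role] = name
                return role
        return None

    def open_roles(self):
        """List the roles still unfilled, most critical first."""
        return [r for r in ROLES_BY_CRITICALITY if r not in self.assignments]
```

For example, if a responder qualified as Medic and Navigation arrives first, `Roster().assign("Ana", {"Medic", "Navigation"})` fills Medic, because it ranks higher than Navigation. Note that this sketch is greedy in arrival order and never reshuffles earlier assignments; a real team may well reassign roles as more people respond.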
Similarly, technical incident triage begins with a structured approach. Incidents are automatically routed to the appropriate individuals based on specific needs and skill sets. Prior to any incident occurring, there is a roster of roles with assigned on-call personnel ready to receive alerts. Implementing redundancy and backups fosters system resilience, ensuring that critical functions continue uninterrupted even in high-stress or unpredictable environments. This structured approach ensures that all aspects of an incident are managed efficiently, minimizing downtime and maintaining system reliability.
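To make the on-call roster idea concrete, here is a minimal Python sketch of routing an incident to a primary responder and falling back to a backup. Everything in it is assumed for illustration: the skill areas, the names, and the `route_incident` function are hypothetical and do not reflect any particular paging tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OnCallSlot:
    """Hypothetical on-call slot: one primary responder and one backup."""

    primary: str
    backup: str


# Hypothetical roster, defined before any incident occurs: skill area -> slot.
ON_CALL = {
    "database": OnCallSlot(primary="dana", backup="eli"),
    "network": OnCallSlot(primary="farah", backup="gus"),
}


def route_incident(skill, roster, unavailable=frozenset()):
    """Return who should be alerted for an incident needing `skill`.

    Tries the primary first, then the backup. Returns None when the skill
    has no slot or nobody in the slot is available, so the caller can
    escalate by some other means.
    """
    slot = roster.get(skill)
    if slot is None:
        return None
    for person in (slot.primary, slot.backup):
        if person not in unavailable:
            return person
    return None
```

Calling `route_incident("database", ON_CALL)` alerts the primary, while `route_incident("database", ON_CALL, unavailable={"dana"})` falls through to the backup. The point of the sketch is that the fallback path exists before the alert fires, not that two levels are the right depth for every team.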
A team is more than a collection of qualified members; a team needs shared experience, culture, and above all, trust. Trust within a SAR team is built through hours of training, meetings, missions, and even arguing over bowline knots. Through these shared experiences of effort and responsibility, successful members build an ethic of ownership for their actions and a sense of responsibility for the safety of their crew. This ethic is tangible in a high-functioning team and facilitates open and honest communication. When you care about your mission’s success and see failure as a reflection of your own decisions, you invest in communicating at every bump in the road. When your team trusts that you’re a competent member with their best interests at heart, they listen. Each member feels confident in the others’ abilities and intentions.
Regular briefings ensure everyone is aligned, contributing to a cohesive and trustworthy team dynamic essential for navigating the uncertainties of rescue missions. One of the most important tenets of trust (and continuous improvement) is the ability to admit fault, so debriefings at the end of each mission or training session play a critical role in team culture. They create space to reassess decisions and discuss what went well and what didn’t.
During an incident, clear and honest communication about the status, potential impacts, and ongoing efforts to resolve the issue helps build trust among team members and stakeholders. Post-incident reviews are conducted openly to discuss what went wrong and how to improve, ensuring that the team learns and grows from each event. This culture of transparency enhances collaboration and confidence, enabling everyone to work together effectively to mitigate future incidents.
The end goal of assembling a team and building trust is seamless execution, even in the face of a high-stakes, high-stress scenario. Each member is able to contribute meaningfully and efficiently in their assigned role, but the link between each contributing member is communication. Communication transforms effective individuals into an effective team.
When teams operate in isolation, communication becomes fragmented. Establishing cross-functional communication channels (or a unified command space) helps break down these silos, ensuring cohesive incident management.
Sometimes, in an effort to be thorough, teams overcomplicate communication. Clear, concise messages are more effective, especially during high-pressure situations.
Failing to communicate lessons learned after an incident can hinder growth and improvement. Structured post-incident reviews ensure valuable knowledge is captured and shared.
Effective communication is the lifeblood of both search and rescue operations and tech incident response. In tech incident response, staying flexible and adaptable is crucial for managing unexpected outages or system failures. Teams must be able to assess the situation in real time, adjust their strategies based on new data, and implement solutions quickly to restore services. None of this is possible without effective communication at every step, and effective communication is built on trust.
Some practical adventuring advice: Know your crew. Trust your crew. Never separate without a sound rendezvous plan. Even if you don’t plan to separate, have a plan for the unexpected—because unexpected things have a nasty habit of happening. Control is checked at the door when we step outside our cozy homes, and adaptability is our best safeguard.
Make good choices, and remember to pack snacks!