July 30, 2021
5 min read
Although the fundamental concepts of site reliability engineering are the same in any environment, SREs must adapt practices to different technologies, like microservices.
Most of the reliability engineering concepts that SREs learn can be applied to any type of application architecture or environment. That doesn’t mean, however, that reliability engineering methodologies should be app-agnostic. On the contrary, SREs should tailor their approach to the type of application they are supporting.
To prove the point, let’s discuss how managing reliability for a microservices-based app is different from working with a monolith.
Before jumping into the unique reliability challenges of microservices, it’s worth noting what doesn’t change about SRE work, regardless of the type of app you’re dealing with.
The fundamental principles that guide SREs are the same in almost any environment. For example, SLOs are important when managing virtually any service or application. So is the automation of SRE responsibilities and the use of techniques like severity levels to help manage incident response.
In this respect, the SRE role is different from many other types of technical roles. Developers tend to specialize in certain programming languages or architectural components (like frontends or backends). IT engineers may tailor their methodologies to the type of OS or cloud environment they have to support (the metrics that an IT operations team cares about when dealing with a Windows-based environment are probably different from those that matter in Kubernetes, for example). Security analysts may approach their work differently depending on the type of industry their business operates in because risks tend to vary between sectors, as do compliance rules.
But with SREs, fundamental concepts tend to be consistent across any type of environment. No one says “I’m a Windows SRE” or “I do SRE for mobile apps.” If you’re an SRE today, you’re expected to be able to do it all.
But again, that doesn’t mean that SREs can take the same approach to reliability engineering for any type of technology or architecture.
Case in point: microservices applications. When you’re managing reliability for microservices, you face special challenges that don’t apply in the context of monoliths.
At the same time, microservices also require a special approach because they offer some inherent reliability advantages that monoliths lack. Above all, microservices apps are less prone to single points of failure. Even if one microservice fails or becomes slow to respond, the app as a whole may continue to function. In addition, it’s usually easier to fix and redeploy an individual microservice than it is an entire monolith.
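To make the point about partial failure concrete, here is a minimal sketch of a page that keeps working when one dependency is down. The function names, the split between an "essential" product call and an "optional" recommendations call, and the `degraded` flag are all assumptions made for illustration; they are not taken from any particular framework.

```python
from typing import Callable


def build_product_page(
    fetch_product: Callable[[], dict],
    fetch_recommendations: Callable[[], list],
) -> dict:
    """Assemble a page from two hypothetical microservice calls.

    The product call is treated as essential, so its errors propagate.
    The recommendations call is optional, so a failure degrades the page
    instead of breaking it.
    """
    page = {"product": fetch_product(), "degraded": False}
    try:
        page["recommendations"] = fetch_recommendations()
    except Exception:
        # Optional dependency is down or timed out: serve the page without it.
        page["recommendations"] = []
        page["degraded"] = True
    return page


def failing_recommendations() -> list:
    # Stand-in for a microservice that is currently unavailable.
    raise TimeoutError("recommendations service did not respond")


page = build_product_page(lambda: {"id": 42, "name": "Widget"}, failing_recommendations)
```

In a real system the `degraded` flag would typically feed a metric or alert, so that the failure stays visible to SREs even though users still get a response.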
Given the special traits of microservices, SREs should adjust their approach to microservices reliability in a few key ways.
For one, application-level metrics are arguably less important than they would be in other contexts. Instead of fixating on overall application request rates, error rates and request durations, SREs should track those same metrics at the level of individual microservices. Of course, you’ll still want to make sure the application as a whole performs adequately, but it’s hard to fix performance issues if you lack visibility into the individual microservices that cause them.
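As a rough illustration of why per-service visibility matters, the sketch below groups request records by microservice and computes a request count, error rate and mean duration for each. The `Request` record, the service names and the sample numbers are hypothetical; a real setup would pull this data from a metrics system rather than an in-memory list.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    service: str        # name of the microservice that handled the request
    duration_ms: float  # how long the request took
    ok: bool            # False if the request ended in an error


def per_service_metrics(requests):
    """Group requests by service and summarize rate, errors and duration."""
    grouped = defaultdict(list)
    for request in requests:
        grouped[request.service].append(request)

    summary = {}
    for service, items in grouped.items():
        errors = sum(1 for r in items if not r.ok)
        summary[service] = {
            "requests": len(items),
            "error_rate": errors / len(items),
            "mean_duration_ms": sum(r.duration_ms for r in items) / len(items),
        }
    return summary


# Hypothetical sample: one slow, failing request in "checkout".
sample = [
    Request("checkout", 120, True),
    Request("checkout", 80, True),
    Request("checkout", 900, False),
    Request("checkout", 100, True),
    Request("search", 30, True),
    Request("search", 50, True),
]

summary = per_service_metrics(sample)
overall_error_rate = sum(1 for r in sample if not r.ok) / len(sample)
```

In this sample the application-wide error rate is about 17%, which says nothing about where the problem is. The per-service view shows that `checkout` is failing a quarter of its requests while `search` is healthy.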
Likewise, when setting SLOs for a microservices app, it often makes sense to establish SLOs on the basis of individual microservices -- or at least factor in your microservices architecture when devising SLOs. Think about which microservices within your app are the least reliable, and set SLOs based on them.
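One way to factor the architecture into SLOs is to look at how per-service availability compounds. The sketch below assumes a request passes through several microservices in series and that their failures are independent, which is a simplification; the helper names and the example targets are illustrative only.

```python
import math


def serial_availability(availabilities):
    """Availability of a request path that needs every service to succeed.

    Assumes the services fail independently, so availabilities multiply.
    """
    return math.prod(availabilities)


def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Fraction of a service's error budget still unspent (negative if overspent)."""
    allowed_failures = (1 - slo_target) * total_requests
    if allowed_failures <= 0:
        raise ValueError("an SLO of 100% (or no traffic) leaves no error budget")
    return 1 - failed_requests / allowed_failures


# Hypothetical path: gateway -> checkout -> payments.
path_availability = serial_availability([0.999, 0.999, 0.995])

# Hypothetical month for one service: 99.9% target, 100,000 requests, 40 failures.
budget_left = error_budget_remaining(0.999, 100_000, 40)
```

Under these assumptions the path as a whole is only about 99.3% available, lower than any single service on it, and the least reliable service dominates the result. That is the arithmetic behind setting SLOs with the weakest microservices in mind.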
SREs must also take a more nuanced approach to monitoring and observing the host environment when working with microservices apps. With a monolith, you can usually get away with monitoring metrics and logs from just the host server’s OS. But with microservices, which typically run on an orchestrator such as Kubernetes, you need to track Kubernetes logs and metrics, as well as the OS-level metrics from each node in your cluster. And you have to correlate all of this with performance data from each microservice so that you can determine whether the root cause of an issue lies in the microservice, a Kubernetes service, a node or somewhere else.
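The correlation step can be sketched as merging events from every layer into one timeline around an incident. The `Event` record, the layer labels and the sample events below are hypothetical, and real observability tooling would do this across far larger and messier data; the sketch only shows the idea.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    layer: str           # "microservice", "kubernetes" or "node"
    source: str          # which service, component or node emitted it
    timestamp: datetime
    message: str


def correlate(incident_start, events, window=timedelta(minutes=5)):
    """Return events from all layers in the window before the incident, earliest first."""
    window_start = incident_start - window
    nearby = [e for e in events if window_start <= e.timestamp <= incident_start]
    return sorted(nearby, key=lambda e: e.timestamp)


incident = datetime(2021, 7, 30, 12, 0)
events = [
    Event("microservice", "checkout", datetime(2021, 7, 30, 11, 59), "HTTP 503 spike"),
    Event("kubernetes", "kubelet", datetime(2021, 7, 30, 11, 58), "pod evicted"),
    Event("node", "node-2", datetime(2021, 7, 30, 11, 57), "memory pressure"),
    Event("microservice", "search", datetime(2021, 7, 30, 11, 30), "deploy finished"),
]

timeline = correlate(incident, events)
```

Here the merged timeline starts at the node, moves to Kubernetes and only then reaches the microservice, which suggests the 503s are a symptom rather than the cause. Looking at the microservice's own metrics alone would not show that.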
A final difference between reliability engineering for monoliths and microservices is that SREs can often afford to take more risks in production with microservices, because it is easier to redeploy (or roll back) a single microservice than an entire monolith. That doesn’t mean pre-deployment testing is unnecessary when you’re working with microservices, of course. But in general, microservices make it easier to accept higher levels of risk than you could when dealing with more cumbersome monoliths.
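Accepting more risk is easier to justify when the rollback decision itself is cheap and explicit. The following is a minimal sketch of such a check for a single microservice, assuming you can compare its error rate before and after a deploy; the function name and the one-percentage-point default threshold are assumptions, not a recommended standard.

```python
def should_roll_back(
    baseline_error_rate: float,
    new_error_rate: float,
    max_increase: float = 0.01,
) -> bool:
    """Roll back if the new version's error rate exceeds the baseline by more than max_increase."""
    return new_error_rate - baseline_error_rate > max_increase


# Hypothetical deploys of one microservice.
bad_deploy = should_roll_back(baseline_error_rate=0.002, new_error_rate=0.05)
good_deploy = should_roll_back(baseline_error_rate=0.002, new_error_rate=0.004)
```

Because the check is scoped to one service, a failed deploy means reverting that service alone, which is what makes the higher risk tolerable.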
In short, although the fundamental principles and concepts that undergird reliability engineering are the same in any context, SREs should adapt practices to the special requirements of whichever type of environment they are supporting. There are crucial differences between a monolith and a microservices app, and those differences should be reflected in the way SREs approach each type of environment.