October 15, 2021

4 min read

What Managed Kubernetes Service is Best for SREs?

A comparison of EKS, AKS, GKE, Rancher and OpenShift from an SRE’s perspective.

Written by Quentin Rousseau

Kubernetes in general is a boon for SREs. By making it easy to manage microservices-based apps at scale, Kubernetes helps SRE teams achieve reliability goals for complex, cloud-native environments.

But if you know anything about Kubernetes, you know that there are a number of different Kubernetes distributions and services available, each with different strengths and weaknesses.

That raises the question: Which Kubernetes flavor is best for SREs? While there is no definitive answer, let’s explore five of the most popular Kubernetes services (Amazon EKS, Azure AKS, Google Cloud GKE, SUSE Rancher and Red Hat OpenShift) and examine how they stack up on the reliability engineering front.

Reliability pros and cons: EKS vs. AKS vs. GKE vs. Rancher vs. OpenShift

While a comprehensive overview of each of these Kubernetes flavors is beyond the scope of this article, here’s a rundown of what each has to offer SREs in particular.

EKS

EKS, the managed Kubernetes service from AWS, automates Kubernetes cluster setup. It also provides some tools to help manage Kubernetes itself, although the “managed service” label doesn’t mean that customers have nothing left to manage themselves.

From a reliability standpoint, one of the major advantages of EKS is that it’s available in a relatively wide selection of cloud regions. These include two GovCloud regions, which makes EKS a good fit for businesses that need to meet tight compliance requirements in addition to steep reliability goals.

Amazon also supports hybrid environments based on EKS via both EKS Anywhere and Outposts. This is useful from a reliability perspective because it makes it possible to spread Kubernetes clusters across cloud-based and on-prem infrastructure, which helps to protect against outages in the event that one part of the infrastructure fails.

Finally, EKS offers an autoscaling feature, which can automatically add nodes when Kubernetes clusters start to run short on resources.
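
To make the idea concrete, here is a minimal Python sketch of the kind of scale-up decision an autoscaler makes: given the CPU requests of pods that cannot be scheduled, estimate how many extra nodes would fit them. It is an illustration under stated assumptions, not how EKS or any real autoscaler is implemented. The function name `nodes_to_add`, the CPU-only model, the identical node sizes and the first-fit packing are all assumptions made for this example.

```python
def nodes_to_add(pending_cpu_millicores, node_cpu_millicores, max_new_nodes):
    """Estimate how many identical nodes are needed for the pending pods.

    Hypothetical helper: packs pods by CPU request only, largest first
    (first-fit decreasing), and caps the result at max_new_nodes.
    """
    free_per_node = []  # remaining CPU on each node we would add
    for request in sorted(pending_cpu_millicores, reverse=True):
        if request > node_cpu_millicores:
            continue  # this pod can never fit on this node size; skip it
        for i, free in enumerate(free_per_node):
            if free >= request:
                free_per_node[i] = free - request
                break
        else:
            # No planned node has room, so plan one more node.
            free_per_node.append(node_cpu_millicores - request)
    return min(len(free_per_node), max_new_nodes)


# Four unschedulable pods, nodes with 2000m of allocatable CPU each.
print(nodes_to_add([500, 1500, 1000, 2000], 2000, 10))  # 3
```

The cap on new nodes mirrors the upper bound that teams typically set on a node group so that a burst of pending pods cannot grow the cluster without limit.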

AKS

AKS is the managed Kubernetes service in Azure. It’s similar in most respects to EKS, including support for autoscaling. It is available in somewhat fewer locations, but its coverage still spans most parts of the world.

AKS also doesn’t have a specific hybrid cloud variant comparable to EKS Anywhere, but it can be deployed in a hybrid architecture using Azure Stack, Azure’s hybrid framework.

In general, if your team already uses the Azure cloud, you can achieve reliability outcomes with AKS similar to those you could achieve in the AWS cloud with EKS.

GKE

GKE, Google Cloud’s answer to EKS and AKS, is similar to these two other services in most respects, including support for autoscaling.

Perhaps the most interesting reliability feature of GKE is its concept of different types of clusters, including single-zone clusters, multi-zone clusters and regional clusters. Each cluster type offers a different level of infrastructure redundancy, with regional clusters delivering the greatest level of reliability.
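
A rough way to see why the cluster type matters is to model each zone as failing independently and ask how likely it is that at least one zone is still up. The Python sketch below does that arithmetic. Everything numeric in it is an assumption for illustration: the 99% per-zone availability, the three-zone counts, and the split between control-plane zones and node zones are not figures from this article or published GKE guarantees, and real zone failures are not fully independent.

```python
# Illustrative model only: zone counts and the 99% figure are assumptions.
CLUSTER_TYPES = {
    # name: (zones hosting the control plane, zones hosting worker nodes)
    "single-zone": (1, 1),
    "multi-zone": (1, 3),
    "regional": (3, 3),
}


def at_least_one_up(zone_availability, zones):
    """Probability that at least one of `zones` independent zones is up."""
    return 1 - (1 - zone_availability) ** zones


def cluster_availability(cluster_type, zone_availability=0.99):
    """Return modeled availability of the control plane and the nodes."""
    control_zones, node_zones = CLUSTER_TYPES[cluster_type]
    return {
        "control_plane": at_least_one_up(zone_availability, control_zones),
        "nodes": at_least_one_up(zone_availability, node_zones),
    }


for name in CLUSTER_TYPES:
    print(name, cluster_availability(name))
```

Under these assumptions, spreading nodes across zones improves workload availability, while only the regional layout also removes the single-zone dependency for the control plane.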

SREs may also appreciate GKE’s deployment options, which include both Autopilot and Standard modes. The latter gives teams more control over their clusters, while the former automates management to a higher degree. For SRE teams that want fine-tuned control, Standard mode is the way to go.

Rancher

Rancher, which originated as a standalone container orchestration engine and is now owned by SUSE, differs from EKS, AKS and GKE in that it supports deployment in any public cloud, as well as on-premises, while still offering management features designed to make it easier to set up and administer Kubernetes.

The ability to let teams pick and choose where they want to deploy clusters is a reliability advantage. So is Rancher’s support for multi-cloud clusters, which maximize reliability by spreading Kubernetes environments across multiple clouds.
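
The reliability benefit of spreading workloads can be sketched in a few lines of Python: distribute replicas as evenly as possible across locations, then check how many would survive the loss of the most heavily loaded one. This is a hypothetical illustration, not a Rancher API. The function names and the location labels are assumptions made for the example.

```python
def spread_replicas(replicas, locations):
    """Distribute replicas as evenly as possible across the given locations."""
    if not locations:
        raise ValueError("at least one location is required")
    base, extra = divmod(replicas, len(locations))
    # The first `extra` locations each take one additional replica.
    return {loc: base + (1 if i < extra else 0) for i, loc in enumerate(locations)}


def worst_case_survivors(placement):
    """Replicas left if the location holding the most replicas goes down."""
    return sum(placement.values()) - max(placement.values())


# Hypothetical location labels for illustration.
placement = spread_replicas(7, ["aws", "azure", "gcp"])
print(placement)                        # {'aws': 3, 'azure': 2, 'gcp': 2}
print(worst_case_survivors(placement))  # 4
```

With a single location the worst case leaves zero replicas, which is the failure mode that multi-cloud placement is meant to avoid.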

OpenShift

OpenShift, Red Hat/IBM’s Kubernetes-based PaaS, is similar to Rancher in that it can be deployed in any major public cloud. It’s also available as a managed service directly from Red Hat via OpenShift Online.

The choice of deployment options is a reliability advantage. So, arguably, is the fact that OpenShift is really more than just Kubernetes: It’s a complete PaaS, with “opinionated” configurations and a variety of built-in tools that extend beyond just container orchestration. In this respect, OpenShift helps manage reliability for the entire application delivery lifecycle, not just application production environments.

Conclusion

We’ve chosen to compare these five Kubernetes flavors because they command the greatest market share and mindshare, but the list certainly goes on. From Platform9 to Rafay to Anthos (which is based partly on GKE, but is more than that) and beyond, SREs have a variety of Kubernetes distributions to choose from. And while developer and IT engineer preferences may play a leading role in dictating which Kubernetes flavor an organization chooses, SREs should understand the reliability strengths of the various Kubernetes services so that they, too, can participate in the discussion of which Kubernetes flavor to adopt.
