Organizations across all industry verticals are continuing to accelerate their adoption of microservices. This has led to a corresponding explosion in the use of containers and client/service communications. Managing these communications securely, at scale and with observability has proven very challenging, and it has created increasing complexity and volatility within the enterprise. As a result, both operators and developers have a strong desire to encapsulate the complexities of the network and push them into a new network infrastructure layer. At the moment, the most popular approach to getting a handle on these complexities is a service mesh.
In this blog post, we compare and contrast the feature sets of two popular service meshes, Linkerd and Istio. We’ll also explore the arguments for using a service mesh, and the scenarios where another type of solution might make more sense depending on your use case or architecture.
A service mesh is a dedicated infrastructure layer that makes service-to-service calls within a microservice architecture reliable, fast and secure. It is a mesh of proxies that services can plug into to completely abstract the network away. In a nutshell, service meshes are designed to solve the many challenges developers face when talking to remote endpoints.
For an in-depth exploration of what a service mesh is (and isn’t), read our “What is a Service Mesh?” post.
Istio is an open source service mesh initially developed by Google, IBM and Lyft. The project was announced in May 2017, with its 1.0 version released in July 2018. Istio is built on top of the Envoy proxy, which acts as its data plane. Although it is quite clearly the most popular service mesh available today, it is for all practical purposes only usable with Kubernetes.
Linkerd (rhymes with “chickadee”) is the original service mesh created by Buoyant, which coined the term in 2016. It is the official service mesh project supported by the Cloud Native Computing Foundation. Like Twitter’s Finagle, on which it was based, Linkerd was originally written in Scala and designed to be deployed on a per-host basis. Criticisms of its comparatively large memory footprint subsequently led to the development of Conduit, a lightweight service mesh specifically for Kubernetes, written in Rust and Go. The Conduit project has since been folded into Linkerd, which relaunched as Linkerd 2.0 in July of 2018. While Linkerd 2.x is currently specific to Kubernetes, Linkerd 1.x can be deployed on a per-node basis, which makes it a more flexible choice where a variety of environments need to be supported. Unless mentioned otherwise, the comparisons below refer to Linkerd 2.x.
Both Istio and Linkerd support deployments via the popular sidecar pattern, which assigns each microservice a separate proxy. Instead of calling other services directly, microservices connect to their local proxy. This proxy then routes the call to the appropriate service instance’s proxy, which in turn passes the call on to its local microservice. This mesh of service proxies forms the data plane. In a service mesh, the data plane is configured and monitored by a control plane, which is typically deployed separately.
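To make the call path concrete, here is a minimal Python sketch of the sidecar pattern described above. It is a toy in-process model, not Envoy or the Linkerd proxy: the `Sidecar` class, the shared `mesh` registry and the service names `orders` and `billing` are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Sidecar:
    """Toy stand-in for a sidecar proxy (hypothetical, for illustration only)."""
    service_name: str
    handler: Callable[[str], str]   # the local microservice behind this proxy
    mesh: Dict[str, "Sidecar"]      # shared registry of every proxy in the mesh
    log: List[str] = field(default_factory=list)

    def outbound(self, target: str, request: str) -> str:
        # The local service calls its own proxy, never the remote service.
        self.log.append(f"out:{target}")
        # The local proxy routes the call to the target service's proxy.
        return self.mesh[target].inbound(request)

    def inbound(self, request: str) -> str:
        # The receiving proxy passes the call on to its local microservice.
        self.log.append("in")
        return self.handler(request)


# Build a two-service mesh; together the proxies form the "data plane".
mesh: Dict[str, Sidecar] = {}
mesh["orders"] = Sidecar("orders", lambda r: f"orders saw {r}", mesh)
mesh["billing"] = Sidecar("billing", lambda r: f"billing saw {r}", mesh)

# "orders" calls "billing" through both proxies.
reply = mesh["orders"].outbound("billing", "charge #1")
print(reply)
```

Because every call passes through a proxy on both sides, each proxy is a natural place to record metrics or enforce policy, which is what a control plane would configure in a real mesh.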
A control plane is a set of APIs and tools used to control proxy behavior across the mesh. The control plane is where users specify authentication policies, gather metrics and configure the data plane as a whole.
Istio’s control plane is made up of three components. First, Pilot is responsible for configuring the data plane. Next, Mixer collects traffic metrics and responds to various queries from the data plane such as authorization, access control and quota checks. Depending on which adapters are enabled, it can also interface with logging and monitoring systems. Finally, Citadel allows developers to build zero-trust environments based on service identity rather than network controls. It is responsible for assigning certificates to each service and can also accept external certificate authority keys when needed.
The control plane for Linkerd is made up of a controller component, a web component providing the administrative dashboard and a metrics component, which consists of modified versions of Prometheus and Grafana.
In a typical service mesh, service deployments are modified to include a dedicated sidecar proxy. Instead of calling services directly over the network, services call their local sidecar proxies, which in turn encapsulate the complexities of the service-to-service exchange. The interconnected set of proxies in a service mesh represents its data plane.
While Istio claims to support a variety of environments and frameworks, in practice, it is only well supported on Kubernetes, making it one of the narrower service mesh options.
Similarly, Linkerd 2.x currently also requires Kubernetes. However, Linkerd 1.x, which is still widely deployed and under active development, is designed to run in many environments and frameworks, including AWS ECS, DC/OS and Docker. The main reason for this broader support of environments is that Linkerd 1.x can be deployed per host, which allows it to integrate with environments that don’t lend themselves to sidecar deployments.
A disadvantage of the per-host deployment model is that a single proxy failure will affect multiple services. On the other hand, per-host deployments result in lower resource consumption compared to the sidecar model, an important consideration given the relatively high resource requirements of Linkerd 1.x.
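The trade-off between the two deployment models can be put into rough numbers. The following sketch assumes a hypothetical cluster of 10 hosts running 8 service instances each; the function names and figures are illustrative assumptions, not measurements of Linkerd or Istio.

```python
def proxy_count(hosts: int, instances_per_host: int, model: str) -> int:
    """Number of proxies the mesh runs under a given deployment model."""
    if model == "per-host":
        return hosts                        # one shared proxy per host
    if model == "sidecar":
        return hosts * instances_per_host   # one proxy per service instance
    raise ValueError(f"unknown model: {model}")


def blast_radius(instances_per_host: int, model: str) -> int:
    """Service instances cut off from the mesh when a single proxy fails."""
    if model == "per-host":
        return instances_per_host           # every instance on that host
    if model == "sidecar":
        return 1                            # only the proxy's own instance
    raise ValueError(f"unknown model: {model}")


# Hypothetical cluster: 10 hosts, 8 service instances on each.
HOSTS, PER_HOST = 10, 8
for model in ("per-host", "sidecar"):
    print(model, proxy_count(HOSTS, PER_HOST, model), blast_radius(PER_HOST, model))
```

Under these assumptions, the per-host model runs far fewer proxies, while the sidecar model limits a proxy failure to a single service instance.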
Both Istio and Linkerd 2.x support HTTP/1.1, HTTP/2, gRPC and TCP communication between services via their sidecar proxies. Linkerd 1.x does not support TCP connections.
Both Istio (the control plane) and Linkerd 2.x are written in Go. The proxy used for Istio’s data plane, Envoy, is written in C++ while the proxy implementing the Linkerd 2.x data plane is written in Rust. Linkerd 1.x is written in Scala.
Istio’s control plane components provide the following security functionality:
- Citadel: Key and certificate management.
- Pilot: Distribution of authentication policies and secure naming information.
- Mixer: Management of authorization and auditing.
- Sidecars: Implementation of secure communication between proxies with support for TLS encryption.
As of this writing, automatic TLS encryption in Linkerd is labeled “experimental” and host-to-host authentication is not supported.
The process of adding sidecars to deployment artifacts and registering them with the service mesh control plane is called “sidecar injection.” Both Istio and Linkerd support manual and automatic sidecar injection.
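Conceptually, injection amounts to rewriting a deployment so that every pod carries an extra proxy container. The sketch below shows that idea on a plain Python dictionary shaped like a Kubernetes Deployment. The container name `mesh-proxy` and the image `example/mesh-proxy:latest` are hypothetical, and the real injectors in Istio and Linkerd do considerably more than this (for example, setting up traffic redirection).

```python
import copy


def inject_sidecar(deployment: dict, proxy_image: str) -> dict:
    """Return a copy of a Deployment-like dict with a proxy container added.

    Simplified illustration only; "mesh-proxy" is a hypothetical name.
    """
    patched = copy.deepcopy(deployment)  # leave the caller's manifest untouched
    containers = patched["spec"]["template"]["spec"]["containers"]
    # Idempotent: do not add a second proxy if one is already present.
    if not any(c["name"] == "mesh-proxy" for c in containers):
        containers.append({"name": "mesh-proxy", "image": proxy_image})
    return patched


deployment = {
    "kind": "Deployment",
    "metadata": {"name": "billing"},
    "spec": {
        "template": {
            "spec": {
                "containers": [{"name": "billing", "image": "example/billing:1.0"}]
            }
        }
    },
}

injected = inject_sidecar(deployment, "example/mesh-proxy:latest")
names = [c["name"] for c in injected["spec"]["template"]["spec"]["containers"]]
print(names)
```

Manual injection corresponds to running such a transformation over manifests before they are applied, while automatic injection applies it when the platform admits new pods.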
Istio supports high availability on Kubernetes provided multiple replicas are configured and the podAntiAffinity flag is used.
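As a rough illustration of what such a setup looks like, the sketch below builds a Kubernetes pod spec fragment, as a Python dictionary, that asks the scheduler to keep replicas with the same label on different nodes. The field names follow the Kubernetes pod spec, but the helper `with_anti_affinity`, the `app` label key and the label value `control-plane` are assumptions for illustration and are not taken from Istio's own charts.

```python
def with_anti_affinity(pod_spec: dict, app_label: str) -> dict:
    """Return a copy of pod_spec that keeps same-labeled pods on separate nodes."""
    spec = dict(pod_spec)  # shallow copy; only the top-level "affinity" key is set
    spec["affinity"] = {
        "podAntiAffinity": {
            # Hard requirement: never co-locate two pods matching the selector.
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {"matchLabels": {"app": app_label}},
                    # Pods are "co-located" if they share the same node hostname.
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        }
    }
    return spec


# Hypothetical deployment fragment: two replicas spread across nodes.
deployment_spec = {
    "replicas": 2,
    "template": {
        "metadata": {"labels": {"app": "control-plane"}},
        "spec": with_anti_affinity({"containers": []}, "control-plane"),
    },
}
print(deployment_spec["template"]["spec"]["affinity"])
```

With more than one replica and this rule in place, the loss of a single node cannot take down every replica at once.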
High availability features in Linkerd are currently labeled “experimental.”
Istio supports Prometheus natively and integrates with Jaeger for distributed tracing. Linkerd supports Prometheus and Grafana for monitoring out of the box but does not currently support distributed tracing.
The performance overhead of Linkerd 2.x is generally lower than that of Istio. In a performance benchmark comparing both service meshes with a test load consisting of HTTP echoes, throughput dropped from a baseline of 30-35 thousand queries per second (kqps) to 10-12 kqps for Linkerd and 3.2-3.9 kqps for Istio.
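To put these ranges in proportion, the short calculation below compares the midpoint of each range with the midpoint of the baseline. Using midpoints is a simplifying assumption made here, not something the benchmark itself reports.

```python
from typing import Tuple

Range = Tuple[float, float]  # (low, high) throughput in kqps


def midpoint(r: Range) -> float:
    """Midpoint of a (low, high) range."""
    low, high = r
    return (low + high) / 2


def retained_fraction(baseline: Range, meshed: Range) -> float:
    """Share of baseline throughput that remains with the mesh in place."""
    return midpoint(meshed) / midpoint(baseline)


# Figures quoted in the text above, in thousands of queries per second.
BASELINE: Range = (30.0, 35.0)
LINKERD: Range = (10.0, 12.0)
ISTIO: Range = (3.2, 3.9)

linkerd_retained = retained_fraction(BASELINE, LINKERD)
istio_retained = retained_fraction(BASELINE, ISTIO)

print(f"Linkerd retains {linkerd_retained:.1%} of baseline throughput")
print(f"Istio retains {istio_retained:.1%} of baseline throughput")
```

Measured this way, Linkerd keeps roughly a third of the baseline throughput in this benchmark and Istio roughly a tenth.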
There are five main reasons why you may not want to use a service mesh to manage the potentially complex networking challenges a microservices architecture presents.
Service meshes are a platform solution and thus tend to be very opinionated. This means you may find yourself having to “work their way” vs. the way that makes the most sense for your business or technology stack. This upfront investment may prove to be too costly, depending on your situation.
Similarly, if controlling how applications and services talk to each other is strategically important to your organization, using an existing service mesh makes little sense. Adopting a service mesh lets you benefit from a rising tide, but doesn’t allow you to control your destiny.
Deploying a service mesh adds considerable complexity. Deployments need to receive sidecars, the service mesh needs to be integrated into the environment and then continually reconfigured, and encryption may have to be redesigned. As a result, running a service mesh on a platform like Kubernetes, for instance, will require you to become an expert not just in the service mesh of your choice, but also in the platform.
Routing traffic through a series of proxies can get painfully slow as the mesh grows and routing tables balloon in size.
Adopting a service mesh to trace requests across services is not always as valuable as it first appears. For example, if your microservices environment combines applications and services from different teams, it can be very challenging to interpret the traces when they cross the boundaries of different engineering teams and business units, let alone enterprises or cloud providers.
Service meshes are focused on tactical developer concerns about service-to-service calls. They do not help control the complex emergent behaviors that organically grow out of applications and services that interact with each other at scale and in unintended ways.
See our post “Should I use a service mesh?” for more details.
Enterprises continue to parallelize development teams for speed and agility, as a result of which microservice architectures transform into a sprawling landscape of services. As this landscape evolves, it becomes increasingly important that operators are able to control the complex emergent behaviors that result from the continuously changing interaction patterns.
Controlling these behaviors requires a cloud traffic controller such as Glasnostic. Unlike service meshes, which address the comparatively narrow developer concerns around service-to-service calls, Glasnostic is a solution to control the large-scale interaction behaviors of service landscapes so that enterprises can grow and evolve their product portfolio in a more rapid and agile manner.