Linkerd (rhymes with “chickadee”) is an open source service mesh designed to be deployed into a variety of container schedulers and frameworks such as Kubernetes. It became the original “service mesh” when its creator, Buoyant, coined the term in 2016. Like Twitter’s Finagle, on which it was based, Linkerd was first written in Scala and designed to be deployed on a per-host basis. Criticism of its comparatively large memory footprint subsequently led to the development of Conduit, a lightweight service mesh built specifically for Kubernetes and written in Rust and Go. Conduit has since been folded into Linkerd, which relaunched as Linkerd 2.0 in July 2018. This post is about Linkerd 2 unless noted otherwise.
A “service mesh” is a dedicated infrastructure layer that makes service-to-service communication reliable, fast and safe. It is not a “mesh of services” but rather a mesh of API proxies that services call locally. The proxies then relay each call to its intended target service, thus abstracting away all aspects related to the network. By intercepting network communication within the application, service meshes are able to extract metrics (“telemetry”), apply service-to-service policies and encrypt the exchange.
Like all service meshes, Linkerd consists of a control plane and a data plane. The control plane is made up of a main controller component, a web component serving the user dashboard and a metrics component consisting of a modified Prometheus and Grafana. These components control the proxy configurations across the service mesh and process relevant metrics. The data plane consists of the interconnected Linkerd proxies themselves, which are typically deployed as “sidecars” alongside each service container.
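For orientation, the control plane is typically installed and verified with Linkerd’s CLI. The commands below are a sketch of that flow; they assume a working `kubectl` context and the `linkerd` CLI on the PATH:

```shell
# Generate the control plane manifests and apply them to the cluster
linkerd install | kubectl apply -f -

# Verify that the control plane components are healthy
linkerd check

# Open the web dashboard served by the control plane's web component
linkerd dashboard
```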
Since the proxy components that make up Linkerd’s data plane are typically injected via the command line, adding a service to Linkerd is pretty straightforward: you merely deploy the service and inject the proxy component as a sidecar. Once the service is equipped with a proxy, it immediately becomes part of the service mesh. Linkerd’s proxy component supports features such as transparent HTTP, HTTP/2 and gRPC proxying, latency-aware load balancing, automatic export of Prometheus metrics, and mutual TLS between meshed services.
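As a sketch of what injection looks like in practice, a workload can be annotated so that Linkerd’s proxy injector adds the sidecar automatically. The manifest below is illustrative; the names and image are hypothetical, but the `linkerd.io/inject` annotation is Linkerd’s documented mechanism:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-service        # illustrative name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-service
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled   # ask Linkerd's proxy injector to add the sidecar
      labels:
        app: example-service
    spec:
      containers:
      - name: example-service
        image: example/service:latest   # illustrative image
```

Alternatively, the same injection can be performed manually from the command line by piping the manifest through the CLI: `kubectl get deploy example-service -o yaml | linkerd inject - | kubectl apply -f -`.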
Linkerd’s control plane provides APIs and a user-facing web console, is responsible for proxy configuration, and collects and aggregates data plane metrics. Its three main components are the controller, the web component serving the user dashboard, and the metrics component built on modified Prometheus and Grafana instances.
Interestingly, because Linkerd’s control plane containers come with an instance of Linkerd’s proxy pre-installed, they are automatically part of the service mesh and can be controlled just like any other service in the mesh. The same is not true of every service mesh.
Service meshes are a state-of-the-art solution to make service-to-service communication resilient, observable and secure across an entire application. They promise that, by pushing all network communication into a dedicated infrastructure layer, the network with all its intrinsic failure modes can be encapsulated away, thus freeing the developer from the arduous and error-prone task of handling them in their application code. And, as we’ve shown in “Should I Use a Service Mesh?” they are best positioned to deliver on this promise when used with stand-alone, moderately scaled microservice applications that run on Kubernetes.
What sets Linkerd apart is that it is known to have scaled to thousands of nodes in production environments, at least in its first version. Also, unlike, e.g., Istio, Linkerd can be deployed as a sidecar or on a per-host basis, making it a more flexible choice for environments that cannot or do not want to inject sidecars into containers.
Despite their promise to address essential and difficult-to-implement service-to-service communication concerns, service meshes are not as widely deployed as one would expect. This lack of uptake can be attributed in part to their relative novelty and the fact that the microservices space as a whole is rapidly evolving.
But service meshes are also not without criticisms. As we explain in “Should I Use a Service Mesh?” a dedicated infrastructure layer encapsulating all network communication is an invasive proposition and thus requires a uniform environment, a uniform granularity of design and a uniform technology stack to deliver on its promise. Service meshes are an opinionated, restricting and complex technology that is difficult to retrofit and thus is best applied to new, microservice-based applications that run on Kubernetes.
Service meshes have also been criticized for their at times poor performance and their lack of support for multi-cluster and multi-cloud topologies. Their impact on application performance, when compared to direct calls across the network, can be substantial and difficult to diagnose, let alone remediate. They require an up-front commitment to a platform that can be difficult to justify when the applications themselves are still evolving, and they are therefore ill-suited for service landscapes that consist of multiple, connected and organically growing applications.
In summary, service meshes are no magic bullet for architects and operators looking to run a growing portfolio of decomposed applications in an agile manner. They are tactical affairs, “below the ground” upgrades that address technical concerns of interest primarily to developers, and may turn out not to be a game changer for business stakeholders or end users.
As we pointed out in “Should I Use a Service Mesh?” service meshes like Linkerd are powerful technologies to “make service-to-service connections safe, fast and reliable.” They are also opinionated, complex and can have a significant effect on overall performance. Adopting one will therefore most likely succeed in uniform environments such as self-contained microservice architectures that run on Kubernetes.
Compared to other service meshes, Linkerd has the advantage that it can be deployed per host, as well as per container, which makes it a more flexible choice in mixed environments that run virtual machines as well as containers or where otherwise sidecar injection is not an option.
In any case, because Linkerd, like all service meshes, addresses the local concerns around service-to-service connectivity, not the large-scale and complex emergent behaviors of service landscapes, it may be deployed in addition to, and independent of, an enterprise-wide cloud traffic controller such as Glasnostic.