Service Mesh (Why and When?)

This is a seven-part series on service mesh. It starts with fundamentals, then moves to hands-on Istio; resilience, dynamic routing, and canary rollouts; API gateway; security; observability and tracing; and finally service mesh at scale.

This article briefly discusses the evolution of microservices, challenges of a microservice architecture, and how service mesh solves them.

Microservices vs Monolith

Taken from Experfy

A monolithic architecture has a single service that is responsible for all features of the application. A microservice architecture splits those features along logical boundaries into small services, each responsible for one of them.

The argument in favor of microservices doesn't get any clearer than this:
Decentralization: Decision making, development, deployment, and testing are decentralized. Compared to a monolith, a microservice architecture lets different engineering teams work simultaneously and make architecture, release, and testing decisions locally.

Easier to scale: Because it is modular, a microservice architecture is easier to scale. Adding more developers or more infrastructure is straightforward.

No single point of failure: This point is true only to some extent. In principle, an architecture based on microservices has no single point of failure. In reality, most systems have services that depend on each other. The effects of a failure can still be mitigated if the system is designed with failure in mind and has monitoring, fallbacks, and quick response in place.

Challenges of Microservices

A distributed system such as a microservice architecture comes with its own failure modes.

A fallacy is an incorrect assumption, and developers new to microservices often make several of them about the network. L. Peter Deutsch described these as the Fallacies of Distributed Computing.

The fallacies are: 
1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn't change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.

If we analyze these fallacies, we find that we either expose ourselves to the following failures or have to make developers responsible for preventing them:

  1. Security is left to developers, who must establish protocols for communication between services.
  2. Error handling for the network is not coded in applications.
  3. Lack of a single authority.
  4. Bottlenecks created by a lack of rate-limiting and caching.
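
To make the second item concrete, here is a minimal sketch of the kind of network error handling that every service would otherwise have to carry in its own code. The names `TransientNetworkError`, `call_with_retries`, and `make_flaky_call` are hypothetical and exist only for this illustration; a real service would wrap its own HTTP or RPC client.

```python
import time


class TransientNetworkError(Exception):
    """Hypothetical error standing in for a timeout or a dropped connection."""


def call_with_retries(operation, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call `operation`, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientNetworkError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller choose a fallback
            # Wait 0.1s, 0.2s, 0.4s, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))


def make_flaky_call(failures_before_success):
    """Build a fake remote call that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def flaky():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientNetworkError("connection reset")
        return "ok"

    return flaky


if __name__ == "__main__":
    # Two failures, then success on the third attempt (sleep is stubbed out).
    print(call_with_retries(make_flaky_call(2), sleep=lambda _: None))
```

Multiply this by every service and every language in the system, and the case for moving it out of application code becomes clearer.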

Hence, there is a need to decouple development from operations.
What does that mean?

Application developers need to focus on business logic, whereas operations is responsible for network communication within the distributed system, i.e., how the microservices communicate. That is the separation a service mesh provides.

Service Mesh

A service mesh provides a transparent and language-independent way to flexibly and easily automate networking, security, and observation functions. In essence, it decouples development and operations for services.
Taken from InfoQ Magazine: Service Meshes

A generic service mesh has two parts: a control plane and a data plane.

The data plane is responsible for communication, i.e., all inter-service traffic goes through it. A service mesh creates the data plane by injecting a proxy container alongside each of your services. Normally called a sidecar, the proxy's job is to intercept incoming and outgoing requests. These requests are then routed or denied depending on the policy and security configuration. In general, sidecars are responsible for communication between services, security, monitoring, and load balancing.
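
The interception idea can be sketched in a few lines. This is a toy model, not how any particular mesh implements its proxy: the `Request` and `Sidecar` classes and the `allowed_callers` policy are assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    source: str       # name of the calling service
    destination: str  # name of the service being called
    path: str


@dataclass
class Sidecar:
    """Toy proxy sitting next to one service (hypothetical, for illustration)."""
    service: str
    allowed_callers: set = field(default_factory=set)
    handled: list = field(default_factory=list)

    def handle_inbound(self, request, app_handler):
        # Policy check happens before the request reaches application code.
        if request.source not in self.allowed_callers:
            return 403, "denied by policy"
        self.handled.append(request.path)  # record basic telemetry
        return 200, app_handler(request)


if __name__ == "__main__":
    sidecar = Sidecar(service="service-a", allowed_callers={"service-b"})
    app = lambda req: f"handled {req.path}"
    print(sidecar.handle_inbound(Request("service-b", "service-a", "/orders"), app))
    print(sidecar.handle_inbound(Request("service-c", "service-a", "/orders"), app))
```

The application handler never sees the denied request, which is the point: the policy lives in the proxy, not in the service.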

The control plane generates the configuration used by the data plane. Apart from defaults, this configuration depends mostly on user input, so the control plane acts as a single point of administration for the network. Take a use case: we want to rate-limit the traffic from Service B to Service A. We create a rate-limiting configuration, and the control plane distributes it to the sidecars. The sidecar next to Service B then intercepts the outgoing traffic and rate-limits it accordingly.
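
The Service B to Service A use case can be sketched as follows. Everything here is an assumption for illustration: the `ControlPlane` and `SidecarProxy` classes, the shape of the config dictionary, and the choice of a token bucket as the limiting algorithm. Time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Allow `rate_per_sec` requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now):
        # Refill tokens for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class SidecarProxy:
    def __init__(self, service):
        self.service = service
        self.limits = {}  # destination service -> TokenBucket

    def apply_config(self, config):
        # Keep only the rules that apply to traffic leaving this sidecar's service.
        rules = config.get(self.service, {})
        self.limits = {dest: TokenBucket(**params) for dest, params in rules.items()}

    def send(self, destination, now):
        bucket = self.limits.get(destination)
        if bucket is not None and not bucket.allow(now):
            return 429  # rejected at the proxy, never leaves the caller
        return 200


class ControlPlane:
    def __init__(self):
        self.sidecars = []

    def register(self, sidecar):
        self.sidecars.append(sidecar)

    def push(self, config):
        # Distribute the same user-supplied config to every sidecar.
        for sidecar in self.sidecars:
            sidecar.apply_config(config)


if __name__ == "__main__":
    control_plane = ControlPlane()
    b = SidecarProxy("service-b")
    control_plane.register(b)
    control_plane.push({"service-b": {"service-a": {"rate_per_sec": 1, "burst": 2}}})
    print([b.send("service-a", now=0.0) for _ in range(3)])
```

Neither service contains any rate-limiting code; changing the limit means pushing a new config, not redeploying the application.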

Fortunately, there are several service meshes to choose from, including Istio, Consul, Open Service Mesh, and a more recent entrant, NGINX Service Mesh.

Service Mesh Features

A service mesh like Istio will add the following capabilities to your microservice architecture.

Security:

Adds authentication, authorization, and encryption to communication between services, with a flexible strategy. Apart from inter-service communication, edge traffic is also secured, i.e., external traffic flowing into the mesh (ingress) and out of the mesh (egress).

Systems Resilience:
Make systems resilient to overload by adding circuit breaking to services, or add global rate limiting for cases where a large number of clients access a small number of hosts, such as databases.
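
As a rough illustration of circuit breaking, the sketch below trips after a number of consecutive failures, fails fast while open, and lets one trial call through after a timeout. The `CircuitBreaker` class and its parameters are hypothetical and simplified; mesh proxies expose this as configuration rather than code, and their exact semantics differ.

```python
class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the backend."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation, now):
        if self.opened_at is not None and now - self.opened_at < self.reset_timeout:
            # Open: protect the overloaded backend by failing fast.
            raise CircuitOpenError("failing fast")
        # Closed, or open long enough that one trial call is allowed (half-open).
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = now  # trip, or re-trip after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, the struggling backend receives no traffic from this caller, which gives it room to recover.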

Dynamic Routing:

Programmable routing with rules that can match on URLs, cookies, and headers. Common use cases are building an edge API gateway and API versioning. You can also add weighted load balancing, traffic shifting, blue/green deployments, and canary deployments.
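
A minimal sketch of both ideas, header-based rules and weighted traffic splitting, is shown below. The `choose_backend` function, the `x-canary` header, and the `v1`/`v2` backend names are assumptions for the example; in a real mesh these would be declared in routing configuration.

```python
import random


def choose_backend(request_headers, rules, weighted_backends, rng=random):
    """Pick a backend version for one request.

    rules:             list of (header, value, backend); first match wins.
    weighted_backends: list of (backend, weight) used when no rule matches.
    """
    for header, value, backend in rules:
        if request_headers.get(header) == value:
            return backend
    names = [name for name, _ in weighted_backends]
    weights = [weight for _, weight in weighted_backends]
    # Weighted random choice: a 90/10 split sends roughly 10% of traffic to the canary.
    return rng.choices(names, weights=weights, k=1)[0]


if __name__ == "__main__":
    rules = [("x-canary", "true", "v2")]   # testers can opt in to the canary
    backends = [("v1", 90), ("v2", 10)]    # everyone else gets a 90/10 split
    print(choose_backend({"x-canary": "true"}, rules, backends))
    print(choose_backend({}, rules, backends))
```

Shifting traffic during a canary rollout then amounts to changing the weights, for example from 90/10 to 50/50 and finally to 0/100.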

Full Stack Tracing and Observability:

A service mesh can visualize every node, including each service, whether front end or back end, with edges showing the traffic between them. This view is useful both for debugging traffic issues and as a knowledge base for new developers on your team. With distributed tracing, you can also debug latency issues.

Distributed tracing lets users follow a request through a mesh as it passes across multiple services. Visualizing the trace gives a deeper understanding of request latency, serialization, and parallelism.
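
The mechanism behind this is simple to sketch: every hop records a span and forwards the trace identifiers to the next hop. The `Span` and `Tracer` classes and the `x-trace-id` / `x-parent-span-id` header names below are made up for illustration (real systems typically use standardized headers such as W3C `traceparent` or B3), and the self-time calculation assumes child calls run one after another rather than in parallel.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    service: str
    start_ms: float
    end_ms: float


class Tracer:
    def __init__(self):
        self.spans = []

    def record(self, service, start_ms, end_ms, headers=None):
        """Record one hop and return the headers to forward to downstream calls."""
        headers = headers or {}
        trace_id = headers.get("x-trace-id") or uuid.uuid4().hex
        span = Span(trace_id, uuid.uuid4().hex[:16],
                    headers.get("x-parent-span-id"), service, start_ms, end_ms)
        self.spans.append(span)
        return {"x-trace-id": trace_id, "x-parent-span-id": span.span_id}


def self_time_by_service(spans):
    """Time spent in each service itself, excluding time spent waiting on children."""
    child_time = {}
    for span in spans:
        if span.parent_id is not None:
            duration = span.end_ms - span.start_ms
            child_time[span.parent_id] = child_time.get(span.parent_id, 0.0) + duration
    return {s.service: (s.end_ms - s.start_ms) - child_time.get(s.span_id, 0.0)
            for s in spans}


if __name__ == "__main__":
    tracer = Tracer()
    h1 = tracer.record("frontend", 0, 100)
    h2 = tracer.record("orders", 10, 90, headers=h1)
    tracer.record("db", 20, 70, headers=h2)
    print(self_time_by_service(tracer.spans))
```

In this made-up trace, most of the 100 ms request is spent in the database, which is the kind of conclusion a trace visualization makes visible at a glance.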

When Do You Need a Service Mesh?

In deciding whether or not a service mesh makes sense for you and your organization, start by asking yourself two questions: how complex is your service topology and how will you integrate a service mesh into your software development lifecycle (SDLC)?

A service mesh won't add much value when you are starting out and the topology is very simple. You can easily diagnose bottlenecks when there are only one or two services. The value appears only when you start adding more services and the topology grows more complex, i.e., microservices that call each other and make two, three, or even four hops within the mesh to complete a request. Also, as already discussed, setting up the service mesh data plane is as easy as adding sidecars to your existing services. Deploying the mesh will be easy, but you'll add one step to your SDLC: occasionally rolling out new operations configuration when you roll out services.

Summary

Microservice architecture comes with its own failure modes. Decoupling development from operations can solve these problems and give a more robust SDLC. Service mesh provides a great way to handle these operations by providing security, resilience, authority, observability, and routing capabilities.

Follow along to get hands-on with the Istio project.

Bikes, Tea, Sunset, IndieMusic in that order. Software Engineer who fell in love with cloud-native infrastructure.