Service Mesh (Resilience, Dynamic Routing, Load-Balancing) (Part 3)

This is a seven-part series on service mesh. It starts with fundamentals, then goes hands-on with Istio, and continues with resilience, dynamic routing and load-balancing (this part), API gateway, security, observability/tracing, and finally service mesh at scale.

We will be using the same setup that we created in the hands-on part: a Kubernetes cluster with Istio installed and the bookinfo example as our microservices. If you missed it, I suggest going through the previous part, where we get hands-on with Istio on GKE.

We will be using the following Istio concepts throughout this article.

Destination Rule: A DestinationRule defines policies that apply to traffic intended for a service after routing has occurred.

Virtual Service: A VirtualService defines a set of traffic routing rules to apply when a host is addressed.

Service Entries: ServiceEntry enables adding additional entries into Istio’s internal service registry, so that auto-discovered services in the mesh can access/route to these manually specified services.

Now that we have our service mesh in place, we want it to provide us with the following:

  1. Resilience.
  2. Dynamic-Routing.
  3. Load-balancing and Canary Rollout.

Resilience:

What do we mean by resilience? When a failure occurs, we want pre-programmed behaviors that mitigate its effects.

We don't want our services to hang and wait indefinitely when a request is taking too long. Relying on the default timeout is not recommended, because how long an operation should take depends on the type of operation. Instead, we can set a timeout per service in our mesh. Here is how we can add a timeout of 10s for the ratings service.

kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
        subset: v1
    # Fail the request if ratings has not responded within 10 seconds.
    timeout: 10s
EOF

Our mesh can also experience transient noise. For example, deployments or restarts can make a service temporarily unavailable, and requests to such an upstream host are likely to fail. Because the disruption is temporary, our system should have retries built in to absorb it. We can add request retries to our services so that in situations like this the original request does not end in failure.

We create a virtual service for the ratings host with the same destination as before, and add a retries spec with 3 attempts and a per-try timeout of 2s.

kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
        subset: v1
    # Retry a failed request up to 3 times, giving each attempt 2 seconds.
    retries:
      attempts: 3
      perTryTimeout: 2s
EOF
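
Note that both manifests above create a VirtualService named ratings, so applying the second one replaces the first and the 10s timeout is dropped. Here is a minimal sketch of one way to keep both behaviors in a single resource. It assumes the v1 subset is already defined by a DestinationRule for ratings, and the file name is only illustrative.

```shell
# Write one VirtualService that carries both the overall timeout and the
# retry policy. The file name is an arbitrary choice for this sketch.
cat > ratings-timeout-retries.yaml <<'EOF'
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
        subset: v1          # assumes a DestinationRule defines subset v1
    timeout: 10s            # upper bound for the whole request, retries included
    retries:
      attempts: 3
      perTryTimeout: 2s     # each individual attempt gives up after 2s
EOF
```

It can then be applied with kubectl apply -f ratings-timeout-retries.yaml.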

Circuit Breaking

Services that receive requests from too many hosts can become overloaded, which can result in timeouts or denial of service. For such scenarios we can create a connection pool and set a circuit-breaking limit: when the limit is reached, no new connections are accepted.
We create a destination rule for the host and define a connection pool in trafficPolicy. The connection pool applies to all upstream TCP connections to the reviews service.

kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  subsets:
  - name: v1
    labels:
      version: v1
  # Set at the spec level so it applies to all traffic to reviews.
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
EOF
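
Limiting TCP connections is only one of the available knobs. As a sketch, the same destination rule could also cap pending HTTP requests and temporarily eject hosts that keep failing, using the connectionPool.http and outlierDetection settings of DestinationRule. The numeric values and the file name below are illustrative assumptions, not tuned recommendations.

```shell
# Write a DestinationRule that combines connection limits with outlier
# detection. All numbers are placeholders chosen for illustration.
cat > reviews-circuit-breaker.yaml <<'EOF'
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 10   # requests queued while waiting for a connection
        maxRequestsPerConnection: 10
    outlierDetection:
      consecutive5xxErrors: 5         # eject a host after 5 consecutive 5xx responses
      interval: 10s                   # how often hosts are evaluated
      baseEjectionTime: 30s           # minimum time an ejected host stays out
      maxEjectionPercent: 50          # never eject more than half of the hosts
  subsets:
  - name: v1
    labels:
      version: v1
EOF
```

It can then be applied with kubectl apply -f reviews-circuit-breaker.yaml. Since it uses the same name as the rule above, it would replace that rule rather than add to it.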

Dynamic Routing

Using Istio’s destination rules and virtual services, we can route incoming traffic based on URL, origin, cookies, region, and headers. This applies to many scenarios. For example, you may want to send regional requests to the closest cluster, or, if you host multiple versions of an API, route requests based on a URL prefix such as v1 or v2. Another example that comes to mind is when we hosted a dashboard in each of our clusters but wanted to access all of them from the same URL. We created a virtual service in one of our clusters that receives the requests and routes them to the other clusters based on the URL.

kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: tracing-dashboard
  namespace: istio-system
spec:
  hosts:
  - <Your host here>
  gateways:
  - tracing-gateway
  http:
  # Requests under /my-first-cluster go to the first cluster's dashboard.
  - name: "my-first-cluster"
    match:
    - uri:
        prefix: /my-first-cluster
    route:
    - destination:
        host: <Path to dashboard in my-first-cluster>
  # Requests under /my-second-cluster go to the second cluster's dashboard.
  - name: "my-second-cluster"
    match:
    - uri:
        prefix: /my-second-cluster
    route:
    - destination:
        host: <Path to dashboard in my-second-cluster>
EOF
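
Matching is not limited to URI prefixes. As a sketch of header-based routing on the bookinfo services, the rule below sends requests carrying a particular header value to reviews v2 and everything else to v1. The end-user header, the value jason, and the file name are assumptions chosen for illustration, and the v1 and v2 subsets are assumed to be defined in a DestinationRule for reviews.

```shell
# Write a VirtualService with two HTTP routes. Routes are evaluated in order,
# so the header match comes first and the catch-all route comes last.
cat > reviews-header-routing.yaml <<'EOF'
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - match:
    - headers:
        end-user:             # hypothetical header identifying the caller
          exact: jason
    route:
    - destination:
        host: reviews
        subset: v2
  - route:                    # default route for all other requests
    - destination:
        host: reviews
        subset: v1
EOF
```

It can then be applied with kubectl apply -f reviews-header-routing.yaml.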

Using virtual services, we can program behavior for incoming, outgoing, and internal traffic alike.

To control routing for traffic bound to services outside the mesh, external services must first be added to Istio’s internal service registry using the ServiceEntry resource. VirtualServices can then be defined to control traffic bound to these external services.

kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-svc-wikipedia
spec:
  hosts:
  - wikipedia.org
  location: MESH_EXTERNAL
  ports:
  - number: 80
    name: example-http
    protocol: HTTP
  resolution: DNS
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-wiki-rule
spec:
  hosts:
  - wikipedia.org
  http:
  - timeout: 5s
    route:
    - destination:
        host: wikipedia.org
EOF

Load-balancing and Canary Rollout

Each service can have multiple instances of itself. How do we distribute requests so that no single instance is overloaded? A service can also have multiple versions or iterations running at the same time.
To load-balance a service toward the instance with the fewest connections, we can use the following destination rule.

kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: bookinfo-ratings
spec:
  host: ratings.prod.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      simple: LEAST_CONN
EOF
# You can replace LEAST_CONN with ROUND_ROBIN.

Some organizations use canary rollouts. That is, when a new version of a service is released, it is first given time to stabilize, then a small portion of traffic is shifted toward it and monitored. When all checks pass, the rest of the traffic is gradually shifted over.

For example, we have three versions of the reviews service in our bookinfo example. Here is how we can use an Istio virtual service and destination rule to run a canary rollout from v1 to v2.

kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 90
    - destination:
        host: reviews
        subset: v2
      weight: 10
---
# The subsets referenced above must be defined for the same host (reviews).
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
EOF

Here the destination rule defines the service versions (subsets) using labels on the pods, and the virtual service routes 90% of the traffic to v1 and 10% to v2. Once v2 proves stable, we can slowly increase its weight and complete the canary rollout.
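
The gradual shift described above can be scripted rather than edited by hand. Below is a minimal sketch that only renders the manifest for a given v2 weight, so it can be inspected before anything is applied. The function name render_reviews_canary is hypothetical, and the sketch assumes the reviews DestinationRule with subsets v1 and v2 already exists.

```shell
# render_reviews_canary V2_WEIGHT
# Prints a reviews VirtualService that sends V2_WEIGHT percent of traffic to
# subset v2 and the remainder to subset v1. Nothing is applied to the cluster.
render_reviews_canary() {
  v2_weight="$1"
  v1_weight=$((100 - v2_weight))
  cat <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: ${v1_weight}
    - destination:
        host: reviews
        subset: v2
      weight: ${v2_weight}
EOF
}

# Example: preview the manifest for a 25% canary step.
render_reviews_canary 25
```

Piping the output into kubectl, for example render_reviews_canary 50 | kubectl apply -f -, would apply one step. How long to wait and which checks to run between steps is left to your own monitoring.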

Summary

Great! We have now seen how Istio can help us add resilience, dynamic routing, load balancing, and canary rollouts to our mesh.
Next up, we cover API gateway and edge in our service mesh.

Bikes, Tea, Sunset, IndieMusic in that order. Software Engineer who fell in love with cloud-native infrastructure.