Monitoring in Kubernetes (Multi-Cluster Environment)

This is the last article in a three-part series on monitoring. If you haven't read The What, How, and Why of Monitoring and Monitoring in Kubernetes (Hands-On), I would suggest giving them a read first.

We have a Kubernetes-based multi-cluster infrastructure. The clusters can live in any region or environment (stage, production), with cross-cluster communication established via a service mesh.

In a multi-cluster environment, the best way to monitor and control our infrastructure is to establish central command and control: a central cluster with tooling installed to monitor all the other clusters.

We have earlier described monitoring as exporting, collecting, and visualizing. Prometheus provides the facility of Prometheus federation, wherein each Prometheus instance becomes an exporter, and the central Prometheus scrapes the Prometheus instances residing in each cluster.
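To make the federation mechanism concrete, here is a minimal sketch (in Python; the base URL is a hypothetical placeholder) of the request the central Prometheus effectively issues against each cluster-local /federate endpoint:

```python
from urllib.parse import urlencode

# Base URL of a cluster-local Prometheus; this address is an assumption,
# substitute the real endpoint for your cluster.
base_url = "http://prometheus.my-first-cluster.example.com/federate"

# Each match[] selector picks a set of series to federate.
matchers = ['{job="prometheus"}', '{job="kubernetes-pods"}']
query = urlencode([("match[]", m) for m in matchers])

federate_url = f"{base_url}?{query}"
print(federate_url)
```

The central Prometheus builds exactly this kind of request from the scrape config shown below, fetches the matched series, and stores them locally.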
So, to begin, we install the monitoring tooling in the monitoring namespace of each of our product clusters, as described in the hands-on.

In the central cluster, install Grafana the same way, but use a different Helm values file for the central Prometheus (prometheus-central). The difference is that the central Prometheus defines scrape jobs targeting the Prometheus instances in our product clusters.

This is shown in the snippet below. Notice the labels field: it is what lets us select metrics for each cluster via a drop-down in Grafana. We call these dashboard variables. Use this gist to get the Helm values for central-prometheus.

extraScrapeConfigs: |
  # Prometheus federation: scrape metrics from the Prometheus instance in each product cluster
  - job_name: 'my-first-cluster'
    scrape_timeout: 30s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="prometheus"}'
        - '{job="prometheus-pushgateway"}'
        - '{job="prometheus-blackbox-exporter"}'
        - '{job="kubernetes-apiservers"}'
        - '{job="kubernetes-nodes"}'
        - '{job="kubernetes-nodes-cadvisor"}'
        - '{job="kubernetes-pods"}'
        - '{job="kubernetes-service-endpoints"}'
        - '{__name__=~"job:.*"}'
    static_configs:
      - targets:
          - '<Link to prometheus in my-first-cluster>'
        labels:
          cluster: my-first-cluster
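Each additional product cluster gets its own federation job with a distinct cluster label. As a sketch, a second cluster's job (target address left as a placeholder, matchers trimmed for brevity) might look like:

```yaml
- job_name: 'my-second-cluster'
  scrape_timeout: 30s
  honor_labels: true
  metrics_path: '/federate'
  params:
    'match[]':
      - '{job="prometheus"}'
      - '{job="kubernetes-pods"}'
  static_configs:
    - targets:
        - '<Link to prometheus in my-second-cluster>'
      labels:
        cluster: my-second-cluster
```

The cluster label is what the Grafana dashboard variable below will enumerate.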

Installation Steps

# Create monitoring namespace
kubectl create ns monitoring
# Install grafana
helm install stable/grafana --name grafana --namespace monitoring -f grafana.yaml
# Import the Kubernetes cost and microservice-health dashboards as described in the hands-on
# Install Prometheus
helm install stable/prometheus --name my-first-cluster-prometheus --namespace monitoring -f prometheus-central.yaml

We will add a drop-down selector for our product clusters, so that selecting an option populates the dashboard with metrics from the respective cluster.

Steps:

1. Go to dashboard Settings -> Variables. Click New.

2. Enter Name: "cluster", Label: "cluster", Data source: "Prometheus", Query: label_values(cluster), and set Multi-value to true.
Here the query uses the label_values function, which returns the unique values of the cluster label across all our metrics: in our case, my-first-cluster and my-second-cluster.

3. Move the cluster variable to the top of the variable list.

Finally, return to the dashboard and use the drop-down to select metrics from the respective cluster.
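To use the variable in a panel, reference it in the query's label matcher. As a sketch (the metric name assumes the standard cAdvisor exporter; adjust to your setup), a panel showing CPU usage for the selected cluster(s) could use:

```promql
sum(rate(container_cpu_usage_seconds_total{cluster=~"$cluster"}[5m])) by (cluster)
```

The =~ regex matcher is what makes the Multi-value setting work: when several clusters are selected, Grafana expands $cluster to a pattern matching all of them.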

Summary

In a multi-cluster environment, there should be a central monitoring and configuration cluster that scrapes the metrics from all other clusters into one place.

Bikes, Tea, Sunset, IndieMusic in that order. Software Engineer who fell in love with cloud-native infrastructure.