Kubernetes: Monitoring Resources

^{2017-07-09

kubernetes

monitoring

cadvisor

prometheus

influxdb

elasticsearch

grafana

modified: 2021-01-04

reading: 5 minutes}

You know monitoring is a hot topic this year when you see how many different ways there are to monitor a Kubernetes cluster.

If you are using Splunk, we provide a solution for monitoring Kubernetes and collecting its logs. Please take a look at Monitoring Kubernetes.

cAdvisor

It all starts with cAdvisor, which is built into the kubelet. You can find the source code responsible for the /stats endpoint under release-1.7:pkg/kubelet/server/stats.

With kubectl proxy you can access this endpoint:

kubectl proxy &
curl localhost:8001/api/v1/nodes/$(kubectl get nodes -o=jsonpath="{.items[0].metadata.name}")/proxy/stats/
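The same request can be wrapped in a small helper so that the node name and proxy port are explicit. This is only a sketch: node_stats_url is a hypothetical function name, and the default port 8001 is simply the port kubectl proxy listens on when no other port is given.

```shell
# Build the proxied kubelet /stats/ URL for a node.
# node_stats_url is a hypothetical helper, not part of kubectl.
node_stats_url() {
  node="$1"
  port="${2:-8001}"   # kubectl proxy listens on 8001 by default
  printf 'localhost:%s/api/v1/nodes/%s/proxy/stats/\n' "$port" "$node"
}

# Example, with kubectl proxy already running:
#   curl "$(node_stats_url "$(kubectl get nodes -o=jsonpath='{.items[0].metadata.name}')")"
```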

The cAdvisor web interface and API can be enabled with the --cadvisor-port argument (see kubelet).

For example, if you are using minikube, you can start it with

minikube start --extra-config=kubelet.CAdvisorPort=4194

To open the cAdvisor web UI, use the following (open is the macOS command; on Linux, xdg-open does the same job)

open http://$(minikube ip):4194

If you are using kubeadm, edit the kubelet drop-in file, set --cadvisor-port=4194 in it, then reload systemd and restart the kubelet:

sudo vim /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
sudo systemctl daemon-reload
sudo systemctl restart kubelet.service

See Managing the kubeadm drop-in file for the kubelet for more details.
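If you would rather script the change than edit the file by hand, something like the following may work. It is a sketch under one assumption: the drop-in already contains a --cadvisor-port=<number> flag (kubeadm 1.7 drop-ins typically ship with --cadvisor-port=0, but check your own file). The function name set_cadvisor_port is made up for this example.

```shell
# Rewrite an existing --cadvisor-port=<n> flag to the given port.
# Reads the drop-in text on stdin and writes the result to stdout;
# lines without the flag pass through unchanged.
# ASSUMPTION: the flag is already present in the file.
set_cadvisor_port() {
  sed -E "s/--cadvisor-port=[0-9]+/--cadvisor-port=$1/"
}

# Example (writes the change back through a temporary file):
#   f=/etc/systemd/system/kubelet.service.d/10-kubeadm.conf
#   set_cadvisor_port 4194 < "$f" > /tmp/10-kubeadm.conf && sudo cp /tmp/10-kubeadm.conf "$f"
```

After that, reload systemd and restart the kubelet as before.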

cadvisor

Kubelet

The kubelet's /metrics endpoint exposes a lot of information, including etcd metrics and the Go runtime metrics provided by the Prometheus client library. And yes, the format of the returned data is defined by Prometheus. It should include cAdvisor metrics as well (see below).

kubectl proxy &
curl localhost:8001/metrics

You should be able to see all cAdvisor-specific metrics, including container resource usage.

However, in Kubernetes 1.7 this behavior is broken. See After upgrading to 1.7.0, Kubelet no longer reports cAdvisor stats.
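Because the Prometheus text format is line-oriented, a quick way to check for those metrics is to filter the output by metric-name prefix. The helper below is a minimal sketch; metrics_with_prefix is a hypothetical name, and container_ is the prefix cAdvisor uses for its container metrics.

```shell
# Print only the samples whose line starts with the given prefix.
# Reads Prometheus text-format output on stdin. "# HELP" and "# TYPE"
# comment lines never match a metric prefix, so they are dropped.
metrics_with_prefix() {
  awk -v p="$1" 'index($0, p) == 1'
}

# Example: pipe the curl output from above through the filter
#   curl -s localhost:8001/metrics | metrics_with_prefix container_
```

An empty result means the endpoint is not reporting any cAdvisor container metrics.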

Heapster

Heapster runs as an independent process. It pulls metrics from the Kubernetes cluster, keeps a limited amount of them in memory, and forwards them to one or more sinks, including logs.

heapster

The metrics you can expect from Heapster are listed here: Metrics.

Since Heapster is a project hosted under the kubernetes organization, I would assume it is the most stable option when combined with one of the sinks supported by @kubernetes/heapster-maintainers.

Kubernetes Dashboard

Kubernetes Dashboard is a general-purpose web UI for Kubernetes clusters.

If you have Heapster deployed on your Kubernetes cluster, the Dashboard shows simple resource usage.

kubernetes-dashboard

Prometheus

Prometheus is an open source monitoring system. It is part of the Cloud Native Computing Foundation.

Prometheus pulls metrics directly from the /metrics endpoints and does not depend on Heapster. To set it up, use the Kubernetes scrape config.

Because the /metrics endpoint in Kubernetes 1.7 does not include the metrics reported by cAdvisor, be sure to use the latest scrape config (currently in a PR); see Kubernetes 1.7.0 requires cAdvisor changes. For this scrape config to work you also need to enable port 4194 for cAdvisor (see above).

Prometheus gives you a graph UI, which is only useful for debugging. You can also build dashboards with console templates.

prometheus

The benefit of using Prometheus is that you get not only resource usage but also internal Kubernetes metrics.
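To get a feel for what a single scrape contains, you can total one resource metric across all of its series with standard tools. The following is a rough sketch, not a replacement for a Prometheus query: sum_metric is a hypothetical helper, and it assumes each sample line has the form name{labels} value [timestamp] or name value [timestamp].

```shell
# Sum the values of every series of one metric in a Prometheus
# text-format dump read from stdin. Prints the total as an integer.
sum_metric() {
  awk -v name="$1" '
    # Keep only samples of exactly this metric: "name{...} value" or "name value".
    index($0, name "{") == 1 || index($0, name " ") == 1 {
      n = split($0, chunk, "}")
      if (n > 1) rest = chunk[n]                  # text after the label set
      else       rest = substr($0, length(name) + 1)
      split(rest, field, " ")                     # field[1] = value, field[2] = optional timestamp
      total += field[1]
    }
    END { printf "%.0f\n", total }
  '
}

# Example, against a saved scrape (container_memory_usage_bytes is a cAdvisor metric):
#   sum_metric container_memory_usage_bytes < metrics.txt
```

Splitting on the last closing brace keeps label values that contain spaces from being mistaken for the sample value.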

Grafana

Grafana is an open source user interface for time series data.

Grafana supports many data sources, including Prometheus, Elasticsearch and InfluxDB. It is certainly the most advanced open source interface for time series analysis.

I built my dashboard on top of Kubernetes cluster monitoring (via Prometheus) by Instrumentisto Team, with Prometheus as the data source.

grafana

Elastic Stack

Elasticsearch is a well-known database built for full text search, and it keeps gaining new use cases.

There are several ways to get metrics into Elasticsearch. The first is to use Heapster with Elasticsearch as a sink. The current implementation of this sink has some issues: it uses more storage than necessary, and querying the metrics is a little tricky, so I would not use it in production. But you can still build some dashboards with Kibana.

kibana

Another option is to use Metricbeat with the kubernetes module. It is currently in alpha, so I have not tried it yet.

A third option is to use Metricbeat's prometheus module and collect metrics directly from the Kubernetes cluster. That could be tricky, as you will probably need to dynamically generate a few node-specific metrics endpoints.

InfluxData

InfluxDB is a time series database. There are also several ways to get metrics into it.

One option is to use Heapster. Because the InfluxDB sink is owned by @kubernetes/heapster-maintainers, I would assume it is the most tested, most stable and least broken option. So for production and long-term support this is probably the one I would suggest.

Another option is to use Telegraf with the kubernetes input plugin. Telegraf also provides some built-in dashboards for Chronograf when you use specific inputs.

Chronograf is the web interface provided by InfluxData. It is at a very early stage, and its dashboards are not as configurable as Grafana's. The data explorer and alerting (which needs Kapacitor) can be useful.

chronograf

In my opinion Chronograf is still at a very early stage, and Grafana feels like the better option.

Summary

As you can see, there are a lot of options for monitoring Kubernetes resources, and I am surely missing many others, including DataDog. I am still debating which one to use.

I like Prometheus with Grafana because it gives me the broadest set of metrics. It also feels like the Prometheus format is the standard now, and more and more applications expose metrics in it. So it is good to keep a Prometheus server around, and it looks like version 2 will finally bring long-term storage for metrics.

I also like Elasticsearch with Kibana, simply because I use it for other data as well. The current Heapster sink implementation is very hard to manage. I will try the new kubernetes module in Metricbeat at some point, but I am not yet sure whether it supports Elasticsearch/Kibana 5.x or requires the whole stack to be on version 6.

Heapster with InfluxDB and Grafana is my third choice. That combination feels like the most reliable and best supported. But because the Prometheus format is supported everywhere, I would probably stay with Prometheus.

@outcoldman