Monitoring wire-server using Prometheus and Grafana

Introduction

The following instructions detail the installation of a monitoring system consisting of a Prometheus instance and corresponding Alertmanager, along with a Grafana instance for viewing dashboards related to cluster and wire-services health.

Prerequisites

You need to have wire-server installed; see either of the wire-server installation guides.

How to install Prometheus and Grafana on Kubernetes using Helm

Note

The following makes use of overrides for helm charts. You may wish to read Overriding helm configuration settings first.

Create an override file:

mkdir -p wire-server-metrics
curl -sSL https://raw.githubusercontent.com/wireapp/wire-server-deploy/master/values/wire-server-metrics/demo-values.example.yaml > wire-server-metrics/values.yaml

Then edit this file, uncommenting and adjusting values as needed according to the sections below.

The monitoring system requires disk space if you wish to be resilient to pod failure. This disk space is provided to pods via a so-called “Storage Class”. You have three options (a command for checking which storage classes your cluster offers is shown after the list):

    1. If you deploy on a Kubernetes cluster hosted on AWS, you may install the aws-storage helm chart, which provides Storage Class configurations for AWS’s Elastic Block Storage (EBS). For this, install the AWS storage classes with helm upgrade --install aws-storage wire/aws-storage --wait.

    2. If you’re not using AWS but still want persistent metrics, see Using Custom Storage Classes.

    3. If you don’t want persistence at all, see Monitoring without persistent disk.
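Whichever option you choose, you can check which storage classes are currently available in your cluster (a quick sanity check, assuming kubectl is configured against the right cluster):

kubectl get storageclass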

Once you have a storage class configured (or have added the overrides that disable persistence), we can install the monitoring suite itself.

There are a few known issues surrounding the prometheus-operator helm chart.

You will likely have to install the Custom Resource Definitions manually before installing the wire-server-metrics chart:

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/d34d70de61fe8e23bb21f6948993c510496a0b31/example/prometheus-operator-crd/alertmanager.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/d34d70de61fe8e23bb21f6948993c510496a0b31/example/prometheus-operator-crd/prometheus.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/d34d70de61fe8e23bb21f6948993c510496a0b31/example/prometheus-operator-crd/prometheusrule.crd.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/d34d70de61fe8e23bb21f6948993c510496a0b31/example/prometheus-operator-crd/servicemonitor.crd.yaml
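To confirm that the Custom Resource Definitions registered successfully, you can list them:

# Expect to see alertmanagers, prometheuses, prometheusrules and
# servicemonitors under monitoring.coreos.com.
kubectl get customresourcedefinitions | grep monitoring.coreos.com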

Now we can install the metrics chart by running the following:

helm upgrade --install wire-server-metrics wire/wire-server-metrics --wait -f wire-server-metrics/values.yaml
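Once the command returns, it is worth verifying that the monitoring pods reached a healthy state (add --namespace <your-namespace> if you install into a separate namespace, as described below):

# All pods should eventually show STATUS Running with all containers ready.
kubectl get pods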

See the Prometheus Operator README for more information and troubleshooting help.

Adding Dashboards

Grafana dashboard configurations are included as JSON inside the charts/wire-server-metrics/dashboards directory. You may import these via Grafana’s web UI. See Accessing grafana.
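If you prefer to script the import instead of using the web UI, Grafana’s HTTP API accepts dashboard JSON. The snippet below is a sketch only: it assumes the default admin:admin credentials, an active port-forward as described under Accessing grafana, that jq is installed, and <dashboard>.json standing in for any file from the dashboards directory.

# Wrap the dashboard JSON in the envelope the Grafana API expects and POST it.
jq '{dashboard: (. + {id: null}), overwrite: true}' \
  charts/wire-server-metrics/dashboards/<dashboard>.json \
  | curl -sS -X POST -u admin:admin \
      -H 'Content-Type: application/json' \
      -d @- http://localhost:3000/api/dashboards/db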

Monitoring in a separate namespace

It is advisable to separate your monitoring services from your application services. To accomplish this you may deploy wire-server-metrics into a separate namespace from wire-server. Simply provide a different namespace to the helm upgrade --install calls with --namespace your-desired-namespace.
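For example, assuming a namespace named monitoring (the name is only illustrative; with Helm 3 the namespace must already exist, or you can pass --create-namespace):

helm upgrade --install wire-server-metrics wire/wire-server-metrics \
  --namespace monitoring \
  -f wire-server-metrics/values.yaml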

The wire-server-metrics chart will monitor all wire services across all namespaces.

Accessing grafana

Forward a port from your localhost to the grafana service running in your cluster:

kubectl port-forward service/<release-name>-grafana 3000:80 -n <namespace>

Now you can access Grafana at http://localhost:3000

The username and password are stored in the grafana secret of your namespace.

By default this is:

  • username: admin

  • password: admin
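If the defaults were changed, you can read the actual credentials from the secret. The secret name and the admin-password key below follow the upstream Grafana chart’s conventions, so verify them for your release if this prints nothing:

kubectl get secret <release-name>-grafana -n <namespace> \
  -o jsonpath='{.data.admin-password}' | base64 --decode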

Accessing prometheus

Forward a port from your localhost to the prometheus service running in your cluster:

kubectl port-forward service/<release-name>-prometheus 9090:9090 -n <namespace>

Now you can access Prometheus at http://localhost:9090
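As a quick health check you can query the HTTP API directly; the built-in up metric reports, per scrape target, whether Prometheus can reach it:

# Returns a JSON result with one sample per scrape target (1 = reachable).
curl -sS 'http://localhost:9090/api/v1/query?query=up'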

Customization

Monitoring without persistent disk

If you wish to deploy monitoring without any persistent disk (not recommended) you may add the following overrides to your values.yaml file.

# This configuration switches to use memory instead of disk for metrics services
# NOTE: If the pods are killed you WILL lose all your metrics history
prometheus-operator:
  grafana:
    persistence:
      enabled: false
  prometheus:
    prometheusSpec:
      storageSpec: null
  alertmanager:
    alertmanagerSpec:
      storage: null

Using Custom Storage Classes

If you’re using a provider other than AWS, please refer to the Kubernetes documentation on storage classes for how to configure a storage class for your cluster.
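For reference, a StorageClass definition has roughly the following shape. This is an illustrative sketch only: the provisioner is a placeholder and must be replaced with the provisioner or CSI driver of your platform.

# Illustrative StorageClass; replace the provisioner with your platform's own.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: my-storage-class
provisioner: example.com/my-provisioner
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer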

Then, to point the monitoring services at your storage class, add the following overrides to your values.yaml file:

prometheus-operator:
  grafana:
    persistence:
      storageClassName: "<my-storage-class>"
  prometheus:
    prometheusSpec:
      storageSpec:
        volumeClaimTemplate:
          spec:
            storageClassName: "<my-storage-class>"
  alertmanager:
    alertmanagerSpec:
      storage:
        volumeClaimTemplate:
          spec:
            storageClassName: "<my-storage-class>"

Troubleshooting

“validation failed”

If you receive the following error:

Error: validation failed: [unable to recognize "": no matches for kind "Alertmanager" in version
"monitoring.coreos.com/v1", unable to recognize "": no matches for kind "Prometheus" in version
"monitoring.coreos.com/v1", unable to recognize "": no matches for kind "PrometheusRule" in version

Please run the kubectl apply commands that install the Custom Resource Definitions, as detailed in the installation instructions above.

“object is being deleted”

When upgrading you may see the following error:

Error: object is being deleted: customresourcedefinitions.apiextensions.k8s.io "prometheusrules.monitoring.coreos.com" already exists

Helm sometimes has trouble cleaning up or redefining Custom Resource Definitions. Try manually deleting the resource definitions and then re-running your helm install:

kubectl delete customresourcedefinitions \
  alertmanagers.monitoring.coreos.com \
  prometheuses.monitoring.coreos.com \
  servicemonitors.monitoring.coreos.com \
  prometheusrules.monitoring.coreos.com
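
Before re-running the helm install from the installation section above, you can confirm that the deletion went through:

# This should print nothing once all four definitions are gone.
kubectl get customresourcedefinitions | grep monitoring.coreos.com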