.. _monitoring:

Monitoring wire-server using Prometheus and Grafana
=======================================================

Introduction
------------

The following instructions detail the installation of a monitoring system
consisting of a Prometheus instance and a corresponding Alertmanager, in
addition to a Grafana instance for viewing dashboards related to cluster and
wire-services health.

Prerequisites
-------------

You need to have wire-server installed; see either of

* :ref:`helm`
* :ref:`helm_prod`

How to install Prometheus and Grafana on Kubernetes using Helm
---------------------------------------------------------------

*Note: the following makes use of overrides for helm charts. You may wish to
read :ref:`understand-helm-overrides` first.*

Create an override file:

.. code:: bash

   mkdir -p wire-server-metrics
   curl -sSL https://raw.githubusercontent.com/wireapp/wire-server-deploy/master/values/wire-server-metrics/demo-values.example.yaml > wire-server-metrics/values.yaml

Edit this file, uncommenting and adjusting values as needed with respect to
the following sections.

The monitoring system requires disk space if you wish to be resilient to pod
failure. This disk space is given to pods by using a so-called "Storage
Class". You have three options:

* (1) If you deploy on a kubernetes cluster hosted on AWS, you may install
  the ``aws-storage`` helm chart, which provides configurations of Storage
  Classes for AWS's Elastic Block Storage (EBS). For this, install the AWS
  storage classes with ``helm upgrade --install aws-storage wire/aws-storage --wait``.
* (2) If you're not using AWS, but you still want to have persistent metrics,
  see :ref:`using-custom-storage-classes`.
* (3) If you don't want persistence at all, see :ref:`using-no-storage-classes`.

Once you have a storage class configured (or have added the overrides to
disable persistence), you can install the monitoring suite itself.

There are a few known issues surrounding the ``prometheus-operator`` helm
chart. You will likely have to install the Custom Resource Definitions
manually before installing the ``wire-server-metrics`` chart:

::

   kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/d34d70de61fe8e23bb21f6948993c510496a0b31/example/prometheus-operator-crd/alertmanager.crd.yaml
   kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/d34d70de61fe8e23bb21f6948993c510496a0b31/example/prometheus-operator-crd/prometheus.crd.yaml
   kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/d34d70de61fe8e23bb21f6948993c510496a0b31/example/prometheus-operator-crd/prometheusrule.crd.yaml
   kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/d34d70de61fe8e23bb21f6948993c510496a0b31/example/prometheus-operator-crd/servicemonitor.crd.yaml

Now install the metrics chart by running the following::

   helm upgrade --install wire-server-metrics wire/wire-server-metrics --wait -f wire-server-metrics/values.yaml

See the `Prometheus Operator README
<https://github.com/coreos/prometheus-operator>`__ for more information and
troubleshooting help.

Adding Dashboards
-----------------

Grafana dashboard configurations are included as JSON inside the
``charts/wire-server-metrics/dashboards`` directory. You may import these via
Grafana's web UI; see `Accessing grafana <#accessing-grafana>`__. A
command-line alternative is sketched below.
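Alternatively, a dashboard JSON file can be imported through Grafana's
dashboard import HTTP API. The following is a minimal sketch, not part of the
charts: it assumes the port-forward from `Accessing grafana
<#accessing-grafana>`__ is already running, that the credentials are still the
default ``admin``/``admin``, that ``jq`` is installed, and it uses a
hypothetical file name (``some-dashboard.json``) as a stand-in for one of the
JSON files in the directory above.

.. code:: bash

   # Wrap the dashboard JSON in the payload shape Grafana's import API expects,
   # then POST it to the port-forwarded Grafana instance.
   # "some-dashboard.json" is a placeholder; substitute a real file.
   jq '{dashboard: ., overwrite: true}' charts/wire-server-metrics/dashboards/some-dashboard.json \
     | curl -sS -u admin:admin \
         -H 'Content-Type: application/json' \
         -d @- \
         http://localhost:3000/api/dashboards/db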
Monitoring in a separate namespace
----------------------------------

It is advisable to separate your monitoring services from your application
services. To accomplish this, you may deploy ``wire-server-metrics`` into a
separate namespace from ``wire-server``: simply provide a different namespace
to the ``helm upgrade --install`` calls with ``--namespace your-desired-namespace``.
The wire-server-metrics chart will monitor all wire services across *all*
namespaces.

Accessing grafana
-----------------

Forward a port from your localhost to the grafana service running in your
cluster:

::

   kubectl port-forward service/<release-name>-grafana 3000:80 -n <namespace>

Now you can access grafana at ``http://localhost:3000``.

The username and password are stored in the ``grafana`` secret of your
namespace. By default this is:

- username: ``admin``
- password: ``admin``

Accessing prometheus
--------------------

Forward a port from your localhost to the prometheus service running in your
cluster:

::

   kubectl port-forward service/<release-name>-prometheus 9090:9090 -n <namespace>

Now you can access prometheus at ``http://localhost:9090``.

Customization
---------------

.. _using-no-storage-classes:

Monitoring without persistent disk
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you wish to deploy monitoring without any persistent disk (not
recommended), you may add the following overrides to your ``values.yaml``
file.

.. code:: yaml

   # This configuration switches to use memory instead of disk for metrics services
   # NOTE: If the pods are killed you WILL lose all your metrics history
   prometheus-operator:
     grafana:
       persistence:
         enabled: false
     prometheus:
       prometheusSpec:
         storageSpec: null
     alertmanager:
       alertmanagerSpec:
         storage: null

.. _using-custom-storage-classes:

Using Custom Storage Classes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you're using a provider other than AWS, please refer to the `Kubernetes
documentation on storage classes
<https://kubernetes.io/docs/concepts/storage/storage-classes/>`__ for
configuring a storage class for your kubernetes cluster.

If you wish to use a different storage class (for instance if you don't run
on AWS), you may add the following overrides to your ``values.yaml`` file,
replacing ``<your-storage-class-name>`` with the name of the storage class
you configured.

.. code:: yaml

   prometheus-operator:
     grafana:
       persistence:
         storageClassName: "<your-storage-class-name>"
     prometheus:
       prometheusSpec:
         storageSpec:
           volumeClaimTemplate:
             spec:
               storageClassName: "<your-storage-class-name>"
     alertmanager:
       alertmanagerSpec:
         storage:
           volumeClaimTemplate:
             spec:
               storageClassName: "<your-storage-class-name>"

Troubleshooting
---------------

"validation failed"
^^^^^^^^^^^^^^^^^^^^^

If you receive the following error:

::

   Error: validation failed: [unable to recognize "": no matches for kind "Alertmanager" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "Prometheus" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1"]

please run the ``kubectl apply`` commands which install the Custom Resource
Definitions, as detailed in the installation instructions above.

"object is being deleted"
^^^^^^^^^^^^^^^^^^^^^^^^^^

When upgrading you may see the following error:

::

   Error: object is being deleted: customresourcedefinitions.apiextensions.k8s.io "prometheusrules.monitoring.coreos.com" already exists

Helm sometimes has trouble cleaning up or redefining Custom Resource
Definitions. Try manually deleting the resource definitions, then run your
helm install again:

::

   kubectl delete customresourcedefinitions \
     alertmanagers.monitoring.coreos.com \
     prometheuses.monitoring.coreos.com \
     servicemonitors.monitoring.coreos.com \
     prometheusrules.monitoring.coreos.com
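To confirm the stale definitions are actually gone before retrying the
install, you can list any remaining monitoring CRDs. This is a minimal
sketch using only standard ``kubectl`` output filtering:

.. code:: bash

   # Lists any leftover prometheus-operator CRDs; once only the fallback
   # message is printed, the helm install can be retried.
   kubectl get customresourcedefinitions | grep monitoring.coreos.com \
     || echo "no monitoring CRDs remain"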