Etcd
--------------------------
.. include:: includes/intro.rst
This section covers only the bare minimum. For more information, see the `etcd documentation `__
How to see cluster health
~~~~~~~~~~~~~~~~~~~~~~~~~~
If the file ``/usr/local/bin/etcd-health.sh`` is available, you can run
.. code:: sh
etcd-health.sh
which should produce an output similar to::
Cluster-Endpoints: https://127.0.0.1:2379
cURL Command: curl -X GET https://127.0.0.1:2379/v2/members
member 7c37f7dc10558fae is healthy: got healthy result from https://10.10.1.11:2379
member cca4e6f315097b3b is healthy: got healthy result from https://10.10.1.10:2379
member e767162297c84b1e is healthy: got healthy result from https://10.10.1.12:2379
cluster is healthy
If that helper file is not available, create it with the following contents:
.. code:: bash
#!/usr/bin/env bash
HOST=$(hostname)
etcdctl --endpoints https://127.0.0.1:2379 --ca-file=/etc/ssl/etcd/ssl/ca.pem --cert-file=/etc/ssl/etcd/ssl/member-$HOST.pem --key-file=/etc/ssl/etcd/ssl/member-$HOST-key.pem --debug cluster-health
and then make it executable: ``chmod +x /usr/local/bin/etcd-health.sh``
How to inspect tables and data manually
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: sh
TODO
.. _how-to-rolling-restart-an-etcd-cluster:
How to rolling-restart an etcd cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Etcd is a consistent and partition-tolerant key-value store. This means that
Etcd nodes can be restarted (one by one) with no impact on the consistency of
data, but there might be a short period in which the database cannot process
writes. Etcd has a designated leader which decides the ordering of events (and
thus writes) in the cluster. When the leader crashes, a leadership election
takes place. During the leadership election, the cluster might be briefly
unavailable for writes. Writes during this period are queued up until a new
leader is elected. Any writes that were in flight when the leader crashed and
had not yet been acknowledged by the leader and the followers will be
'lost'. The client that performed such a write will experience this as a write
timeout (source: https://etcd.io/docs/v3.4.0/op-guide/failures/). Client
applications (like Kubernetes) are expected to deal with this failure scenario
gracefully.
Etcd can be restarted in a rolling fashion by cleanly shutting down and
starting up etcd servers one by one. In Etcd 3.1 and up, when the leader is
cleanly shut down, it hands over leadership gracefully to another node,
which minimizes the impact on write availability because election time is
reduced (source:
https://kubernetes.io/blog/2018/12/11/etcd-current-status-and-future-roadmap/).
Restarting follower nodes has no impact on availability.
Etcd does load-balancing between servers on the client side. This means that
if a server you were talking to is being restarted, the etcd client will
transparently redirect the request to another server. It is thus safe to shut
servers down at any point.
Now to perform a rolling restart of the cluster, do the following steps:
1. Check your cluster is healthy (see above)
2. Stop the process with ``systemctl stop etcd`` (this should be safe since etcd clients retry their operation if one endpoint becomes unavailable, see `this page `__)
3. Perform any maintenance you need, such as rebooting the machine.
4. ``systemctl start etcd``
5. Wait for your cluster to be healthy again.
6. Do the same on the next server.
*For more details please refer to the official documentation:* `Replacing a failed etcd member `__
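The steps above, for a single server, could be scripted roughly as follows. This is a minimal sketch, assuming the ``etcd-health.sh`` helper described earlier is on the ``PATH`` and prints the line ``cluster is healthy`` when all members respond. The names ``HEALTH_CMD``, ``wait_for_healthy`` and ``restart_local_etcd`` and the retry defaults are illustrative assumptions, not part of any existing tooling.

```shell
# Sketch only: restart etcd on the local server and wait for the cluster to recover.
# HEALTH_CMD, the function names and the retry defaults are assumptions.
HEALTH_CMD="${HEALTH_CMD:-etcd-health.sh}"

# Poll the health helper until it prints the exact line "cluster is healthy".
# $1: maximum number of attempts (default 30)
# $2: seconds to sleep between attempts (default 5)
wait_for_healthy() {
    attempts="${1:-30}"
    delay="${2:-5}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        if $HEALTH_CMD 2>&1 | grep -x 'cluster is healthy' >/dev/null; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "cluster not healthy after $attempts attempts" >&2
    return 1
}

restart_local_etcd() {
    wait_for_healthy 1 0 || return 1   # step 1: only proceed from a healthy cluster
    systemctl stop etcd || return 1    # step 2
    # step 3: perform any maintenance here
    systemctl start etcd || return 1   # step 4
    wait_for_healthy                   # step 5
}
```

Calling ``restart_local_etcd`` on one server, and moving on to the next server only if it returns successfully, corresponds to step 6. Refusing to stop etcd when the initial health check fails is a deliberate choice in this sketch: restarting a member of an already degraded cluster risks losing quorum.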
.. _etcd_backup-and-restore:
Backing up and restoring
~~~~~~~~~~~~~~~~~~~~~~~~~
As long as quorum is maintained in etcd, there will be no data loss. It is
still good to prepare for the worst: if a disaster takes out too many nodes,
you might have to restore from an old backup.
Luckily, etcd can take periodic snapshots of your cluster, and these can be used
for disaster recovery. Information about how to take snapshots and
restore from them can be found here:
https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/recovery.md
*For more details please refer to the official documentation:* `Backing up an etcd cluster `__
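As a rough illustration, a snapshot of one member could be taken with the etcd v3 ``etcdctl snapshot save`` command. This is a sketch under assumptions: the backup directory ``/var/backups/etcd`` is hypothetical, the certificate paths are simply the ones used by the health helper above, and the function names are made up for this example. Check the linked recovery guide for the procedure that matches your etcd version.

```shell
# Sketch only: take a timestamped etcd v3 snapshot on the local member.
# BACKUP_DIR is a hypothetical location; CERT_DIR is assumed to match the
# certificate paths used by etcd-health.sh.
BACKUP_DIR="${BACKUP_DIR:-/var/backups/etcd}"
CERT_DIR="${CERT_DIR:-/etc/ssl/etcd/ssl}"

# Print a snapshot path of the form <BACKUP_DIR>/snapshot-<UTC timestamp>.db
snapshot_file() {
    printf '%s/snapshot-%s.db\n' "$BACKUP_DIR" "$(date -u +%Y%m%dT%H%M%SZ)"
}

# Save a snapshot and print its path on success.
take_snapshot() {
    target="$(snapshot_file)"
    mkdir -p "$BACKUP_DIR" || return 1
    ETCDCTL_API=3 etcdctl \
        --endpoints https://127.0.0.1:2379 \
        --cacert "$CERT_DIR/ca.pem" \
        --cert "$CERT_DIR/member-$(hostname).pem" \
        --key "$CERT_DIR/member-$(hostname)-key.pem" \
        snapshot save "$target" || return 1
    echo "$target"
}
```

Running ``take_snapshot`` from a cron job or systemd timer would give periodic snapshots. Copying the resulting file off the machine is left out here, but a snapshot that only lives on the node it was taken from does not help if that node is lost.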
Troubleshooting
~~~~~~~~~~~~~~~~~~~~~~~~~~
How to recover from a single unhealthy etcd node after virtual machine snapshot restore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
After restoring an etcd machine from an earlier snapshot of the machine disk, the etcd member on that machine may be unable to rejoin the cluster.
Symptoms: the etcd process on the restored machine crashes on startup, and the other etcd nodes can't reach it::
failed to check the health of member e767162297c84b1e on https://10.10.1.12:2379: Get https://10.10.1.12:2379/health: dial tcp 10.10.1.12:2379: getsockopt: connection refused
member e767162297c84b1e is unreachable: [https://10.10.1.12:2379] are all unreachable
Logs from the crashing etcd::
(...)
Sep 25 09:27:05 node2 etcd[20288]: 2019-09-25 07:27:05.691409 I | raft: e767162297c84b1e [term: 28] received a MsgHeartbeat message with higher term from cca4e6f315097b3b [term: 30]
Sep 25 09:27:05 node2 etcd[20288]: 2019-09-25 07:27:05.691620 I | raft: e767162297c84b1e became follower at term 30
Sep 25 09:27:05 node2 etcd[20288]: 2019-09-25 07:27:05.692423 C | raft: tocommit(16152654) is out of range [lastIndex(16061986)]. Was the raft log corrupted, truncated, or lost?
Sep 25 09:27:05 node2 etcd[20288]: panic: tocommit(16152654) is out of range [lastIndex(16061986)]. Was the raft log corrupted, truncated, or lost?
Sep 25 09:27:05 node2 etcd[20288]: goroutine 90 [running]:
(...)
Etcd refuses to let a node rejoin the cluster if its raft log has moved
backwards. Once a node has committed to a certain version of the raft log, it
is expected not to jump back in time after that. In this scenario, we turned
an etcd server off, made a snapshot of the virtual machine, brought it back
online, and then restored the snapshot. What went wrong is that restoring a VM
snapshot leaves the etcd node with an older raft log than it had before, even
though it had already told all other nodes that it knew about newer entries.
As a safety precaution, the other nodes reject the node that is travelling
back in time, to avoid data corruption. A node could get corrupted for other
reasons as well. Perhaps a disk is faulty and is serving wrong data. Either
way, if you end up in a scenario where a node is unhealthy and refuses to
rejoin the cluster, it is time to do some operations to get the cluster back
into a healthy state.
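To get a feeling for how far the restored node travelled back in time, the two indexes in the panic message shown above can be compared. The following is a small sketch; the function name ``raft_log_gap`` is hypothetical, and it assumes the log line has the same ``tocommit(...)`` and ``lastIndex(...)`` shape as in the excerpt.

```shell
# Sketch only: given an etcd panic line, print how many raft entries the
# member's local log is missing (tocommit minus lastIndex).
raft_log_gap() {
    line="$1"
    tocommit=$(printf '%s\n' "$line" | sed -n 's/.*tocommit(\([0-9][0-9]*\)).*/\1/p')
    last=$(printf '%s\n' "$line" | sed -n 's/.*lastIndex(\([0-9][0-9]*\)).*/\1/p')
    # Fail if either number could not be found in the line.
    [ -n "$tocommit" ] && [ -n "$last" ] || return 1
    echo $((tocommit - last))
}

raft_log_gap 'panic: tocommit(16152654) is out of range [lastIndex(16061986)]. Was the raft log corrupted, truncated, or lost?'
# prints 90668
```

In the example from the logs, the rest of the cluster had already committed 90668 entries beyond what the restored disk contains, which is why the member cannot simply resume where the snapshot left off.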
It is not recommended to restore an etcd node from a VM snapshot, as that
causes this kind of time-travelling behaviour, which makes the node
unhealthy. To recover from this situation anyway, follow the advice in the
etcdv2 admin guide (https://github.com/etcd-io/etcd/blob/master/Documentation/v2/admin_guide.md), quoted here:
If a member’s data directory is ever lost or corrupted then the user should
remove the etcd member from the cluster using etcdctl tool. A user should
avoid restarting an etcd member with a data directory from an out-of-date
backup. Using an out-of-date data directory can lead to inconsistency as the
member had agreed to store information via raft then re-joins saying it
needs that information again. For maximum safety, if an etcd member suffers
any sort of data corruption or loss, it must be removed from the cluster.
Once removed the member can be re-added with an empty data directory.
Note that this piece of documentation is from etcdv2 and not etcdv3. However,
the etcdv3 docs describe a similar procedure here:
https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/runtime-configuration.md#replace-a-failed-machine
The procedure to remove and add a member is documented here:
https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/runtime-configuration.md#remove-a-member
It is also documented in the kubernetes documentation:
https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/#replacing-a-failed-etcd-member
So following the above guides step by step, we can recover our cluster to be
healthy again.
First let us make sure our broken member is stopped by running this on the broken node (``node2`` in the logs above):
.. code:: sh
systemctl stop etcd
Now from a healthy node, e.g. ``node0``, remove the broken node:
.. code:: sh
etcdctl3.sh member remove e767162297c84b1e
And we expect the output to be something like
.. code:: sh
Member e767162297c84b1e removed from cluster 432c10551aa096af
By removing the member from the cluster, you signal the other nodes not to
expect it to come back with the right state. It will be considered dead and
removed from the peers. This allows the node to come up later with an empty
data directory without getting kicked out of the cluster. The cluster should
now be healthy but have only 2 members, so it is not resilient to crashes at
the moment (losing either remaining member would lose quorum). We can see this
if we run the health check from a healthy node:
.. code:: sh
etcd-health.sh
And we expect only two nodes to be in the cluster::
Cluster-Endpoints: https://127.0.0.1:2379
cURL Command: curl -X GET https://127.0.0.1:2379/v2/members
member 7c37f7dc10558fae is healthy: got healthy result from https://10.10.1.11:2379
member cca4e6f315097b3b is healthy: got healthy result from https://10.10.1.10:2379
cluster is healthy
Now from a healthy node, re-add the node you just removed. Make sure
to replace the IP in the snippet below with the IP of the node you just removed.
.. code:: sh
etcdctl3.sh member add etcd_2 --peer-urls https://10.10.1.12:2380
And it should report that it has been added::
Member e13b1d076b2f9344 added to cluster 432c10551aa096af
ETCD_NAME="etcd_2"
ETCD_INITIAL_CLUSTER="etcd_1=https://10.10.1.11:2380,etcd_0=https://10.10.1.10:2380,etcd_2=https://10.10.1.12:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
It should now be in the list as "unstarted" instead of not being in the list at all:
.. code:: sh
etcdctl3.sh member list
7c37f7dc10558fae, started, etcd_1, https://10.10.1.11:2380, https://10.10.1.11:2379
cca4e6f315097b3b, started, etcd_0, https://10.10.1.10:2380, https://10.10.1.10:2379
e13b1d076b2f9344, unstarted, , https://10.10.1.12:2380,
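When scripting this procedure, it can help to check the member states mechanically instead of reading the list by eye. The following is a minimal sketch that counts members by status; it assumes the comma-separated output format shown above (ID, status, name, peer URLs, client URLs), and ``count_members`` is a hypothetical helper name.

```shell
# Sketch only: count members with a given status in `member list` output.
# Reads the list on stdin; $1 is the status to count, e.g. "started".
count_members() {
    awk -F', *' -v want="$1" '$2 == want { n++ } END { print n + 0 }'
}

# Demo with the sample output from this section:
count_members unstarted <<'EOF'
7c37f7dc10558fae, started, etcd_1, https://10.10.1.11:2380, https://10.10.1.11:2379
cca4e6f315097b3b, started, etcd_0, https://10.10.1.10:2380, https://10.10.1.10:2379
e13b1d076b2f9344, unstarted, , https://10.10.1.12:2380,
EOF
# prints 1
```

Against a live cluster this would be used as ``etcdctl3.sh member list | count_members unstarted``. A result of ``0`` after the re-added node has been started suggests that every registered member is running.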
Now on the broken node, move the corrupted on-disk state out of the way and start etcd:
.. code:: sh
mv /var/lib/etcd /var/lib/etcd.bak
sudo systemctl start etcd
If we run the health check now, the cluster should report that it is healthy again.
.. code:: sh
etcd-health.sh
And indeed it outputs so::
Cluster-Endpoints: https://127.0.0.1:2379
cURL Command: curl -X GET https://127.0.0.1:2379/v2/members
member 7c37f7dc10558fae is healthy: got healthy result from https://10.10.1.11:2379
member cca4e6f315097b3b is healthy: got healthy result from https://10.10.1.10:2379
member e13b1d076b2f9344 is healthy: got healthy result from https://10.10.1.12:2379
cluster is healthy