Restarting a machine in a Kubernetes cluster¶
NOTE¶
- Know which kind of machine is going to be restarted:
  - control plane (api-server, controllers, etc.)
  - node (runs the actual workload, e.g. Brig or Webapp)
  - both combined (control plane and node on the same machine)
- The kind of machine in question must be deployed redundantly
- Take out machines in a rolling fashion (sequentially, one at a time)
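The rolling approach can be sketched as a small loop that only moves on to the next machine once the previous one has recovered. This is a minimal illustration, not a prescribed tool: `rolling_restart`, `restart` and `is_healthy` are hypothetical names, and the two callbacks stand in for whatever your environment uses to reboot a machine and to check that it has rejoined the cluster.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    machines: Iterable[str],
    restart: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    timeout_s: float = 600.0,
    poll_s: float = 5.0,
) -> List[str]:
    """Restart machines sequentially, one at a time.

    `restart` and `is_healthy` are placeholders (assumptions) for
    environment-specific logic. The rollout stops at the first machine
    that does not become healthy again within `timeout_s` seconds.
    """
    done: List[str] = []
    for name in machines:
        restart(name)
        deadline = time.monotonic() + timeout_s
        # Do not touch the next machine until this one is healthy again.
        while not is_healthy(name):
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"{name} did not become healthy; stopping the rollout"
                )
            time.sleep(poll_s)
        done.append(name)
    return done
```

Stopping at the first failure is a deliberate choice here: it keeps the remaining redundant machines untouched while the failed one is investigated.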
Control plane¶
If etcd is hosted on the same machine as the control plane (a common practice), take the implications for etcd into account when restarting that machine (see How to rolling-restart an etcd cluster).
Regardless of where etcd is located, before turning off any machine that is part of the control plane, one should back up the cluster state.
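One possible way to back up the cluster state is an etcd snapshot taken with `etcdctl snapshot save`. The sketch below only assembles and runs that command. The helper names are hypothetical, and the endpoint and certificate paths used as defaults are assumptions (they resemble a kubeadm-style layout) that must be adapted to the actual installation.

```python
import os
import subprocess
from typing import Dict, List, Tuple


def etcd_snapshot_command(
    snapshot_path: str,
    endpoint: str = "https://127.0.0.1:2379",            # assumed endpoint
    cacert: str = "/etc/kubernetes/pki/etcd/ca.crt",      # assumed path
    cert: str = "/etc/kubernetes/pki/etcd/server.crt",    # assumed path
    key: str = "/etc/kubernetes/pki/etcd/server.key",     # assumed path
) -> Tuple[List[str], Dict[str, str]]:
    """Build the etcdctl command and the extra environment it needs."""
    cmd = [
        "etcdctl", "snapshot", "save", snapshot_path,
        f"--endpoints={endpoint}",
        f"--cacert={cacert}",
        f"--cert={cert}",
        f"--key={key}",
    ]
    # Select the v3 API explicitly; older etcdctl builds default to v2.
    env = {"ETCDCTL_API": "3"}
    return cmd, env


def save_etcd_snapshot(snapshot_path: str) -> None:
    """Run the snapshot command; raises CalledProcessError on failure."""
    cmd, env = etcd_snapshot_command(snapshot_path)
    subprocess.run(cmd, env={**os.environ, **env}, check=True)
```

Calling `save_etcd_snapshot("/var/backups/etcd-snapshot.db")` on an etcd member would write the snapshot to that (illustrative) path. Store the resulting file off the machine that is about to be restarted.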
If any part of the control plane is not sufficiently redundant, avoid any mutating interaction with the cluster during the procedure until the cluster is healthy again.
Node¶
High-level steps:¶
- Drain the node so that all workload is rescheduled on other nodes
- Restart / Update / Decommission
- Mark the node as being schedulable again (if not decommissioned)
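The steps above map onto `kubectl drain` and `kubectl uncordon`. The following is a minimal sketch rather than a complete procedure: the function names and the `maintenance` callback are hypothetical, and the drain flags shown are one common choice that may not suit every workload.

```python
import subprocess
from typing import Callable, List


def drain_command(node: str) -> List[str]:
    """Command that evicts the workload and marks the node unschedulable."""
    return [
        "kubectl", "drain", node,
        "--ignore-daemonsets",      # DaemonSet pods cannot be rescheduled elsewhere
        "--delete-emptydir-data",   # allows evicting pods using emptyDir; that data is lost
    ]


def uncordon_command(node: str) -> List[str]:
    """Command that marks the node as schedulable again."""
    return ["kubectl", "uncordon", node]


def maintain_node(
    node: str,
    maintenance: Callable[[], None],
    decommission: bool = False,
    run: Callable[..., object] = subprocess.run,
) -> None:
    """Drain the node, perform the maintenance, then uncordon it.

    `maintenance` is a placeholder (assumption) for the restart or update.
    If `decommission` is True, the node is left unschedulable.
    """
    run(drain_command(node), check=True)
    maintenance()
    if not decommission:
        run(uncordon_command(node), check=True)
```

The `run` parameter defaults to `subprocess.run` and exists only so the sequence can be exercised without a cluster, for example by passing a function that records the commands instead of executing them.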
For more details, please refer to the official Kubernetes documentation: Safely Drain a Node