Sometimes you need to restart, install updates, or update the operating system on worker nodes and controllers of the Kubernetes cluster. The following sections of this article describe how to take hosts out for maintenance to minimize the downtime of the KUMA Core in a high-availability configuration.
Before performing any manipulations with hosts, you must back up the KUMA Core.
Controller maintenance
Cluster controllers must be taken out of service strictly one at a time. You do not need to perform any preliminary steps before taking a controller out for maintenance. After performing maintenance or an upgrade, you must make sure that the controller service has started successfully by checking the status of the service using the following commands:
sudo systemctl status k0scontroller
sudo k0s status
After the controller is back up, you can take out the next controller for maintenance. The availability of the KUMA Core is not interrupted when controllers are taken out for maintenance one at a time.
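If you want a single check before moving on, the two commands above can be combined into a short shell guard. This is only a sketch; it relies on nothing beyond the k0scontroller systemd unit and the k0s status command already shown:
# Sketch: do not proceed to the next controller while k0scontroller is not active.
if sudo systemctl is-active --quiet k0scontroller; then
  sudo k0s status
  echo "Controller is healthy; you can take out the next controller."
else
  echo "Controller is not healthy yet; do not take out the next controller."
fi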
Maintenance of worker nodes
Worker nodes of the cluster must be taken out of service strictly one at a time.
To perform maintenance on worker nodes:
Make sure that all worker nodes of the cluster are available:
sudo k0s kubectl get nodes
All worker nodes must have the Ready status; otherwise, taking more nodes out of service may make the KUMA Core completely unavailable.
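If you prefer a machine-readable check over reading the table output, you can query the Ready condition of each node directly. This is a sketch that assumes jq is installed on the host where you run k0s kubectl:
sudo k0s kubectl get nodes -o json | jq -r '.items[] | "\(.metadata.name): \(.status.conditions[] | select(.type=="Ready") | .status)"'
Each node should be reported with the value True.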
Mark the worker node that you want to take out for maintenance as unschedulable:
sudo k0s kubectl cordon <worker_node_name>
After that, the output of the sudo k0s kubectl get nodes command shows this node with the Ready,SchedulingDisabled status.
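You can also confirm that the node has been marked as unschedulable; <worker_node_name> below is the same placeholder as in the cordon command:
sudo k0s kubectl get node <worker_node_name> -o jsonpath='{.spec.unschedulable}'
The command should print true.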
Check which worker node the KUMA Core pod is running on:
sudo k0s kubectl get pods -n kuma -o wide
If the KUMA Core pod is running on the worker node that you are taking out for maintenance, move it to another worker node:
sudo k0s kubectl rollout restart deployment core-deployment -n kuma
While the KUMA Core is being moved to another worker node, access to the KUMA Core is suspended for approximately 10 minutes.
Make sure that the KUMA Core pod has been moved:
sudo k0s kubectl get pods -n kuma -o wide
The pod must have the Running status, and the name of the worker node in the NODE column must change.
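Instead of polling the pod list, you can wait for the restart to finish. The sketch below uses the same core-deployment name and assumes that a 15-minute timeout is sufficient:
sudo k0s kubectl rollout status deployment core-deployment -n kuma --timeout=15m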
On the worker node, stop the k0s service:
sudo k0s stop
Perform the required maintenance or upgrade of the worker node, then start the k0s service again:
sudo k0s start
Make sure that the service has started successfully:
sudo systemctl status k0sworker
sudo k0s status
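Before returning the node to the cluster, you can also wait on a controller for the worker to report the Ready condition instead of re-checking manually. This is a sketch; <worker_node_name> is a placeholder, and the 10-minute timeout is an assumption:
sudo k0s kubectl wait --for=condition=Ready node/<worker_node_name> --timeout=10m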
Return the worker node to the cluster:
sudo k0s kubectl uncordon <worker_node_name>
Check the status of the nodes:
sudo k0s kubectl get nodes
The updated worker node must have the Ready status.
Check the status of the KUMA Core volume:
sudo k0s kubectl get volume -n longhorn-system -o json | jq '.items[0].status.robustness'
The status should be healthy; if the status is degraded, one of the replicas is unavailable or is being rebuilt. You can track the progress of the rebuild:
sudo k0s kubectl get engine -n longhorn-system -o json | jq '.items[0].status.rebuildStatus'
Check the status of the volume replicas:
sudo k0s kubectl get replicas -n longhorn-system
All replicas must have the running status.
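The jq filters above inspect only the first volume (.items[0]). If the longhorn-system namespace contains more than one volume, the following sketch prints the robustness of every volume by name:
sudo k0s kubectl get volumes -n longhorn-system -o json | jq -r '.items[] | "\(.metadata.name): \(.status.robustness)"'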
Worker node maintenance is complete. If the serviced worker node has the Ready status and the KUMA Core volume has the healthy status, you can proceed to perform maintenance on the next worker node.
Maintenance of the traffic balancer
Taking the traffic balancer out for maintenance always results in the KUMA Core becoming temporarily unavailable, both for users and for KUMA services. While the balancer is unavailable, the KUMA Core pod cannot be moved from one worker node to another.
If you are planning a long downtime of the main balancer or substantial upgrades, we recommend switching KUMA traffic to a backup balancer for the duration of the maintenance and switching it back to the main balancer after the maintenance is finished.
When you switch over the traffic, current sessions may be terminated. In case of any problems, you can specify the old IP addresses and continue using the main balancer in the previous configuration.
The last step may not be necessary if you want to discard the old main balancer, that is, if you are permanently replacing the balancer with an updated host.
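After switching traffic to a new balancer, or back to the main one, a quick reachability check helps confirm that the KUMA Core is accessible through it. This is a sketch: <balancer_fqdn> is a placeholder, and it assumes the KUMA web interface is published on its default port 7220:
curl -k -s -o /dev/null -w "%{http_code}\n" https://<balancer_fqdn>:7220
An HTTP status code in the output indicates that the balancer is forwarding connections to the KUMA Core; -k skips certificate verification for this check only.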
The resource requirements of the traffic balancer are minimal, so we recommend keeping a backup clone of the balancer virtual machine on hand to quickly restore the availability of the KUMA Core in case of any problems with the main virtual machine.