Sometimes you need to restart, install updates, or update the operating system on worker nodes and controllers of the Kubernetes cluster. This article describes a host maintenance procedure that minimizes the downtime of the KUMA Core in a high-availability configuration.
Before taking hosts out for maintenance, you need to create a backup copy of the KUMA Core.
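For example, a backup copy can be exported over the KUMA REST API. The following is a minimal sketch only; the FQDN, the API port, the token variable, and the backup endpoint path are assumptions that may differ in your deployment and KUMA version:
# Sketch: export a KUMA Core backup over the REST API.
# kuma-core.example.com, port 7223, $KUMA_API_TOKEN, and /api/v1/system/backup are assumptions.
curl -k -H "Authorization: Bearer $KUMA_API_TOKEN" \
  "https://kuma-core.example.com:7223/api/v1/system/backup" \
  -o "kuma-core-backup-$(date +%F).tar.gz"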
Controller maintenance
Cluster controllers must undergo maintenance one at a time. No additional steps are required before performing maintenance of a controller. After maintenance or an upgrade, you must make sure that the controller service has started successfully by checking the status of the service using the following commands:
sudo systemctl status k0scontroller
sudo k0s status
After the serviced controller is back up, you can proceed to the maintenance of the next controller. As long as you service the controllers one at a time, the KUMA Core remains available.
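If you script this check, the following minimal sketch waits until both commands report a healthy state; the 2-minute timeout is an arbitrary value. The same pattern can be reused later for worker nodes by replacing k0scontroller with k0sworker.
# Wait up to ~2 minutes for the controller service and k0s to report a healthy state.
for attempt in $(seq 1 24); do
  if sudo systemctl is-active --quiet k0scontroller && sudo k0s status > /dev/null; then
    echo "Controller is up"
    break
  fi
  sleep 5
done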
Maintenance of worker nodes
Worker nodes of the cluster must be taken out for maintenance strictly one at a time.
To perform maintenance on worker nodes:
Make sure that all worker nodes are available:
sudo k0s kubectl get nodes
All worker nodes must have the Ready status; otherwise, taking additional nodes out of service may make the KUMA Core unavailable.
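If you prefer to check this in a script rather than reading the table, a minimal sketch over the same command is shown below; it relies on the default column layout of kubectl get nodes:
# Exit with a non-zero code and list any node whose STATUS column is not exactly "Ready".
sudo k0s kubectl get nodes --no-headers | awk '$2 != "Ready" {print "Not ready:", $1; bad=1} END {exit bad}'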
Cordon the worker node that you want to service so that new pods are not scheduled on it:
sudo k0s kubectl cordon <worker_node_name>
After that, the output of the sudo k0s kubectl get nodes command shows the node with the Ready,SchedulingDisabled status.
Check which worker node the KUMA Core pod is running on:
sudo k0s kubectl get pods -n kuma -o wide
If the KUMA Core pod is running on the node that you are about to service, move it to another worker node by restarting the deployment:
sudo k0s kubectl rollout restart deployment core-deployment -n kuma
While the KUMA Core is being moved to another worker node, it is unavailable for approximately 10 minutes.
Make sure that the KUMA Core pod has been moved:
sudo k0s kubectl get pods -n kuma -o wide
The pod must have the Running status, and the name of the worker node in the NODE column must change.
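Instead of polling the pod list manually, you can wait for the restart to finish; the 15-minute timeout below is an arbitrary value chosen to exceed the expected downtime:
# Block until the restarted KUMA Core deployment reports a successful rollout.
sudo k0s kubectl rollout status deployment core-deployment -n kuma --timeout=15m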
Stop k0s on the worker node:
sudo k0s stop
Perform the required maintenance on the host, and then start k0s again:
sudo k0s start
Make sure that the worker node has started successfully:
sudo systemctl status k0sworker
sudo k0s status
The k0sworker.service must be in the active (running) state, and the sudo k0s status command must return Status: Running.
Uncordon the worker node so that pods can be scheduled on it again:
sudo k0s kubectl uncordon <worker_node_name>
Check the state of the nodes:
sudo k0s kubectl get nodes
The serviced worker node must have the Ready status.
Check the state of the KUMA Core volume:
sudo k0s kubectl get volume -n longhorn-system -o json | jq '.items[0].status.robustness'
The status must be healthy. If the status is degraded, one of the replicas is unavailable or is being rebuilt.
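If the volume is degraded, you can poll the same field until it returns to healthy; the sketch below uses an arbitrary 10-second interval:
# Poll the Longhorn volume until its robustness field reports "healthy".
until [ "$(sudo k0s kubectl get volume -n longhorn-system -o json | jq -r '.items[0].status.robustness')" = "healthy" ]; do
  sleep 10
done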
You can track the progress of the rebuild with the following command:
sudo k0s kubectl get engine -n longhorn-system -o json | jq '.items[0].status.rebuildStatus'
Check the state of the volume replicas:
sudo k0s kubectl get replicas -n longhorn-system
All replicas must have the running status.
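To read the replica states in a script, you can query the Replica objects directly; the status.currentState field used below is an assumption about the Longhorn custom resource and may differ between Longhorn versions:
# Print the name and state of each Longhorn replica; every state is expected to be "running".
sudo k0s kubectl get replicas -n longhorn-system -o json | jq -r '.items[] | "\(.metadata.name) \(.status.currentState)"'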
Worker node maintenance is complete. If the serviced worker node is ready and the KUMA Core volume has the healthy status, you can proceed to perform maintenance on the next worker node.
Maintenance of the traffic balancer
Maintenance of the traffic balancer always makes the KUMA Core temporarily unavailable, both to users and to KUMA services. While the balancer is unavailable, the KUMA Core pod cannot be moved from one worker node to another.
If you are planning a long downtime of the main balancer or a substantial upgrade, we recommend the following: deploy a backup balancer (for example, from a clone of the balancer virtual machine), switch the KUMA traffic to it by specifying its IP addresses, perform the maintenance of the main balancer, and then switch the traffic back to the main balancer.
When you switch over the traffic, current sessions may be terminated. If any problems occur, you can specify the old IP addresses and continue using the main balancer in its previous configuration.
The last step may not be necessary if you want to discard the old main balancer, that is, if you are permanently replacing the balancer with an updated host.
The resource requirements of the traffic balancer are minimal, so we recommend keeping a backup clone of the balancer virtual machine on hand to quickly restore the availability of the KUMA Core if problems occur with the main virtual machine.