Restoring the Raft cluster from the current state of a node

To restore the Raft cluster from the current state of a selected node:

Prepare the cluster for recovery:
1. Choose the node that will become the initial node of the new cluster. On this node, you stop the KUMA Core service in the next step, but you must keep its working directory and systemd file.
2. Stop the KUMA Core service on all nodes in the cluster:
  sudo systemctl stop kuma-core-<KUMA Core service ID>.service
3. Delete the working directories on all nodes in the cluster except the chosen initial node:
  sudo rm -rf <directory_name>
4. On all nodes in the cluster except the initial node, remove the systemd services.
At this point, all KUMA Core services are stopped and the servers are ready for cluster recovery.
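The preparation steps for a node other than the initial node can be sketched as a dry-run helper that only prints the commands, so that they can be reviewed before anything is deleted. This is an illustration under assumptions, not the product's own tooling: the function name print_cleanup_commands is hypothetical, the unit file location /usr/lib/systemd/system/ is assumed, and "systemctl disable" followed by deleting the unit file is one possible reading of "remove the systemd services". The sketch uses rm -rf because plain rmdir only removes empty directories.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints, but does not execute, the cleanup
# commands for ONE node that is NOT the chosen initial node.
print_cleanup_commands() {
    service_id="$1"   # KUMA Core service ID of this node
    work_dir="$2"     # working directory of that service (supplied by the caller)
    unit="kuma-core-${service_id}.service"

    echo "sudo systemctl stop ${unit}"
    echo "sudo systemctl disable ${unit}"
    # rm -rf is used because the working directory is not empty.
    echo "sudo rm -rf ${work_dir}"
    # Assumed unit file location; verify it on your system before deleting.
    echo "sudo rm /usr/lib/systemd/system/${unit}"
    echo "sudo systemctl daemon-reload"
}
```

Calling print_cleanup_commands with the service ID and working directory of a node prints five commands in the order of the steps above. Do not run them on the initial node.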
Restore the Raft cluster with a single KUMA Core:
1. On the initial node of the new Raft cluster, create the following file:
  sudo touch /opt/kaspersky/kuma/core/<KUMA Core service ID>/raft/.reset
2. On the initial node of the Raft cluster, remove the --raft.join parameter and its value from the systemd file, then apply the change by running the following command:
  sudo systemctl daemon-reload
3. On the initial node of the Raft cluster, start the KUMA Core service:
  sudo systemctl start kuma-core-<KUMA Core service ID>.service
The Raft cluster is restored with a single KUMA Core. On the remaining nodes, the KUMA Core services are stopped and the nodes are removed from the cluster.
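Removing the --raft.join parameter in step 2 can be done in a text editor. As a rough sketch, the edit can also be previewed with sed. The function name strip_raft_join is hypothetical, and the sketch assumes the parameter appears in the unit file either as "--raft.join <address>" or as "--raft.join=<address>". It prints the modified file to standard output and leaves the original untouched.

```shell
#!/bin/sh
# Hypothetical helper: prints the given systemd unit file with the
# --raft.join parameter and its value removed. The file itself is not changed.
strip_raft_join() {
    # Matches the leading whitespace, the flag, a "=" or whitespace separator,
    # and the value up to the next whitespace character.
    sed -E 's/[[:space:]]+--raft\.join[=[:space:]]+[^[:space:]]+//g' "$1"
}
```

After comparing the output with the original file, write the result back to the unit file and run sudo systemctl daemon-reload as described in step 2.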
If you need a high-availability Raft cluster, add the remaining nodes back to the cluster in one of the following ways:
- If you want to use new servers for KUMA Core services, use the expand.sh installer and the expand.inventory.yml inventory file to deploy the KUMA Core services and add the nodes to the Raft cluster.
- If you want to reuse the old nodes, reinstall the KUMA Core service on each of them with the --raft.join option, which adds the node to the Raft cluster, by running the following command:
  sudo /opt/kaspersky/kuma/kuma core --raft.join <FQDN of the initial cluster node>:7210 --id <ID of the KUMA Core service copied from the web interface> --install

The cluster is restored. Certificates of services, such as collectors, correlators, and storages, do not need to be reset.
