Move etcd - GCP High Availability

Overview

This procedure is required for all clusters, including remote clusters, and is run on data nodes only.

The Kubernetes etcd database must be on a separate disk to give it access to all the resources required for optimal TOS performance, stability and minimal latency. We recommend performing this procedure after you install the operating system and before you install TOS.

This procedure must be performed by an experienced Linux administrator with knowledge of network and storage configuration.

Only perform this procedure if you are mounting the etcd database on an in-place high availability deployment. If you are setting up a new high availability deployment, first follow the instructions in High Availability, and then return to this procedure. With new high availability deployments, TOS Aurora should only be installed on one data node.

Preliminary Preparations

Run the following command:
```
lsblk | grep "/var/lib/rancher/k3s/server/db"
```
lsblk | grep "/var/lib/rancher/k3s/server/db"
If the output contains /var/lib/rancher/k3s/server/db, etcd is already on a separate disk, and you do not need to perform this procedure.
Backup your TOS data. This needs to be performed for all primary data nodes - the central cluster and remote collector clusters.
If you are going to perform this procedure over multiple maintenance periods, create a new backup each time.
1. Create the backup using tos backup create:
2. You can check the backup creation status using tos backup status, which shows the status of backups in progress. Wait until completion before continuing.
3. Run the following command to display the list of backups saved on the node:
  [<ADMIN> ~]$ sudo tos backup list
  
  sudo tos backup list
4. Check that your backup file appears in the list, and that the status is "Completed".
5. If your backup files are saved locally:
  1. Run sudo tos backup export to save your backup file from a TOS backup directory as a single .gzip file. If there are other backups present, they will be included as well.
  2. Transfer the exported .gzip file to a safe, remote location.
    
    Make sure you have the location of your backups safely documented and accessible, including credentials needed to access them, for recovery when needed.
  After the backup is exported, we recommend verifying that the file contents can be viewed by running the following command:
  [Target location]$ tar tzvf <filename>
  
  tar tzvf <file name>
Switch to the root user.
```
[<ADMIN> ~]$ sudo su -
```
sudo su -
Install the rsync RPM.
```
[<ADMIN> ~]$ dnf install rsync
```
dnf install rsync
Find the name of the last disk added to the VM instance.
```
[<ADMIN> ~]# lsblk -ndl -o NAME
```
lsblk -ndl -o NAME
The output returns the list of disks on the VM instance. The last letter of the disk name indicates in which it was added, for example: sda, sdb, sdc.
Save the name of the last disk in a separate location. You will need it later for verification purposes.

Mount The etcd Database to a Separate Disk

Shut down TOS

Run the tmux command.
```
[<ADMIN> ~]$ tmux new-session -s etcd
```
tmux new-session -s etcd
Shut down TOS.
```
[<ADMIN> ~]# tos stop
```
tos stop

Wait for the following message:

Deployment has been stopped successfully

Stop the k3s service.
```
[<ADMIN> ~]# systemctl stop k3s.service
```
systemctl stop k3s.service
Disable the k3s service.
```
[<ADMIN> ~]# systemctl disable k3s.service
```
systemctl disable k3s.service
Verify that the k3s service is stopped and disabled.
```
[<ADMIN> ~]# systemctl is-active k3s.service
```
systemctl is-active k3s.service
Output should return inactive.
```
[<ADMIN> ~]# systemctl is-enabled k3s.service
```
systemctl is-enabled k3s.service
Output should return disabled.

Mount The etcd Database

Repeat steps these steps for each data node.

Create a backup directory.
1. Create an etcd backup directory with the timestamp on the /opt partition.
  [<ADMIN> ~]# mkdir /opt/etcd_data_backup_$(date "+%Y%m%d-%H%M%S") || echo "Fail"
  
  mkdir /opt/etcd_data_backup_$(date "+%Y%m%d-%H%M%S") || echo "Fail"
2. Identify the path of the etcd backup directory.
  [<ADMIN> ~]# ETCD_BACKUP_DIR="$(ls -1dt /opt/etcd_data_backup_* | head -n1)"
  
  ETCD_BACKUP_DIR="$(ls -1dt /opt/etcd_data_backup_* | head -n1)"
3. Verify that the etcd backup directory is assigned to the variable ETCD_BACKUP_DIR.
  [<ADMIN> ~]# echo "$ETCD_BACKUP_DIR"
  
  echo "$ETCD_BACKUP_DIR"
Locate the etcd database
The purpose of this step is to identify whether the etcd database is located in the k3s directory, or whether due to older architecture the etcd database is located in the gravity directory.
1. Check if there is a link to the etcd database.
  [<ADMIN> ~]# test -L /var/lib/rancher/k3s/server/db/etcd && echo "Etcd link exists."
  
  test -L /var/lib/rancher/k3s/server/db/etcd && echo "Etcd link exists."
  If the output is empty, no link exists. Proceed to step 2.
  
  If the output returns Etcd link exists, this indicates that the etcd database is in the gravity directory. Proceed to step 3.
2. Check if the database is in the k3s directory.
  [<ADMIN> ~]# test -d /var/lib/rancher/k3s/server/db/etcd || echo "Etcd directory does not exist."
  
  test -d /var/lib/rancher/k3s/server/db/etcd || echo "Etcd directory does not exist."
  If the output is empty, the etcd database is in the k3s directory. Do the following:
  1. Assign the path of the k3s directory to the ETCD_ROOT_DIR variable.
    
    [<ADMIN> ~]# ETCD_ROOT_DIR="/var/lib/rancher/k3s/server/db"
    
    ETCD_ROOT_DIR="/var/lib/rancher/k3s/server/db"
  2. Proceed to back up the etcd database.
  If the output returns Etcd directory does not exist, this indicates that the etcd database could not be found. Stop the procedure and contact customer support.
3. Check if there is a link to the etcd database in the gravity directory.
  [<ADMIN> ~]# test -d /var/lib/gravity/planet/etcd || echo "Etcd directory does not exist."
  
  test -d /var/lib/gravity/planet/etcd || echo "Etcd directory does not exist."
Back up the etcd database
1. Back up the etcd database to the backup directory.
  [<ADMIN> ~]# rsync -avP ${ETCD_ROOT_DIR}/ ${ETCD_BACKUP_DIR}/ && echo -e "\nOK\n" || echo -e "\nFail\n"
  
  rsync -avP ${ETCD_ROOT_DIR}/ ${ETCD_BACKUP_DIR}/ && echo -e "\nOK\n" || echo -e "\nFail\n" echo "Fail"
  Output should return ok.
2. If it exists, remove the etcd symbolic link.
  [<ADMIN> ~]# ETCD_LINK_PATH="/var/lib/rancher/k3s/server/db/etcd"
  
  [root@node]# ETCD_LINK_PATH="/var/lib/rancher/k3s/server/db/etcd"
  [<ADMIN> ~]# test -L ${ETCD_LINK_PATH} && rm -f ${ETCD_LINK_PATH}
  
  [root@node]# test -L ${ETCD_LINK_PATH} && rm -f ${ETCD_LINK_PATH}
Add a disk to the VM.
1. In GCP, go to the VM instance, and on the Details page, click Edit.
2. Under Additional Disks, click Add new disk.
3. Configure the following settings:
  - Name:
  - Source Type: Blank
  - Region: Same as VM instance
  - Location: Same as VM instance
  - Disk Type: ssd persistent disk
  - Storage: At least 50 GB
4. Click Done and then Save.
Mount the new disk.
Restore the etcd database.
1. After the system reboot, reinitialize the ETCD_BACKUP_DIR variable:
  [<ADMIN> ~]# ETCD_BACKUP_DIR="$(ls -1dt /opt/etcd_data_backup_* | head -n1)"
  
  ETCD_BACKUP_DIR="$(ls -1dt /opt/etcd_data_backup_* | head -n1)"
2. Set the etcd root directory variable:
  [<ADMIN> ~]# ETCD_ROOT_DIR="/var/lib/rancher/k3s/server/db"
  
  ETCD_ROOT_DIR="/var/lib/rancher/k3s/server/db"
3. Restore the etcd data using rsync:
  [<ADMIN> ~]# rsync -avP ${ETCD_BACKUP_DIR}/ ${ETCD_ROOT_DIR}/ && echo -e "\nOK\n" || echo -e "\nFail\n"
  
  rsync -avP ${ETCD_BACKUP_DIR}/ ${ETCD_ROOT_DIR}/ && echo -e "\nOK\n" || echo -e "\nFail\n"
Ensure that the output confirms a successful operation with OK. If it returns Fail, verify that both ETCD_BACKUP_DIR and ETCD_ROOT_DIR are correctly set and that the backup directory exists.

Start TOS

Start the k3s service.
```
[<ADMIN> ~]# systemctl start k3s.service
```
systemctl start k3s.service
Verify that there are no errors in the command output and that the service is active (running).
Enable the k3s service.
```
[<ADMIN> ~]# systemctl enable k3s.service
```
systemctl enable k3s.service
Verify that the k3s service is enabled.
```
[<ADMIN> ~]# systemctl is-enabled k3s.service
```
systemctl is-enabled k3s.service

The output should return enabled.

Primary data node only. Start TOS.
```
[<ADMIN> ~]# tos start
```
tos start

Check the Cluster Status

On the primary data nodes, check the TOS status.
```
[<ADMIN> ~]$ sudo tos status
```
sudo tos status
In the output, check if the System Status is Ok and all the items listed under Components appear as ok. If this is not the case, contact Tufin Support.

Example output:

[<ADMIN> ~]$ sudo tos status
 Tufin Orchestration Suite 2.0

 System Status: Ok
 System Mode:   Multi Node

 Nodes:
   1 Master, 1 Worker. Total 2 nodes. Nodes are healthy.

 Components:
   Node:            Ok
   Cassandra:       Ok
   Mongodb:         Ok
   Mongodb_sc:      Ok
   Nats:            Ok
   Neo4j:           Ok
   Postgres:        Ok
   Postgres_sc:     Ok