Move etcd - HA Non-Cloud VM

Overview

The etcd database should be on a separate disk to improve the stability of TOS Aurora and reduce latency. Moving the etcd database to a separate disk ensures that the Kubernetes database has access to all the resources it needs for optimal TOS performance.

This procedure is only required for data nodes running TufinOS 4, Rocky Linux 8, or RHEL 8.

This procedure must be performed by an experienced Linux administrator with knowledge of network configuration.

Only perform this procedure if you are mounting the etcd database on an in-place high availability deployment. If you are setting up a new high availability deployment, first follow the instructions in High Availability, and then return to this procedure. With new high availability deployments, TOS Aurora should only be installed on one data node.

Preliminary Preparations

  1. Run the following command:

    lsblk | grep "/var/lib/rancher/k3s/server/db"

    If the output contains /var/lib/rancher/k3s/server/db, etcd is already on a separate disk, and you do not need to perform this procedure.

    If no output is returned, continue the procedure.
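
    Optionally, you can cross-check with findmnt (a standard util-linux tool; this alternative is a suggestion, not part of the documented procedure), which reports whether the path is a dedicated mount point:

    [<ADMIN> ~]$ findmnt /var/lib/rancher/k3s/server/db

    If the path is already a separate mount, findmnt prints its entry; if it prints nothing, continue the procedure.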

  2. Create a backup. If you are going to perform this procedure over multiple maintenance periods, create a new backup each time.

    1. Create the backup using tos backup create:

      [<ADMIN> ~]$ sudo tos backup create

      Example output:

      [<ADMIN> ~]$ sudo tos backup create
      [Aug 23 16:18:42]  INFO Running backup
      Backup status can be monitored with "tos backup status"
    2. You can check the backup creation status using tos backup status, which shows the status of backups in progress. Wait until completion before continuing.

      [<ADMIN> ~]$ sudo tos backup status

      Example output:

      [<ADMIN> ~]$ sudo tos backup status
       Found active backup "23-august-2021-16-18"
    3. Run the following command to display the list of backups saved on the node:

      [<ADMIN> ~]$ sudo tos backup list

      Example output:

      [<ADMIN> ~]$ sudo tos backup list
       ["23-august-2021-16-18"]
         Started: "2021-08-23 13:18:43 +0000 UTC"
         Completed: "N/A"
         Modules: "ST, SC"
         HA mode: "false"
         TOS release: "21.2 (PGA.0.0) Final"
         TOS build: "21.2.2100-210722164631509"
         Expiration Date: "2021-09-22 13:18:43 +0000 UTC"
         Status: "Completed"
    4. Check that your backup file appears in the list, and that the status is "Completed".

    5. Run the following command to export the backup to a file:

      [<ADMIN> ~]$ sudo tos backup export

      The command creates a single backup file:

      [<ADMIN> ~]$ sudo tos backup export
       [Aug 23 16:33:42]  INFO Preparing target dir /opt/tufin/backups
       [Aug 23 16:33:42]  INFO Compressing...
       [Aug 23 16:33:48]  INFO Backup exported file: /opt/tufin/backups/backup-21-2-pga.0.0-final-20210823163342.tar.gzip 
       [Aug 23 16:33:48]  INFO Backup export has completed
    6. If your backup files are saved locally:

      1. Run sudo tos backup export to save the backup files from the TOS backup directory as a single .gzip file. If other backups are present, they will be included as well.

      2. Transfer the exported .gzip file to a safe, remote location.

        Make sure the location of your backups is documented and accessible, including any credentials needed to access them, so they are available for recovery when needed.

      After the backup is exported, we recommend verifying that the file contents can be viewed by running the following command:

      [Target location]$ tar tzvf <filename>
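
      For example, a transfer to a remote location could look like the following; the host backup-store.example.com, the user, and the destination directory are placeholders for your own environment, and the file name is taken from the example output above:

      [<ADMIN> ~]$ scp /opt/tufin/backups/backup-21-2-pga.0.0-final-20210823163342.tar.gzip backupuser@backup-store.example.com:/backups/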
  3. Switch to the root user.

    [<ADMIN> ~]$ sudo su -
  4. Non-TufinOS VMs only. Install the rsync RPM.

    [<ADMIN> ~]# dnf install rsync
  5. Find the name of the last disk added to the VM.

    [<ADMIN> ~]# lsblk -ndl -o NAME

    The output returns the list of disks on the VM. The last letter of the disk name indicates the order in which it was added, for example: sda, sdb, sdc.

  6. Save the name of the last disk in a separate location. You will need it later for verification purposes.
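
    For example, one simple way to record it is to write it to a file; the file path here is only an illustration, and tail -1 assumes the last line of the lsblk output is the last disk added, matching the note in the previous step:

    [<ADMIN> ~]# lsblk -ndl -o NAME | tail -1 > /root/last-disk-before-etcd-move.txt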

Mount The etcd Database to a Separate Disk

Shut Down TOS

  1. Shut down TOS.

    [<ADMIN> ~]# tos stop
  2. Wait for the following message:

    Deployment has been stopped successfully
  3. Stop the k3s service.

    [<ADMIN> ~]# systemctl stop k3s.service
  4. Disable the k3s service.

    [<ADMIN> ~]# systemctl disable k3s.service
  5. Verify that the k3s service is stopped and disabled.

    [<ADMIN> ~]# systemctl is-active k3s.service

    Output should return inactive.

    [<ADMIN> ~]# systemctl is-enabled k3s.service

    Output should return disabled.
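
    If you prefer a single command that shows both properties together, systemctl status reports the loaded (enabled/disabled) state and the active state in one output. This is a convenience only, not a required step:

    [<ADMIN> ~]# systemctl status k3s.service --no-pager

    The Loaded line should show disabled and the Active line should show inactive (dead).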

Mount The etcd Database

Repeat these steps for each data node.
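
The outline below is an illustrative sketch only, not the exact Tufin procedure; it assumes the new disk is /dev/sdb (substitute the disk name you recorded in the Preliminary Preparations), an ext4 filesystem, and a temporary mount point of /mnt/etcd-new. Adjust each of these to your environment.

Create a filesystem on the new disk and mount it temporarily:

    [<ADMIN> ~]# mkfs.ext4 /dev/sdb
    [<ADMIN> ~]# mkdir -p /mnt/etcd-new
    [<ADMIN> ~]# mount /dev/sdb /mnt/etcd-new

Copy the existing etcd data to the new disk with rsync, preserving permissions and ownership:

    [<ADMIN> ~]# rsync -a /var/lib/rancher/k3s/server/db/ /mnt/etcd-new/

Unmount the temporary mount point, mount the disk over the etcd path, and add a matching entry to /etc/fstab (for example by UUID, obtained with blkid /dev/sdb) so the mount persists across reboots:

    [<ADMIN> ~]# umount /mnt/etcd-new
    [<ADMIN> ~]# mount /dev/sdb /var/lib/rancher/k3s/server/db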

Start TOS

  1. Start the k3s service.

    [<ADMIN> ~]# systemctl start k3s.service

    Verify that there are no errors in the command output and that the service is active (running).

  2. Enable the k3s service.

    [<ADMIN> ~]# systemctl enable k3s.service
  3. Verify that the k3s service is enabled.

    [<ADMIN> ~]# systemctl is-enabled k3s.service

    The output should return enabled.

  4. Primary data node only. Start TOS.

    [<ADMIN> ~]# tos start
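
  Before continuing, you can confirm that the k3s service is running, as required in step 1, by reusing the check from the shutdown phase; the output should now return active:

    [<ADMIN> ~]# systemctl is-active k3s.service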

Check the Cluster Status

  1. On the primary data node, check the TOS status.

    [<ADMIN> ~]$ sudo tos status
  2. In the output, check that the System Status is Ok and that all the items listed under Components appear as Ok. If this is not the case, contact Tufin Support.

    Example output:

    [<ADMIN> ~]$ sudo tos status
     Tufin Orchestration Suite 2.0
    
     System Status: Ok
     System Mode:   Multi Node
    
     Nodes:
       1 Master, 1 Worker. Total 2 nodes. Nodes are healthy.
    
     Components:
       Node:            Ok
       Cassandra:       Ok
       Mongodb:         Ok
       Mongodb_sc:      Ok
       Nats:            Ok
       Neo4j:           Ok
       Postgres:        Ok
       Postgres_sc:     Ok
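
  Optionally, you can re-run the check from the Preliminary Preparations to confirm that the etcd database is now on its own disk. This time the output should contain /var/lib/rancher/k3s/server/db:

    [<ADMIN> ~]$ lsblk | grep "/var/lib/rancher/k3s/server/db"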