Move etcd - GCP High Availability

Overview

The etcd database should be on a separate disk to improve the stability of TOS Aurora and reduce latency. Moving the etcd database to a separate disk ensures that the Kubernetes database has access to all the resources it needs for optimal TOS performance. We recommend performing this procedure after you install the operating system and before you install TOS.

This procedure is only required for data nodes running Rocky Linux 8 or RHEL 8.

This procedure must be performed by an experienced Linux administrator with knowledge of network configuration.

Only perform this procedure if you are mounting the etcd database on an in-place high availability deployment. If you are setting up a new high availability deployment, first follow the instructions in High Availability, and then return to this procedure. With new high availability deployments, TOS Aurora should only be installed on one data node.

Preliminary Preparations

  1. Run the following command:

    lsblk | grep "/var/lib/rancher/k3s/server/db"

    If the output contains /var/lib/rancher/k3s/server/db, etcd is already on a separate disk, and you do not need to perform this procedure.

    If no output is returned, continue the procedure.

  2. Switch to the root user.

    [<ADMIN> ~]$ sudo su -
  3. Install the rsync RPM.

    [<ADMIN> ~]# dnf install rsync
  4. Find the name of the last disk added to the VM instance.

    [<ADMIN> ~]# lsblk -ndl -o NAME

    The output returns the list of disks on the VM instance. The last letter of the disk name indicates the order in which the disk was added, for example: sda, sdb, sdc.

  5. Save the name of the last disk in a separate location. You will need it later for verification purposes.
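
The two checks in this section (whether etcd is already on its own disk, and which disk was added last) can be sketched as small shell helpers. This is only an illustration: the function names etcd_is_mounted and last_disk are hypothetical, the demo uses canned input instead of real lsblk output, and it assumes disk names sort in the order they were attached.

```shell
#!/bin/sh
# Path taken from step 1 of the procedure.
ETCD_DIR="/var/lib/rancher/k3s/server/db"

# Succeeds if lsblk-style output on stdin lists ETCD_DIR as a mount point.
etcd_is_mounted() {
  grep -qF "$ETCD_DIR"
}

# Prints the last disk name (in sort order) from `lsblk -ndl -o NAME`-style input.
# Assumption: names sort in attachment order (sda, sdb, sdc). This breaks once
# names roll over to sdaa, so confirm the result by eye.
last_disk() {
  grep -v '^[[:space:]]*$' | LC_ALL=C sort | tail -n 1
}

# Demo with canned input. On a real node, replace the printf calls with
# `lsblk` and `lsblk -ndl -o NAME` respectively.
if printf 'sda 8:0 0 100G 0 disk\n' | etcd_is_mounted; then
  echo "etcd is already on a separate disk"
else
  echo "etcd is not on a separate disk; last disk: $(printf 'sda\nsdb\nsdc\n' | last_disk)"
fi
```

Reading the input from stdin keeps the helpers independent of the node they run on, so the same logic can be tried against saved lsblk output before it is used on a data node.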

Mount the etcd Database to a Separate Disk

Shut down TOS

  1. Shut down TOS.

    [<ADMIN> ~]# tos stop
  2. Wait for the following message:

    Deployment has been stopped successfully
  3. Stop the k3s service.

    [<ADMIN> ~]# systemctl stop k3s.service
  4. Disable the k3s service.

    [<ADMIN> ~]# systemctl disable k3s.service
  5. Verify that the k3s service is stopped and disabled.

    [<ADMIN> ~]# systemctl is-active k3s.service

    Output should return inactive.

    [<ADMIN> ~]# systemctl is-enabled k3s.service

    Output should return disabled.
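
The verification in step 5 can be combined into a single check. The following is a minimal sketch, not part of the product: the function name k3s_ready_for_move is hypothetical, and the demo passes literal state strings so that it runs without systemd.

```shell
#!/bin/sh
# Succeeds only when the service state matches what the procedure expects
# at this point: inactive and disabled.
# $1 = output of `systemctl is-active`, $2 = output of `systemctl is-enabled`.
k3s_ready_for_move() {
  [ "$1" = "inactive" ] && [ "$2" = "disabled" ]
}

# Demo with literal values. On a real node (as root), call it as:
#   k3s_ready_for_move "$(systemctl is-active k3s.service)" \
#                      "$(systemctl is-enabled k3s.service)"
if k3s_ready_for_move "inactive" "disabled"; then
  echo "k3s is stopped and disabled"
else
  echo "k3s is still active or enabled" >&2
fi
```

Comparing the printed states rather than the exit codes mirrors the manual check, because systemctl is-active and systemctl is-enabled return a non-zero exit code for the very states this step is looking for.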

Mount the etcd Database

Repeat these steps for each data node.

Start TOS

  1. Start the k3s service.

    [<ADMIN> ~]# systemctl start k3s.service

    Verify that there are no errors in the command output and that the service is active (running).

  2. Enable the k3s service.

    [<ADMIN> ~]# systemctl enable k3s.service
  3. Verify that the k3s service is enabled.

    [<ADMIN> ~]# systemctl is-enabled k3s.service

    Output should return enabled.

  4. Primary data node only. Start TOS.

    [<ADMIN> ~]# tos start

Check the Cluster Status

  1. On the primary data node, check the TOS status.

    [<ADMIN> ~]$ sudo tos status
  2. In the output, check if the System Status is Ok and all the items listed under Components appear as ok. If this is not the case, contact Tufin Support.

  3. Example output:

    [<ADMIN> ~]$ tos status         
    [Mar 28 13:42:09]  INFO Checking cluster health status           
    TOS Aurora
    Tos Version: 24.2 (PRC1.1.0)
    
    System Status: "Ok"
                
    Cluster Status:
       Status: "Ok"
       Mode: "High Availability"
    
    Nodes
      Nodes:
      - ["node1"]
        Type: "Primary"
        Status: "Ok"
        Disk usage:
        - ["/opt"]
          Status: "Ok"
          Usage: 32%
      - ["node3"]
        Type: "Ha Data Node"
        Status: "Ok"
        Disk usage:
        - ["/opt"]
          Status: "Ok"
          Usage: 11%
      - ["node2"]
        Type: "Ha Data Node"
        Status: "Ok"
        Disk usage:
        - ["/opt"]
          Status: "Ok"
          Usage: 11%
    
    registry
      Expiration ETA: 819 days
      Status: "Ok"
    
    Infra
    Databases:
    - ["cassandra"]
      Status: "Ok"
    - ["kafka"]
      Status: "Ok"
    - ["mongodb"]
      Status: "Ok"
    - ["mongodb_sc"]
      Status: "Ok"
    - ["ongDb"]
      Status: "Ok"
    - ["postgres"]
      Status: "Ok"
    - ["postgres_sc"]
      Status: "Ok"
    
    Application
    Application Services Status OK
    Running services 54/54
    
      Backup Storage:
      Location: "Local
    s3:http://minio.default.svc:9000/velerok8s/restic/default "
      Status: "Ok"
      Latest Backup: 2024-03-23 05:00:34 +0000 UTC
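
Scanning a long status report by eye is error-prone, so the check in step 2 can be approximated with a small filter. This is a sketch under assumptions: the function name tos_status_ok is hypothetical, it relies only on the line layout shown in the example output above (a quoted value after "Status:"), and the demo feeds it canned text instead of live tos status output.

```shell
#!/bin/sh
# Reads status text on stdin. Prints every status line whose quoted value is
# not "Ok", and succeeds only if no such line exists and a System Status line
# was present. Header lines such as "Cluster Status:" carry no value and are
# ignored.
tos_status_ok() {
  awk '
    /System Status:[ \t]*"/ { seen = 1 }
    /Status:[ \t]*"/ && $0 !~ /Status:[ \t]*"Ok"/ { print "not ok: " $0; bad = 1 }
    END { if (bad || !seen) exit 1 }
  '
}

# Demo with canned input. On a real primary data node, pipe in the
# output of `sudo tos status` instead.
if printf '%s\n' 'System Status: "Ok"' 'Cluster Status:' '   Status: "Ok"' | tos_status_ok; then
  echo "all components report Ok"
else
  echo "at least one component is not Ok; contact Tufin Support" >&2
fi
```

The filter only flags lines; it does not replace reading the full report, and any change in the output format would require adjusting the patterns.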