Move etcd - In-place GCP VM Instance

Overview

This procedure is required for all clusters, including remote clusters, and is run on data nodes only.

The Kubernetes etcd database must reside on a separate disk so that it has access to all the resources it requires for optimal TOS performance, stability, and minimal latency.

This procedure must be performed by an experienced Linux administrator with knowledge of network and storage configuration.

Preliminary Preparations

  1. Run the following command:

    lsblk | grep "/var/lib/rancher/k3s/server/db"

    If the output contains /var/lib/rancher/k3s/server/db, etcd is already on a separate disk, and you do not need to perform this procedure.
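
    For example, if etcd were already on its own disk, the matching line might look similar to the following (the device name and size are illustrative and will differ on your instance):

    sdb    8:16   0   50G  0 disk /var/lib/rancher/k3s/server/db

    If the command returns no output, continue with this procedure.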

  2. Switch to the root user.

    [<ADMIN> ~]$ sudo su -
  3. Install the rsync RPM.

    [<ADMIN> ~]# dnf install rsync
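
    To confirm that the package was installed successfully, you can query the RPM database (an optional check):

    [<ADMIN> ~]# rpm -q rsync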
  4. Find the name of the last disk added to the VM instance.

    [<ADMIN> ~]# lsblk -ndl -o NAME

    The output returns the list of disks on the VM instance. The last letter of the disk name indicates the order in which it was added, for example: sda, sdb, sdc.
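
    For example, the output might look similar to the following (illustrative only; the disks on your instance may differ):

    sda
    sdb
    sdc

    In this example, sdc is the last disk that was added to the VM instance.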

  5. Save the name of the last disk in a separate location. You will need it later for verification purposes.
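
    For example, you can print the last disk name again at any time and copy it to a note outside the VM (this relies on the alphabetical device ordering described in the previous step):

    [<ADMIN> ~]# lsblk -ndl -o NAME | tail -n 1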

Mount the etcd Database to a Separate Disk

  1. Run the tmux command.

    [<ADMIN> ~]$ tmux new-session -s etcd
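
    Running the remaining steps inside a tmux session keeps them from being interrupted if your SSH connection drops. If you are disconnected, you can typically log in again and reattach to the same session:

    [<ADMIN> ~]$ tmux attach-session -t etcd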
  2. On the primary data node, check the TOS status.

    [<ADMIN> ~]$ sudo tos status
  3. In the output, check that the System Status is Ok and that all the items listed under Components appear as Ok. If this is not the case, contact Tufin Support.

  4. Example output for a central cluster data node:

    [<ADMIN> ~]$ tos status         
    [Mar 28 13:42:09]  INFO Checking cluster health status           
    TOS Aurora
    Tos Version: 24.2 (PRC1.1.0)
    
    System Status: "Ok"
                
    Cluster Status:
       Status: "Ok"
       Mode: "Multi Node"
    
    Nodes
      Nodes:
      - ["node1"]
        Type: "Primary"
        Status: "Ok"
        Disk usage:
        - ["/opt"]
          Status: "Ok"
          Usage: 19%
      - ["node3"]
        Type: "Worker Node"
        Status: "Ok"
        Disk usage:
        - ["/opt"]
          Status: "Ok"
          Usage: 4%
    
    registry
      Expiration ETA: 819 days
      Status: "Ok"
    
    Infra
    Databases:
    - ["cassandra"]
      Status: "Ok"
    - ["kafka"]
      Status: "Ok"
    - ["mongodb"]
      Status: "Ok"
    - ["mongodb_sc"]
      Status: "Ok"
    - ["ongDb"]
      Status: "Ok"
    - ["postgres"]
      Status: "Ok"
    - ["postgres_sc"]
      Status: "Ok"
    
    Application
    Application Services Status OK
    Running services 50/50
    
    Remote Clusters
    Number Of Remote Clusters: 2
      - ["RC"]
         Connectivity Status: "OK"
      - ["RC2"]
         Connectivity Status: "OK"
    
      Backup Storage:
      Location: "Local
    s3:http://minio.default.svc:9000/velerok8s/restic/default "
      Status: "Ok"
      Latest Backup: 2024-03-23 05:00:34 +0000 UTC			

    Example output for a remote cluster data node:

    [<ADMIN> ~]$ tos status         
    [Mar 28 13:42:09]  INFO Checking cluster health status           
    TOS Aurora
    Tos Version: 24.2 (PRC1.0.0)
    
    System Status: "Ok"
                
    Cluster Status:
       Status: "Ok"
       Mode: "Single Node"
    
    Nodes
      Nodes:
      - ["node2"]
        Type: "Primary"
        Status: "Ok"
        Disk usage:
        - ["/opt"]
          Status: "Ok"
          Usage: 19%
      
    registry
      Expiration ETA: 819 days
      Status: "Ok"
    
    Infra
    Databases:
    - ["mongodb"]
      Status: "Ok"
    - ["postgres"]
      Status: "Ok"
    
    Application
    Application Services Status OK
    Running services 16/16
    
      Backup Storage:
      Location: "Local
    s3:http://minio.default.svc:9000/velerok8s/restic/default "
      Status: "Ok"
      Latest Backup: 2024-03-23 05:00:34 +0000 UTC