Deploy High Availability

This procedure explains how to deploy high availability in a TOS environment. For more information on the high availability architecture, see High Availability Architecture.

Prerequisites

  • Prepare two additional machines, one for each data node to be added to the cluster, making three in total, including the primary data node.

    High availability is supported for Tufin Appliances, VMware VMs, GCP and Open Server deployments.

  • All the nodes in the cluster need to be connected to the same L2 network and share the same subnet.

  • Each machine requires:

    • The same resources as allocated to your primary data node.

    • Allocation of a dedicated static IP address.

    • A supported operating system:

      • TufinOS 4.50

      • Red Hat Enterprise Linux 8.10

      • Rocky Linux 8.10

    • TOS installed.

No dedicated network interface is needed to enable high availability in TOS. Communication between the nodes is handled internally within the cluster.
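
The same-subnet prerequisite above can be sanity-checked before deployment. Below is a minimal sketch; the IP addresses and CIDR are illustrative examples, not values from this guide:

```python
import ipaddress

def nodes_share_subnet(node_ips, cidr):
    """Return True if every node IP belongs to the given subnet."""
    net = ipaddress.ip_network(cidr)
    return all(ipaddress.ip_address(ip) in net for ip in node_ips)

# Example: a primary data node plus the two additional data nodes.
nodes = ["10.20.30.11", "10.20.30.12", "10.20.30.13"]
print(nodes_share_subnet(nodes, "10.20.30.0/24"))  # True
```

If any node reports False, move it onto the cluster subnet (and the same L2 network) before proceeding.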

Add Data Nodes

After adding the first new data node, proceed immediately to add the second. Running with only two data nodes (the primary data node and one additional data node) does not allow high availability to be enabled and will make the cluster unstable.
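
One common way to see why a two-node data cluster is fragile is the majority-quorum model used by many replicated systems. The sketch below is an illustration of that general model, not a statement about TOS internals:

```python
# Illustration only: majority quorum vs. cluster size (a general
# distributed-systems model, assumed here, not documented TOS behavior).
def quorum(n):
    """Smallest majority of n nodes."""
    return n // 2 + 1

def tolerable_failures(n):
    """Number of nodes that can fail while a majority survives."""
    return n - quorum(n)

for n in (1, 2, 3):
    print(f"{n} nodes: quorum={quorum(n)}, tolerable failures={tolerable_failures(n)}")
# With 2 nodes the quorum is 2, so losing either node halts the cluster;
# with 3 nodes the quorum is still 2, so one node can fail safely.
```

This is why the procedure moves from one data node directly to three.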

  1. Log in to the primary data node.

  2. On the primary data node:

    [<ADMIN> ~]$ sudo tos cluster node add --role=data

    On completion, a command string is displayed, which you must run on the new node within 30 minutes. If the allotted time expires, repeat this step to generate a new command string.

  3. Copy the command string to the clipboard.

  4. Log in to the new node.

  5. On the new node, paste the command string you copied and run it. If the 30-minute window has expired, repeat the procedure from step 2.
  6. Verify that the node was added by running sudo tos cluster node list on the primary data node.
  7. On the primary data node, check the TOS status.

    [<ADMIN> ~]$ sudo tos status
  8. In the output, check that System Status is Ok and that all listed components appear as ok. If this is not the case, contact Tufin Support.

    Example output:

    [<ADMIN> ~]$ sudo tos status         
    [Mar 28 13:42:09]  INFO Checking cluster health status           
    TOS Aurora
    Tos Version: 24.2 (PRC1.1.0)
    
    System Status: "Ok"
                
    Cluster Status:
       Status: "Ok"
       Mode: "Preparing High Availability"
    
    Nodes
      Nodes:
      - ["datanode1"]
        Type: "Primary"
        Status: "Ok"
        Disk usage:
        - ["/opt"]
          Status: "Ok"
          Usage: 32%
      - ["datanode2"]
        Type: "Data Node"
        Status: "Ok"
        Disk usage:
        - ["/opt"]
          Status: "Ok"
          Usage: 16%
     
    registry
      Expiration ETA: 819 days
      Status: "Ok"
    
    Infra
    Databases:
    - ["cassandra"]
      Status: "Ok"
    - ["kafka"]
      Status: "Ok"
    - ["mongodb"]
      Status: "Ok"
    - ["mongodb_sc"]
      Status: "Ok"
    - ["ongDb"]
      Status: "Ok"
    - ["postgres"]
      Status: "Ok"
    - ["postgres_sc"]
      Status: "Ok"
    
    Application
    Application Services Status OK
    Running services 54/54
    
      Backup Storage:
      Location: "Local s3:http://minio.default.svc:9000/velerok8s/restic/default"
      Status: "Ok"
      Latest Backup: 2024-03-23 05:00:34 +0000 UTC
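
When automating health checks, output like the example above can also be scanned programmatically. Below is a minimal sketch that assumes the plain-text layout shown in this guide (it is not an official TOS interface); it flags any Status value other than "Ok":

```python
import re

def failing_statuses(status_text):
    """Return the value of every `Status:` line that is not "Ok"."""
    statuses = re.findall(r'Status:\s*"([^"]+)"', status_text)
    return [s for s in statuses if s != "Ok"]

# Sample fragment in the same layout as the example output above.
sample = '''
System Status: "Ok"

Cluster Status:
   Status: "Ok"
'''
print(failing_statuses(sample))  # [] -> everything healthy
```

An empty list means every reported status is "Ok"; any other result warrants a closer look at the full `sudo tos status` output.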

Enable High Availability

  1. On the primary data node:

    [<ADMIN> ~]$ sudo tos cluster ha enable

    Replication of data will commence. The time to completion will vary depending on the size of your database.

    On completion, TOS will be in high availability mode.

  2. Verify that HA is active by running sudo tos status.

  3. We recommend defining a notification to alert you to any change in the health of your cluster; see TOS Monitoring.

  4. TOS is now running in high availability mode.