Adding a Worker Node - Open Server

Overview

This procedure is for adding a worker node to an existing TOS cluster deployed on a bare-metal server or hypervisor running RHEL or Rocky Linux. If you have not yet installed TOS on the primary data node, start with Prepare an Open Server.

For all other installation and upgrade options, see Installing and Upgrading.

You do not need to install TOS on the worker nodes.

Prerequisites

General Requirements

This procedure must be performed by an experienced Linux administrator with knowledge of network and storage configuration.

To ensure optimal performance and reliability, the required resources need to be allocated exclusively to TOS. If resources become unavailable, this will affect TOS performance. Do not oversubscribe resources.
iptables version 1.8.5 or later is required. iptables must be reserved exclusively for TOS Aurora and cannot be used for any other purpose. During installation, any existing iptables configuration will be flushed and replaced.
The worker node must run the same operating system as your primary data node.
You must know the resources you will need - CPU cores, RAM, disk space and the load-model parameter, provided by your account team based on the procedure Calculate resources - clean install.
We do not recommend installing third-party software on your server unless it is specified in the current procedure. Such software may impact TOS functionality and features, and it is your responsibility to verify that it is safe to use.
(On-premises deployments only) The node's network IP must be on the same subnet as the cluster primary VIP.
Give the node a unique hostname in the cluster - use the command below, replacing <mynode> with your preferred name:

[<ADMIN> ~]$ sudo hostnamectl set-hostname <mynode>

If you intend to use syslog, allocate a syslog VIP on the same subnet as your primary VIP.
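
As a rough way to confirm the iptables version requirement listed above, the sketch below compares the installed version against 1.8.5. The helper names parse_iptables_version and version_at_least are illustrative and not part of TOS. The sketch assumes that iptables --version prints a line of the form "iptables v1.8.5 (nf_tables)" and that sort -V (GNU coreutils) is available.

```shell
# Hypothetical helpers for illustration; they are not part of TOS.

# Extract the numeric version from a line such as "iptables v1.8.5 (nf_tables)".
parse_iptables_version() {
    printf '%s\n' "$1" | sed -n 's/^iptables v\([0-9][0-9.]*\).*/\1/p'
}

# Succeed when version $1 is greater than or equal to version $2.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Run the check only where iptables is installed.
if command -v iptables >/dev/null 2>&1; then
    installed=$(parse_iptables_version "$(iptables --version)")
    if version_at_least "$installed" 1.8.5; then
        echo "iptables $installed meets the 1.8.5 minimum"
    else
        echo "iptables $installed is older than 1.8.5"
    fi
fi
```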

Operating System Requirements

OS distribution:
- Red Hat Enterprise Linux 8.10
- Rocky Linux 8.10
Disks:
- Use SSD storage. TOS requires 7,500 IOPS; expected throughput averages 250 MB/s, with bursts of up to 700 MB/s.
- The disk for the operating system and TOS data requires three partitions: /opt, /var and /tmp. Minimum disk size: 400 GB.
- Partition sizes:
  - /opt: Use the Sizing Calculator to determine the partition size
  - /var: 200 GB
  - /tmp: 25 GB
- Data nodes require an additional disk for etcd. Size: 50 GB
- We recommend allocating all remaining disk space to the /opt partition after you have partitioned the OS disk and moved etcd to a separate disk.
Secure boot must be disabled.

The kernel must be up to date.
SELinux must be disabled (recommended), or configured to run in permissive mode, as described in Enabling SELinux in Permissive Mode.
Language: en-US
You must have permissions to execute TOS CLI commands located in directory /usr/local/bin/tos and to use sudo if necessary.
To run TOS CLI commands without specifying the full path (/usr/local/bin/tos), your environment path must be modified accordingly.
The server timezone must be set.
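
One way to confirm the SELinux requirement above is to inspect the mode reported by getenforce. This is only a sketch: selinux_mode_ok is a hypothetical helper, and it assumes getenforce reports one of Enforcing, Permissive or Disabled.

```shell
# Hypothetical helper for illustration; it is not part of TOS.
# Succeed when the given SELinux mode is acceptable for this procedure.
selinux_mode_ok() {
    case "$1" in
        Disabled|Permissive) return 0 ;;
        *) return 1 ;;
    esac
}

# Run the check only where the SELinux tools are installed.
if command -v getenforce >/dev/null 2>&1; then
    mode=$(getenforce)
    if selinux_mode_ok "$mode"; then
        echo "SELinux mode '$mode' is acceptable"
    else
        echo "SELinux mode '$mode' must be changed to Disabled or Permissive"
    fi
fi
```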

Network Requirements

Tufin Orchestration Suite must only be installed in an appropriately secured network and physical location. Only authorized users should be granted access to TOS products and the operating system on the server.
You must allow access to required Ports and Services.
All TOS nodes must be on the same subnet and on a layer 2 network that supports ARP (Address Resolution Protocol).
All TOS nodes should have network latency of under 1ms.
Network configurations for your interface must be set to manual IPv4 with gateway and DNS Servers set to the IPs used by your organization.
Allocate a 24-bit CIDR subnet for the Kubernetes service network and a 16-bit CIDR subnet for the Kubernetes pods network (10.244.0.0/16 is used by default).

The pods and services networks must be inside the following private networks: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16. In addition, ensure that the dedicated CIDRs for the service network and the pods network do not overlap with each other or with any of the following:
- The physical addresses of your TOS servers (see below)
- Your primary VIP, Syslog VIP or external load balancer IP (see below)
- Any other subnets communicating with TOS or with TOS nodes
If a proxy is configured on your system, make sure this network is excluded.
You must have available the following dedicated IP addresses:
- For on-premises deployments, a primary VIP that will serve as the external IP address used to access TOS from your browser. The primary VIP will not be needed in the installation of the operating system, except in the final step - the installation command.
- The physical network IP address of the first network interface used by the administrator for CLI commands. This is the IP address you will use in most steps of the procedure.
- If additional nodes are subsequently added to the cluster, each node will require an additional dedicated physical network IP address.
- Additional syslog VIPs can be allocated as needed.
- The VIP, all node physical network IP addresses and all syslog VIPs must be on the first network interface.
- Make sure your first physical interface is correctly configured and that no other interface is on the same network. Otherwise, network errors such as connectivity failures and incorrect traffic routing might occur.
  
  To find the first network interface, run the following command:
```
[<ADMIN> ~]$ sudo /opt/tufinos/scripts/network_interface_by_pci_order.sh | awk -F'=' '/NET_IFS\[0\]/ { print $NF }'
```
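
The CIDR rules in this section (pods and services networks inside the private ranges, with no overlap between them or with other subnets) can be checked before installation with a short script. The following is a minimal sketch in POSIX shell. All function names are hypothetical, the service network and node subnet values are placeholders chosen for illustration, and only 10.244.0.0/16 comes from this procedure.

```shell
# Hypothetical helpers for illustration; they are not part of TOS.

# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Print the first and last address of a CIDR block as integers.
cidr_range() (
    base=$(ip_to_int "${1%/*}")
    bits=${1#*/}
    size=$(( 1 << (32 - bits) ))
    start=$(( base & ~(size - 1) ))
    echo "$start $(( start + size - 1 ))"
)

# Succeed when the two CIDR blocks share at least one address.
cidrs_overlap() (
    set -- $(cidr_range "$1") $(cidr_range "$2")
    [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
)

# Succeed when CIDR $1 lies entirely inside CIDR $2.
cidr_within() (
    set -- $(cidr_range "$1") $(cidr_range "$2")
    [ "$1" -ge "$3" ] && [ "$2" -le "$4" ]
)

# Succeed when the CIDR is inside one of the allowed private networks.
in_private_space() {
    cidr_within "$1" 10.0.0.0/8 ||
        cidr_within "$1" 172.16.0.0/12 ||
        cidr_within "$1" 192.168.0.0/16
}

pods=10.244.0.0/16          # default pods network from this procedure
services=10.96.0.0/24       # placeholder service network (assumption)
node_subnet=192.168.1.0/24  # placeholder subnet of the TOS nodes (assumption)

for net in "$pods" "$services"; do
    in_private_space "$net" || echo "$net is outside the allowed private ranges"
done

if cidrs_overlap "$pods" "$services" ||
    cidrs_overlap "$pods" "$node_subnet" ||
    cidrs_overlap "$services" "$node_subnet"; then
    echo "Overlap detected between the configured networks"
else
    echo "No overlaps found"
fi
```

Replace the three placeholder values with your own networks, and add further cidrs_overlap checks for the VIPs and any other subnets that communicate with TOS.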

Procedure

Before you proceed, read and understand Prerequisites - this may prevent unexpected failures.

Configure the operating system.
Add the node to the cluster.

Check the TOS status.

On the primary data node, check the TOS status.
```
[<ADMIN> ~]$ sudo tos status
```
In the output, check if the System Status is Ok and all the items listed under Components appear as Ok. If this is not the case, contact Tufin Support.
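
If you prefer to script this check, the sketch below scans the status output for lines that are not Ok. It is an illustration only: the helper names are hypothetical, and the patterns are inferred from the example output in this procedure, so they may need adjusting for other TOS versions.

```shell
# Hypothetical helpers for illustration; they are not part of TOS.

# Succeed when the overall System Status line reports "Ok".
system_status_ok() {
    printf '%s\n' "$1" | grep -q '^System Status: "Ok"'
}

# Print every quoted status line whose value is not Ok (case-insensitive).
non_ok_lines() {
    printf '%s\n' "$1" | grep 'Status:.*"' | grep -iv '"ok"'
}

# Run the check only where the tos CLI is on the PATH.
if command -v tos >/dev/null 2>&1; then
    output=$(sudo tos status)
    if system_status_ok "$output" && [ -z "$(non_ok_lines "$output")" ]; then
        echo "TOS status is Ok"
    else
        echo "TOS status needs attention:"
        non_ok_lines "$output" || true
    fi
fi
```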

Example output for a central cluster data node:

[<ADMIN> ~]$ tos status         
[Mar 28 13:42:09]  INFO Checking cluster health status           
TOS Aurora
Tos Version: 24.2 (PRC1.1.0)

System Status: "Ok"
            
Cluster Status:
   Status: "Ok"
   Mode: "Multi Node"

Nodes
  Nodes:
  - ["node1"]
    Type: "Primary"
    Status: "Ok"
    Disk usage:
    - ["/opt"]
      Status: "Ok"
      Usage: 19%
  - ["node3"]
    Type: "Worker Node"
    Status: "Ok"
    Disk usage:
    - ["/opt"]
      Status: "Ok"
      Usage: 4%

registry
  Expiration ETA: 819 days
  Status: "Ok"

Infra
Databases:
- ["cassandra"]
  Status: "Ok"
- ["kafka"]
  Status: "Ok"
- ["mongodb"]
  Status: "Ok"
- ["mongodb_sc"]
  Status: "Ok"
- ["ongDb"]
  Status: "Ok"
- ["postgres"]
  Status: "Ok"
- ["postgres_sc"]
  Status: "Ok"

Application
Application Services Status OK
Running services 50/50

Remote Clusters
Number Of Remote Clusters: 2
  - ["RC"]
     Connectivity Status:: "OK"
  - ["RC2"]
     Connectivity Status:: "OK"

  Backup Storage:
  Location: "Local
s3:http://minio.default.svc:9000/velerok8s/restic/default "
  Status: "Ok"
  Latest Backup: 2024-03-23 05:00:34 +0000 UTC

Example output for a remote cluster data node:

[<ADMIN> ~]$ tos status         
[Mar 28 13:42:09]  INFO Checking cluster health status           
TOS Aurora
Tos Version: 24.2 (PRC1.0.0)

System Status: "Ok"
            
Cluster Status:
   Status: "Ok"
   Mode: "Single Node"

Nodes
  Nodes:
  - ["node2"]
    Type: "Primary"
    Status: "Ok"
    Disk usage:
    - ["/opt"]
      Status: "Ok"
      Usage: 19%
  
registry
  Expiration ETA: 819 days
  Status: "Ok"

Infra
Databases:
- ["mongodb"]
  Status: "Ok"
- ["postgres"]
  Status: "Ok"

Application
Application Services Status OK
Running services 16/16

  Backup Storage:
  Location: "Local
s3:http://minio.default.svc:9000/velerok8s/restic/default "
  Status: "Ok"
  Latest Backup: 2024-03-23 05:00:34 +0000 UTC

After the node is added, we recommend stopping TOS and then starting it again to improve the node's performance. This requires downtime.