Adding a Worker Node - Azure

Overview

This procedure is for adding a worker node to an existing TOS cluster on the Azure platform. If you have not yet installed TOS on the primary data node, start with Clean Install. For all other installation paths such as upgrade or other platforms, see the menu for the appropriate procedure.

You do not need to install TOS on the worker nodes.

Prerequisites

General Requirements

Your primary data node must also be deployed on Azure.
You must know the resources you will need: CPU cores, RAM, disk space, and the load-model parameter. Your account team provides these values based on the procedure Calculate resources - clean install.
You will need to allow access to required Ports and Services.

Operating System Requirements

OS distribution:
- Red Hat Enterprise Linux 8.10
- Rocky Linux 8.10
Disks:
- Select a storage type of SSD. Take into consideration that TOS requires 7,500 IOPS and the throughput expected will average 250MB/s with bursts of up to 700MB/s.
- The disk for the operating system and TOS data requires three partitions: /opt, /var and /tmp. Minimum disk size: 400 GB.
- Partition sizes:
  - /opt: Storage size is determined by the sizing information sent by Tufin. Minimum: 400 GB.
  - /var: 200 GB
  - /tmp: 25 GB
- Data nodes require an additional disk for etcd. Size: 50 GB
- Do not add additional disks before installing TOS. If additional storage is later required, you can extend the partition size by adding an additional disk after TOS has been installed.
- We recommend allocating the /opt partition all remaining disk space after you have partitioned the OS disk and moved etcd to a separate disk.
Secure boot must be disabled.

The kernel must be up to date.
SELinux must be disabled (recommended), or configured to run in permissive mode, as described in Enabling SELinux in Permissive Mode.
Language: en-US
You must have permissions to execute TOS CLI commands located in directory /usr/local/bin/tos and to use sudo if necessary.
To run TOS CLI commands without specifying the full path (/usr/local/bin/tos), your environment path must be modified accordingly.
The server timezone must be set.
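
The partition and SELinux requirements above can be checked with a short pre-flight script before you continue. The following is a minimal sketch and not part of the TOS tooling: the function names check_partitions and selinux_ok are hypothetical, and the minimum sizes are the ones listed in this section (your /opt size may need to be larger, according to the sizing information from Tufin). It assumes GNU df and getenforce are available, and it treats the 1G blocks reported by df as the GB figures used here, so a file system that is marginally smaller than its partition may be flagged.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; not part of the TOS CLI.

# Reads "mountpoint sizeG" lines (as printed by GNU
# `df -BG --output=target,size`) on stdin and reports every required
# partition that is missing or below the minimum listed in this section.
check_partitions() {
    awk '
        BEGIN { min["/opt"] = 400; min["/var"] = 200; min["/tmp"] = 25 }
        ($1 in min) { sub(/G$/, "", $2); size[$1] = $2 + 0 }
        END {
            bad = 0
            for (m in min) {
                if (!(m in size)) {
                    print m ": not a separate partition"
                    bad = 1
                } else if (size[m] < min[m]) {
                    print m ": " size[m] " GB is below the " min[m] " GB minimum"
                    bad = 1
                }
            }
            exit bad
        }'
}

# Succeeds when the SELinux mode (as printed by `getenforce`) is one
# that this procedure accepts: Disabled (recommended) or Permissive.
selinux_ok() {
    case "$1" in
        Disabled|Permissive) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage on the new node (left commented out on purpose):
#   df -BG --output=target,size /opt /var /tmp | check_partitions
#   selinux_ok "$(getenforce)" || echo "SELinux must be disabled or permissive"
```

If /opt, /var, or /tmp is not its own partition, df reports the parent mount point instead, so the sketch lists that directory as not being a separate partition.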

Network Requirements

Tufin Orchestration Suite must only be installed in an appropriately secured network and physical location. Only authorized users should be granted access to TOS products and the operating system on the server.
You must allow access to required Ports and Services.
All TOS nodes need to be on the same subnet and layer 2 network that supports ARP (address resolution protocol).
All TOS nodes should have network latency of under 1ms.
Network configurations for your interface must be set to manual IPv4 with gateway and DNS Servers set to the IPs used by your organization.
Allocate a 24-bit CIDR subnet for the Kubernetes service network and a 16-bit CIDR subnet for the Kubernetes pods network (10.244.0.0/16 is used by default).

The pods and services networks must be inside the following private networks: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16. In addition, ensure that the dedicated CIDRs for the service network and the pods network do not overlap with each other or with:
- The physical addresses of your TOS servers (see below)
- Your primary VIP, Syslog VIP or external load balancer IP (see below)
- Any other subnets communicating with TOS or with TOS nodes
If a proxy is configured on your system, make sure this network is excluded.
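
The CIDR rules above (private ranges only, no overlap) can be checked before you commit to a pair of networks. The following is a minimal sketch under stated assumptions: the helper names cidr_test, cidrs_overlap, and is_private are hypothetical, the service network 10.96.0.0/24 and node subnet 192.168.1.0/24 are made-up example values, and only 10.244.0.0/16 comes from this document. It handles IPv4 CIDR notation only and does no input validation.

```shell
#!/bin/sh
# Hypothetical helpers for checking IPv4 CIDR blocks; not part of TOS.

# cidr_test MODE A B
#   MODE=overlap : succeeds if networks A and B share any address
#   MODE=within  : succeeds if network A lies entirely inside network B
cidr_test() {
    awk -v mode="$1" -v a="$2" -v b="$3" 'BEGIN {
        split(a, x, "[./]"); split(b, y, "[./]")
        size_a = 2 ^ (32 - x[5]); size_b = 2 ^ (32 - y[5])
        ip_a = ((x[1] * 256 + x[2]) * 256 + x[3]) * 256 + x[4]
        ip_b = ((y[1] * 256 + y[2]) * 256 + y[3]) * 256 + y[4]
        lo_a = ip_a - (ip_a % size_a); hi_a = lo_a + size_a - 1
        lo_b = ip_b - (ip_b % size_b); hi_b = lo_b + size_b - 1
        if (mode == "within") ok = (lo_a >= lo_b && hi_a <= hi_b)
        else                  ok = (lo_a <= hi_b && lo_b <= hi_a)
        exit !ok
    }'
}

cidrs_overlap() { cidr_test overlap "$1" "$2"; }

# Succeeds if the network is inside one of the three private ranges.
is_private() {
    cidr_test within "$1" 10.0.0.0/8 ||
    cidr_test within "$1" 172.16.0.0/12 ||
    cidr_test within "$1" 192.168.0.0/16
}

report() {
    if cidrs_overlap "$1" "$2"; then
        echo "OVERLAP: $1 and $2"
    else
        echo "no overlap: $1 and $2"
    fi
}

# Example values; replace them with your own networks.
PODS_CIDR="10.244.0.0/16"      # default pods network from this section
SERVICE_CIDR="10.96.0.0/24"    # hypothetical 24-bit service network
NODE_SUBNET="192.168.1.0/24"   # hypothetical subnet of the TOS nodes

for net in "$PODS_CIDR" "$SERVICE_CIDR"; do
    is_private "$net" || echo "NOT PRIVATE: $net"
done
report "$PODS_CIDR" "$SERVICE_CIDR"
report "$PODS_CIDR" "$NODE_SUBNET"
report "$SERVICE_CIDR" "$NODE_SUBNET"
```

Run the same comparison against your VIPs and any other subnet that communicates with TOS by passing a single address as a /32 network.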
You must have available the following dedicated IP addresses:
- For on-premises deployments, a primary VIP that will serve as the external IP address used to access TOS from your browser. The primary VIP will not be needed in the installation of the operating system, except in the final step - the installation command.
- The physical network IP address of the first network interface used by the administrator for CLI commands. This is the IP address you will use in most steps of the procedure.
- If additional nodes are subsequently added to the cluster, each node will require an additional dedicated physical network IP address.
- Additional syslog VIPs can be allocated as needed.
- The VIP, all node physical network IP addresses and all syslog VIPs must be on the first network interface.
- Make sure your first physical interface is correctly configured and all other interfaces are not on the same network. Otherwise, network errors such as connectivity failures and incorrect traffic routing might occur.
  
  To find the first network interface, run the following command:
```
[<ADMIN> ~]$ sudo /opt/tufinos/scripts/network_interface_by_pci_order.sh | awk -F'=' '/NET_IFS\[0\]/ { print $NF }'
```
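
The latency requirement above (under 1 ms between TOS nodes) can be spot-checked from the new VM with ping. The following is a rough sketch, not an official test: the function names avg_rtt_ms and latency_ok are hypothetical, <node_ip> stands for the address of another TOS node, and the parsing assumes the summary line printed by the Linux iputils ping ("rtt min/avg/max/mdev = ...").

```shell
#!/bin/sh
# Hypothetical helpers; not part of the TOS CLI.

# Reads `ping` output on stdin and prints the average round-trip
# time in milliseconds, taken from the summary line.
avg_rtt_ms() {
    awk -F'/' '/^(rtt|round-trip) / { print $5 }'
}

# Succeeds when the given round-trip time is a number below 1 ms.
latency_ok() {
    awk -v rtt="$1" 'BEGIN { exit !(rtt != "" && rtt + 0 < 1) }'
}

# Example usage (left commented out; replace <node_ip> first):
#   rtt=$(ping -c 10 <node_ip> | avg_rtt_ms)
#   latency_ok "$rtt" && echo "latency ok: $rtt ms" || echo "latency too high: $rtt ms"
```

A short ping run only samples the network at one moment, so treat a passing result as an indication rather than proof.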

There is a step in this procedure that will cause the system to reboot, with access only from the Azure Serial Console until the machine is rebooted for the second time. Before starting, make sure you have access to the Azure Serial Console.

Procedure

Before you proceed, read and understand Prerequisites - this may prevent unexpected failures.

Create a Virtual Machine
For additional help, refer to the official Microsoft Azure documentation: Create a Linux virtual machine in the Azure portal.
1. Log in to your Azure portal.
2. Navigate to Marketplace and create a VM.
3. Under the Basics tab, enter/select the following information:
  - Subscription - your Azure subscription name
  - Resource Group - your Azure resource group name
  - Virtual Machine Name - a name of your choice to identify the VM e.g. myVirtualMachine-0
  - Region - select from the list
  - Image - select one of the following from the list:
    
    Red Hat Enterprise Linux 8.10
    
    Rocky Linux 8.10
    
    The image must include Logical Volume Management (LVM), which is required to enlarge the volumes.
  - Size - select CPUs and memory as advised by your account team
  - Authentication type - select SSH public key
  - User name - enter azureuser
  - SSH public key source - select generate new key pair
  - Key pair name - use the default name provided
4. Under the Disks tab, enter/select the following information:
  - OS disk type - select Premium SSD
  - Encryption - default
5. Select Create and attach a new disk.
6. Enter the disk details:
  - Enter a Name of your choice.
  - Set Host caching to Read/write.
  - Select a storage type. Take into consideration that TOS requires 7,500 IOPS and the throughput expected will average 250MB/s with bursts of up to 700MB/s.
  - Set Storage Size GiB to the disk space sizing requirement given to you by your account team.
7. Save the disk definitions.
8. If you want a public IP:
  - Select the Networking tab
  - Under Public IP, click Create new.
  - Enter a name.
  - Select SKU - Standard.
9. Under the Tags tab, create two tags, replacing <your name> and <your environment name> with names of your choice:
  - owner: <your name>
  - env: <your environment name>
10. Click Review and Create.
11. Click Create. The generate new key pair prompt appears.
12. Generate a new key pair. Click Download private key and create resource. The private key will be downloaded to your PC as a file with suffix .pem. It will be needed to log into the VM console.
13. Navigate to the directory on your PC that contains the .pem file just downloaded (<pem_key_name>) and restrict its permissions so that only your user can read it.
  
  If your PC is running on a Linux-like operating system:
  [<ADMIN> ~]# chmod 400 <pem_key_name>
  
14. You can now log in to the VM console whenever required.
  
  Log in to the Azure VM console where <pem_key_name> is the name of the .pem file downloaded previously from the Azure portal, <azureuser> is the name of your Azure user on the VM and <IP> is its private or public IP. The private and optional public IPs can be seen on the Networking tab.
  [<ADMIN> ~]# ssh -i <pem_key_name> <azureuser>@<IP>
  
15. In the portal, select the newly created VM and then select the Networking tab.
16. Select Add inbound port rule and create a rule with the properties below:
  - Source: Any
  - Source port ranges: *
  - Destination: Any
  - Destination port ranges: 31443, 31617, 31514, 31099, 31843, 30161, 30514, 31161 (see Ports and Services for the purpose of each port)
  - Protocol: Any
  - Action: Allow
  - Priority: 310
  - Name: TOS_Aurora or other name of your choice
  - Description: TOS_Aurora or other description of your choice
Add the VM to the Load Balancer
For additional help, refer to the official Microsoft Azure documentation: Azure Load Balancer portal settings.
1. In your Azure portal, navigate to Load Balancers.
2. Select the load balancer created previously (when the VM for the primary data node was set up).
3. Navigate to backend pools.
4. Select the backend pool created previously (when the VM for the primary data node was set up).
5. Add the new VM to the list of virtual machines in the backend pool.
6. Save.
Configure Partitions

If not done already, set up partitions according to the Prerequisites.
Configure the Operating System
1. If you are not currently logged in as user root, do so now.
  [<ADMIN> ~]$ su -
  
2. If you want to change the host name or IP of the machine, do so now. Once TOS has been installed, changing the host name or IP address will require reinstalling - see Changing IP Address/Host Names. To change the host name, use the command below, replacing <mynode> with your preferred name.
  [<ADMIN> ~]# hostnamectl set-hostname <mynode>
  
3. Modify the environment path to run TOS CLI commands without specifying the full path (/usr/local/bin/tos).
  [<ADMIN> ~]# echo 'export PATH="${PATH}:/usr/local/bin"' | sudo tee -a /root/.bashrc > /dev/null
  
4. Synchronize your machine time with a trusted NTP server. Follow the steps in Configuring NTP Using Chrony.
5. Configure the server timezone.
  [<ADMIN> ~]# timedatectl set-timezone <timezone>
  where <timezone> is typically in the format Area/Location. Examples: America/Jamaica, Hongkong, GMT, Europe/Prague. To list the time zones that can be used in the command, run:
  [<ADMIN> ~]# timedatectl list-timezones
6. Upgrade the kernel:
  [<ADMIN> ~]# dnf upgrade
  
7. Reboot the machine and log in.
8. Install Wireguard. This is needed to encrypt communication between nodes (machines) within the cluster. The Wireguard version must match the operating system version you are installing.
9. Reboot the machine and log in.
10. Install tmux and rsync:
  [<ADMIN> ~]# dnf install -y rsync tmux
  
11. Disable the firewall:
  [<ADMIN> ~]# systemctl stop firewalld
  
  [<ADMIN> ~]# systemctl disable firewalld
  
12. Create the TOS load module configuration file /etc/modules-load.d/tufin.conf. Example using vi:
  [<ADMIN> ~]# vi /etc/modules-load.d/tufin.conf
  
13. Specify the modules to be loaded by adding the following lines to the configuration file created in the previous step. The modules will then be loaded automatically on boot.
  br_netfilter
  wireguard
  overlay
  ebtables
  ebtable_filter
14. Load the above modules now:
  [<ADMIN> ~]# cat /etc/modules-load.d/tufin.conf |xargs modprobe -a
  
  Look carefully at the output to confirm all modules loaded correctly; an error message will be issued for any modules that failed to load.
15. Check that Wireguard has loaded correctly.
  [<ADMIN> ~]# lsmod |grep wireguard
  
  The output will appear something like this:
```
wireguard              201106  0
ip6_udp_tunnel         12755  1 wireguard
udp_tunnel             14423  1 wireguard
```
  If Wireguard is not listed in the output, contact support.
16. Create the TOS kernel configuration file /etc/sysctl.d/tufin.conf. Example using vi:
  [<ADMIN> ~]# vi /etc/sysctl.d/tufin.conf
  
17. Specify the kernel settings to be made by adding the following lines to the configuration file created in the previous step. The settings will then be applied on boot.
  net.bridge.bridge-nf-call-iptables = 1
  fs.inotify.max_user_watches = 1048576
  fs.inotify.max_user_instances = 10000
  net.ipv4.ip_forward = 1
18. Apply the above kernel settings now:
  [<ADMIN> ~]# sysctl --system
  
For maximum security, we recommend only installing official security updates and security patches for your Linux distribution, as well as the RPMs specifically mentioned in this section.
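
After loading the modules and applying the kernel settings in the steps above, you may want to confirm that the running system matches the two configuration files. The following is a minimal sketch, not an official verification tool: the function names missing_modules and mismatched_settings are hypothetical, and the parsing assumes the usual lsmod listing and the "key = value" lines printed by sysctl -a. A module that is built into the kernel does not appear in lsmod, so a reported name is a prompt to look closer rather than proof of a failure.

```shell
#!/bin/sh
# Hypothetical verification helpers; not part of the TOS CLI.

# Prints each module named in the config file (first argument) that
# does not appear in the `lsmod` listing supplied on stdin.
missing_modules() {
    awk '
        NR == FNR { loaded[$1] = 1; next }     # first input: lsmod listing
        /^[#;]/   { next }                     # skip comment lines in the config
        { for (i = 1; i <= NF; i++) if (!($i in loaded)) print $i }
    ' - "$1"
}

# Prints each "key = value" line of the settings file (first argument)
# whose live value, supplied on stdin as "key = value", is absent or
# different.
mismatched_settings() {
    awk -F' *= *' '
        { sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "") }   # trim the line
        NR == FNR { live[$1] = $2; next }            # first input: live values
        /^[#;]/ || NF < 2 { next }                   # skip comments and blanks
        !($1 in live)  { print $1 ": not set"; next }
        live[$1] != $2 { print $1 ": expected " $2 ", found " live[$1] }
    ' - "$1"
}

# Example usage as root on the new node (left commented out):
#   lsmod | missing_modules /etc/modules-load.d/tufin.conf
#   sysctl -a 2>/dev/null | mismatched_settings /etc/sysctl.d/tufin.conf
```

Both functions print nothing when everything matches, which makes them easy to use in a larger script.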
Add the Node to the Cluster
1. Log in to the primary data node.
2. On the primary data node:
  [<ADMIN> ~]$ sudo tos cluster node add --role=worker
  
  On completion, a new command string is displayed, which you will need to run on the new node within 30 minutes. If the allocated time expires, you will need to repeat the current step.
3. Copy the command string to the clipboard.
4. Log in to the new node.
5. On the new node, paste the command string copied previously and run it. If the allocated time has expired, you will need to start from the beginning.
6. Verify that the node was added by running sudo tos cluster node list on the primary data node.

Check the TOS Status

On the primary data node, check the TOS status.
```
[<ADMIN> ~]$ sudo tos status
```
In the output, check that the System Status is Ok and that all the items listed under Components appear as Ok. If this is not the case, contact Tufin Support.

Example output for a central cluster data node:

```
[<ADMIN> ~]$ tos status
[Mar 28 13:42:09]  INFO Checking cluster health status
TOS Aurora
Tos Version: 24.2 (PRC1.1.0)

System Status: "Ok"

Cluster Status:
   Status: "Ok"
   Mode: "Multi Node"

Nodes
  Nodes:
  - ["node1"]
    Type: "Primary"
    Status: "Ok"
    Disk usage:
    - ["/opt"]
      Status: "Ok"
      Usage: 19%
  - ["node3"]
    Type: "Worker Node"
    Status: "Ok"
    Disk usage:
    - ["/opt"]
      Status: "Ok"
      Usage: 4%

registry
  Expiration ETA: 819 days
  Status: "Ok"

Infra
Databases:
- ["cassandra"]
  Status: "Ok"
- ["kafka"]
  Status: "Ok"
- ["mongodb"]
  Status: "Ok"
- ["mongodb_sc"]
  Status: "Ok"
- ["ongDb"]
  Status: "Ok"
- ["postgres"]
  Status: "Ok"
- ["postgres_sc"]
  Status: "Ok"

Application
Application Services Status OK
Running services 50/50

Remote Clusters
Number Of Remote Clusters: 2
  - ["RC"]
     Connectivity Status:: "OK:"
  - ["RC2"]
     Connectivity Status:: "OK"

  Backup Storage:
  Location: "Local
s3:http://minio.default.svc:9000/velerok8s/restic/default "
  Status: "Ok"
  Latest Backup: 2024-03-23 05:00:34 +0000 UTC
```

Example output for a remote cluster data node:

```
[<ADMIN> ~]$ tos status
[Mar 28 13:42:09]  INFO Checking cluster health status
TOS Aurora
Tos Version: 24.2 (PRC1.0.0)

System Status: "Ok"

Cluster Status:
   Status: "Ok"
   Mode: "Single Node"

Nodes
  Nodes:
  - ["node2"]
    Type: "Primary"
    Status: "Ok"
    Disk usage:
    - ["/opt"]
      Status: "Ok"
      Usage: 19%

registry
  Expiration ETA: 819 days
  Status: "Ok"

Infra
Databases:
- ["mongodb"]
  Status: "Ok"
- ["postgres"]
  Status: "Ok"

Application
Application Services Status OK
Running services 16/16

  Backup Storage:
  Location: "Local
s3:http://minio.default.svc:9000/velerok8s/restic/default "
  Status: "Ok"
  Latest Backup: 2024-03-23 05:00:34 +0000 UTC
```
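
The manual check described above (System Status is Ok and every listed item reports Ok) can also be scripted. The following is a minimal sketch that assumes the output of tos status resembles the examples shown here; the layout may differ between TOS versions, and the function name status_problems is hypothetical. It prints every quoted Status line whose value is not Ok and returns a non-zero exit code if any are found or if no System Status line is present.

```shell
#!/bin/sh
# Hypothetical helper; not part of the TOS CLI.

# Reads `tos status` output on stdin. Prints each line of the form
# Status: "<value>" whose value is not Ok, and fails if any such line
# exists or if the System Status line is missing.
status_problems() {
    awk '
        /Status:+ *"/ {
            value = $0
            sub(/^[^"]*"/, "", value)      # drop everything up to the opening quote
            sub(/".*$/, "", value)         # drop the closing quote and what follows
            gsub(/[^A-Za-z]/, "", value)   # tolerate stray punctuation such as "OK:"
            if ($0 ~ /System Status/) seen = 1
            if (tolower(value) != "ok") { print; bad = 1 }
        }
        END {
            if (!seen) { print "System Status line not found"; bad = 1 }
            exit bad + 0
        }'
}

# Example usage on the primary data node (left commented out):
#   sudo tos status | status_problems && echo "All reported statuses are Ok"
```

Lines without a quoted value, such as Application Services Status OK, are not examined by this sketch, so review them manually.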

After the node is added, we recommend stopping TOS and then starting it again to enhance the node's performance. This will require downtime.