Upgrade TOS Aurora

Overview

This procedure upgrades TOS to R25-1. It is identical for all platforms and operating systems.

Before starting, create a backup and export it to an external location in case you need to roll back. After the upgrade completes successfully, make a new backup, because backups made on one product version cannot be restored to another.

For all information on this release, including new features, resolved and known issues, EOL announcements, and additional information, see the R25-1 Release Notes.

For all other installation and upgrade options, see the appropriate procedure in the table of contents.

How Should I Upgrade My Deployment?

Worker Nodes

Only the primary data node needs to be upgraded. It will automatically upgrade TOS on all other worker and data nodes in the same cluster. The TOS CLI will be upgraded on the other nodes when you next run a TOS CLI command on them.

Remote Clusters

All clusters need to be running the same TOS version. Therefore, make sure to upgrade the primary data node in both the central cluster and remote clusters. Upgrade the central cluster first.

High Availability (HA)

If you are upgrading a high availability deployment, you must prepare the other data nodes before upgrading TOS. This requires logging in to each of them in a separate session.

Disaster Recovery (DR)

If you have disaster recovery, first upgrade the active deployment and then upgrade the standby deployment.

Prerequisites

TOS Compatibility and Upgrade Paths

  1. Make sure your current version can be upgraded directly to this version of TOS Aurora - see TOS Release History and Upgrade Paths.
  2. NFS 3 on your backup server will not work by default because of a security vulnerability. If you accept the security risk and want to enable NFS 3, run the following commands on all TOS servers running TufinOS 4.20 or later:

    systemctl unmask rpcbind.socket rpcbind.service
    systemctl start rpcbind.socket rpcbind.service
    systemctl enable rpcbind.socket rpcbind.service
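
After running the commands above, it can be worth confirming on each server that both units ended up active and enabled. The following is a minimal sketch and not part of the official procedure: the function name rpcbind_ready and the SYSTEMCTL override variable are illustrative assumptions, while is-active and is-enabled are standard systemctl subcommands.

```shell
#!/bin/sh
# Sketch: confirm the rpcbind units are active and enabled.
# SYSTEMCTL is a hypothetical override so the check can be exercised
# on a machine without systemd; by default the real systemctl is used.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

rpcbind_ready() {
    for unit in rpcbind.socket rpcbind.service; do
        # is-active and is-enabled exit non-zero when the unit is not in that state
        if ! "$SYSTEMCTL" is-active --quiet "$unit"; then
            echo "$unit is not active"
            return 1
        fi
        if ! "$SYSTEMCTL" is-enabled --quiet "$unit"; then
            echo "$unit is not enabled"
            return 1
        fi
    done
    echo "rpcbind units are active and enabled"
}
```

Calling rpcbind_ready with sufficient privileges on each TOS server prints a one-line result and returns non-zero at the first unit that is not in the expected state.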

     

Port and Services

  1. If your deployment includes remote clusters and you are upgrading from a release earlier than R23-1, be aware that port 9090 is now also required for TOS to run successfully - see remote collector ports.

Downloads

  • Download the TOS R25-1 PGA.0.0 installation package from the Download Center.

  • The downloaded file is in .tgz format: <FILENAME>.tgz.

Required Steps Before Starting

  1. Run the command tos status. In the output, make sure system status is "OK", all nodes are "healthy" and under "Disk usage" /opt is not more than 70%. If any of these conditions are not met, the upgrade will fail.

  2. Make sure you have at least 25 GB free on the primary data node in the /tmp directory.

  3. If you monitor devices managed by a management device/domain that has no dedicated license because it inherits its license status from its monitored devices/domains (for example, FMC, FMG, Panorama), make sure all such monitored devices/domains are licensed or removed. Otherwise, the management device/domain will be unlicensed after the upgrade.

  4. If you are upgrading a remote collector cluster:

    • Do not start the upgrade until the upgrade to the central cluster has completed.

    • It must run the same release as the central cluster.

  5. You must have a valid license before starting the upgrade, otherwise the procedure will abort.

    1. Select Admin > Licenses.

    2. The License Management page appears.

    3. If your license has expired, or if there is no license uploaded, upload a valid license. For more information, see Uploading License Files to TOS (Solution Tiers).

  6. Create a backup of the installation file that was used for your current TOS Aurora installation - /opt/tos/tos.tar - in a directory outside of /opt/tos. This is necessary in case you need to roll back.

  7. Create a backup of your TOS Aurora data (see One-Time Backup Procedure).

  8. If you use automated provisioning, make sure there are no queued provisioning tasks. You can check this using the waiting_tasks API.

  9. Transfer the run file to the /opt/tufin/data directory on the primary data node.

  10. Extract the TOS run file from its archive.

    [<ADMIN> ~]$ tar -xvzf tos-xxxx-xxxxxxxx-final-xxxx.run.tgz
  11. Unpack the CLI of the new TOS version.

    [<ADMIN> ~]# sh <runfile>
  12. Perform a pre-check. The pre-check performs all the necessary validations required for upgrading TOS. If an issue is encountered, the pre-check stops and the issue is printed to the output.

    [<ADMIN> ~]# tos upgrade pre-check

    If the output returns an issue, do one of the following:

    • Contact Tufin Support to fix the issue and then repeat this step.

    • Go back to your old CLI version, and perform the upgrade later.

      [<ADMIN> ~]# sh <old runfile>

      Where <old runfile> is the runfile for the TOS version you currently have installed.

    If the issue relates to preparing the other data nodes for an HA deployment, ignore it. This will be covered later in the procedure.

  13. Optional. Run the upgrade planner to view all the steps that are going to be performed during the upgrade process. On some of the steps, if the upgrade fails, you will have the option of aborting the upgrade and going back to your old TOS version, or continuing after you fix the problem.

    These steps will have the label: upgrade.tufin.com/abort-allowed: "true"

    [<ADMIN> ~]# tos upgrade show
  14. See the R25-1 Pre-Installation Information in the Release Notes.
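
The disk-space conditions in steps 1 and 2 and the installer backup in step 6 can also be checked from the shell before running the pre-check. The following is a minimal sketch under stated assumptions: all function names are illustrative and not part of the TOS CLI, the 25 GB requirement is treated as 25 GiB (which is slightly stricter), and df reports on the whole filesystem that holds a path, so its figure for /opt may differ from the one shown by tos status.

```shell
#!/bin/sh
# Sketch of the disk-space and installer-backup checks from steps 1, 2 and 6.
# All function names are illustrative; they are not part of the TOS CLI.

# Used percentage (integer, no % sign) of the filesystem that holds $1.
used_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Available space, in whole GiB, on the filesystem that holds $1.
avail_gib() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# Succeeds when used percentage $1 does not exceed limit $2.
usage_ok() { [ "$1" -le "$2" ]; }

# Succeeds when available GiB $1 is at least the required amount $2.
space_ok() { [ "$1" -ge "$2" ]; }

# Copy file $1 into directory $2 and confirm the copy is byte-identical.
backup_and_verify() {
    cp -p "$1" "$2"/ && cmp -s "$1" "$2/$(basename "$1")"
}

# Run the checks with the thresholds given in this procedure.
# $1 is the backup directory for tos.tar; it must be outside /opt/tos.
preflight() {
    usage_ok "$(used_percent /opt)" 70 || { echo "/opt is more than 70% used"; return 1; }
    space_ok "$(avail_gib /tmp)" 25 || { echo "/tmp has less than 25 GiB free"; return 1; }
    backup_and_verify /opt/tos/tos.tar "$1" || { echo "backup of tos.tar failed"; return 1; }
    echo "preflight checks passed"
}
```

On the primary data node, preflight could be called with a backup directory of your choice as its only argument. It does not replace tos status or tos upgrade pre-check, which validate far more than disk space.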

Upgrade Procedure

  1. Log in to the primary data node using SSH as user tufin-admin or another user with sudo or root privileges.

  2. Check your current version by running the following command:

    [<ADMIN> ~]# tos version
  3. Check that your cluster status is healthy.

    1. Run the following command on the primary data node:

      [<ADMIN> ~]# systemctl status k3s

      Example Output

      [root@TufinOS ~]# systemctl status k3s
      Redirecting to /bin/systemctl status k3s.service
       k3s.service - Aurora Kubernetes
         Loaded: loaded (/etc/systemd/system/k3s.service; enabled; vendor preset: disabled)
         Active: active (running) since Tue 2021-08-24 17:14:38 IDT; 1 day 18h ago
           Docs: https://k3s.io
        Process: 1241 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
        Process: 1226 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
       Main PID: 1250 (k3s-server)
          Tasks: 1042
         Memory: 2.3G
    2. In the output under the line k3s.service - Aurora Kubernetes, check that two lines appear - Loaded... and Active... similar to the example above. If they appear, continue with the next step, otherwise contact Tufin Support for assistance.

  4. Make sure all users are logged out from the browser.

  5. Make a one-time backup.

  6. After your backup has completed, run the following command:

    [<ADMIN> ~]# screen -S upgrade
    tmux new-session -s upgrade
  7. Upgrade TOS:

    [<ADMIN> ~]# tos upgrade

    When the command completes, you will again be able to run any TOS CLI command.

  8. If the upgrade returns an error, see Upgrade Errors.

  9. Verify.

    Check the TOS version again, as described in step 2 of this procedure, and make sure the version displayed is the one you intended to upgrade to.

    [<ADMIN> ~]# tos version

    Check the cluster status again.

    [<ADMIN> ~]# systemctl status k3s

    Example Output

    [root@TufinOS ~]# systemctl status k3s
    Redirecting to /bin/systemctl status k3s.service
     k3s.service - Aurora Kubernetes
       Loaded: loaded (/etc/systemd/system/k3s.service; enabled; vendor preset: disabled)
       Active: active (running) since Tue 2021-08-24 17:14:38 IDT; 1 day 18h ago
         Docs: https://k3s.io
      Process: 1241 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
      Process: 1226 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
     Main PID: 1250 (k3s-server)
        Tasks: 1042
       Memory: 2.3G

    In the output under the line k3s.service - Aurora Kubernetes, check that two lines appear - Loaded... and Active... similar to the example above. If they appear, continue with the next step, otherwise contact Tufin Support for assistance.

  10. HA only. Copy the TOS CLI to all data nodes.

    On the primary data node, copy the TOS CLI from /usr/local/bin/ to the same location on each of the other data nodes.

    rsync -avhe ssh /usr/local/bin/tos <user>@<non-primary data node>:/usr/local/bin/tos --rsync-path="sudo rsync"
  11. Where <user> is the user on the data node you are connecting with and <non-primary data node> is the IP address of the non-primary data node.

  12. Make a new backup.

    Before allowing users to start work, make a new one-time backup. This is necessary because the data schemas have been modified and any backups made before the upgrade can no longer be restored to the new version of the product. See Backup Procedure.

  13. If you monitor FortiManager devices, add a SAN signed certificate to each device.

  14. To enable automatic license usage reporting, requests from user browsers to the sub-domain aus.tufin.com must be allowed. For more information, see Sending License Usage Reports Automatically.

  15. System information is sent periodically to Tufin Support for troubleshooting and identifying performance issues. This is enabled by default and can be disabled (see Sending Cluster Health Status). The information includes:

    • DB status and size

    • Backup status

    • Kubernetes status and metrics

    • CPU metrics

    • Memory status

    • I/O

    • Configuration changes

    • TOS status

    • Cluster performance

    It does not include IP addresses, personal user information, or device information. All the information sent is encrypted and is accessible only to Tufin support teams.

    The information is sent from TOS users' browsers to the Tufin sub-domain mailbox.tufin.com, so requests from user browsers to this sub-domain must be allowed.

     

  16. Make sure users clear their browser cache.

  17. Reactivate your license if necessary.

    In some cases, particularly when hardware is changed, license activation is lost during the upgrade process. Lost activation does not limit the functionality of TOS Aurora, but future upgrades will not be possible until the license is reactivated.

    • Check the status - go to Admin > Licenses. The License window appears.

    • If the status shown in the window is anything other than Activated, follow the instructions in Activate License.
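
The cluster status check used in steps 3 and 9 (looking for the Loaded and Active lines in the systemctl status k3s output) can be scripted. The following is a minimal sketch and not an official health check: the function name is illustrative, and requiring the exact text "Active: active (running)" is an assumption taken from the example output in this procedure.

```shell
#!/bin/sh
# Sketch: decide from `systemctl status k3s` output whether the two expected
# lines are present. The function name is illustrative, not a TOS command.

# Reads status text on stdin; succeeds only if both a "Loaded: loaded" line
# and an "Active: active (running)" line are present.
k3s_status_ok() {
    status_text=$(cat)
    printf '%s\n' "$status_text" | grep -q 'Loaded: loaded' &&
        printf '%s\n' "$status_text" | grep -q 'Active: active (running)'
}
```

For example, piping systemctl status k3s into k3s_status_ok returns zero when both lines are found. If it returns non-zero, inspect the full output and contact Tufin Support as described above.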

Upgrade Errors

The TOS upgrade performs validation at several steps to identify issues that could prevent it from completing (for example: missing license file, missing backup file, node stability). If an issue is detected, the output returns an error with detailed information and instructions on how to proceed.

These instructions include three possible courses of action:

  • Continue the upgrade: Contact Tufin support to resolve the issue. After resolving the issue, continue the upgrade from the point where it stopped. Continue is not available for all steps.

    [<ADMIN> ~]# tos upgrade continue
  • Abort the upgrade: Abort the upgrade and revert to the old CLI. Abort is not available for all steps.

    [<ADMIN> ~]# tos upgrade abort
  • Contact customer support: This is relevant for errors where you do not have the option of continuing or aborting.
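
Because abort is not available for all steps, it can help to know in advance which steps allow it. The upgrade planner marks those steps with the label upgrade.tufin.com/abort-allowed: "true", so its output can be filtered for that text. The following is a minimal sketch: it assumes that tos upgrade show prints the label verbatim, and the function name is illustrative.

```shell
#!/bin/sh
# Sketch: list planner lines that carry the abort-allowed label.
# Assumes `tos upgrade show` prints the label text verbatim; the function
# name is illustrative.

# Reads planner output on stdin and prints matching lines with line numbers.
# Exit status is non-zero when no line carries the label.
abort_allowed_steps() {
    grep -n -F 'upgrade.tufin.com/abort-allowed: "true"'
}
```

For example, the output of tos upgrade show can be piped into abort_allowed_steps. The line numbers in the result point back into the full planner output, where the surrounding step details can be read.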