VCF Infrastructure Disaster Protection (Yeah, backups)

Never underestimate the bandwidth of a station wagon filled with backup tapes.

Words to live by. And as the density of those backup tapes keeps increasing (the last time I used backup tapes they were 400/800GB LTO-3s), so does the station wagon’s bandwidth! I’ll leave it up to you to get your tapes to the vault, but let’s go through what it takes to get the pertinent VCF infrastructure configuration data into a state and place where it can be backed up and restored. This includes configuring backups for SDDC Manager, the vCenters and the NSX Managers, as well as exporting the VDS configs. Let’s go!

Prerequisites
  • A VM or physical machine with a Linux or Windows OS that hosts the SFTP server required to protect the SDDC Manager, vCenter and NSX Manager instances. This machine can also be used to coordinate and support a restore operation.
  • A reliable and secure storage volume on which the backup files are stored. The machine and the storage need to be in a different fault domain from the VCF infrastructure they protect, and both need to be backed up regularly.

Process

The configuration process below is only one part of a holistic enterprise backup solution. What we configure here protects the individual vCenters, SDDC Manager and NSX components.

Sometimes, you can recover localized logical failures by restoring individual components. In more severe cases, such as a complete and irretrievable hardware failure, to restore the operational status of your SDDC, you must perform a complex set of manual deployments and restore sequences. In failure scenarios where there is a risk of data loss, there has already been data loss or where it involves a catastrophic failure, contact VMware Support to review your recovery plan before taking any steps to remediate the situation.

https://docs.vmware.com/en/VMware-Cloud-Foundation/4.4/vcf-admin/GUID-D7EAB1E5-6F6B-4F14-9438-1F963C742F05.html

You’ve probably noticed the persistent orange bar on your main VCF Dashboard stating that your backup settings are the default and you should register an external SFTP server. It even has a link/button on the right side to help you get to those settings faster.

You can also get to the settings by navigating to Administration -> Backup in the left-hand navigation menu. Once there, you’ll land on the Site Settings tab, where you need to complete and save this information before you can move on to scheduling the backups and setting retention.

I’ve filled in the required information for my lab. In this case I’m using Cloud Builder as the SFTP server because it was convenient, but for production you want a solid SFTP server that is itself backed up to tape or cloud storage. A few things to note:

  • Once you fill in the FQDN or IP address and move to the next field, SDDC Manager automatically looks up and populates the SSH Fingerprint field. There is even a helpful tip explaining how to find or validate the fingerprint, although I have no idea how you copy the command out of that popup.

I’ve typed out the command below; simply replace PORT with the SSH port of the backup server (usually 22) and IP_ADDRESS with its IP address.

ssh-keygen -lf <(ssh-keyscan -p PORT -t rsa IP_ADDRESS 2>/dev/null) | cut -d ' ' -f2

This returns something similar to the following:

SHA256:zn3jyl9h1fJcv4t7pS01XcddNTrX05TEMCjCT4+AoWo
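If you’d rather cross-check the value SDDC Manager shows without fighting the popup, the fingerprint itself is easy to compute: OpenSSH hashes the raw public-key blob with SHA-256 and prints the digest as base64 with the trailing padding stripped and a SHA256: prefix added. Here’s a minimal Python sketch of that calculation, assuming you feed it one line of ssh-keyscan output (or a .pub-style line). The function name is my own and not part of any VMware tool.

```python
import base64
import hashlib


def openssh_sha256_fingerprint(key_line):
    """Return the OpenSSH-style "SHA256:..." fingerprint for one public-key line.

    Accepts ssh-keyscan output ("host keytype base64") or a .pub-style
    line ("keytype base64 [comment]").
    """
    tokens = key_line.split()
    for i, tok in enumerate(tokens[:-1]):
        if not tok.startswith(("ssh-", "ecdsa-", "sk-")):
            continue  # e.g. the host field of ssh-keyscan output
        try:
            blob = base64.b64decode(tokens[i + 1], validate=True)
        except ValueError:  # binascii.Error is a subclass of ValueError
            continue
        # The blob begins with a 4-byte big-endian length followed by the key type.
        type_len = int.from_bytes(blob[:4], "big")
        if blob[4:4 + type_len] != tok.encode():
            continue
        digest = hashlib.sha256(blob).digest()
        # OpenSSH prints the digest as base64 with the trailing "=" padding removed.
        return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")
    raise ValueError("no recognizable public key found in line")
```

For example, capture the line printed by ssh-keyscan -p 22 -t rsa IP_ADDRESS 2>/dev/null, pass it to openssh_sha256_fingerprint(), and compare the result with the SSH Fingerprint field before you tick the confirmation box.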
  • Ensure that the Username you specify exists and has access to the Backup Directory on the backup server.
    • Also ensure the user has permission to create new directories, because SDDC Manager creates an “sddc-manager-backup” directory and NSX creates a “cluster-node-backups” directory inside your configured backup directory.
  • There is an easily missed (at least by me) Confirm Fingerprint checkbox; make sure you check it!
  • The *required* Encryption Password does not have any documented complexity rules that I could find. Through the greatest teacher (trial and error), I have concluded the following:
    • Password requires a minimum of 2 uppercase characters
    • Password requires a minimum of 2 digits
    • Password requires a minimum of 1 special character
    • Password requires a minimum of 12 characters
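Since those rules came from trial and error rather than documentation, treat them as a working assumption. A small checker like this hypothetical Python helper lets you vet a candidate password before pasting it into the form; it treats any ASCII punctuation character as "special", which is also my assumption.

```python
import string


def encryption_password_problems(password):
    """Check a candidate password against the empirically observed rules.

    These thresholds are trial-and-error findings, not documented VMware
    requirements. Returns an empty list when no problems are found.
    """
    problems = []
    if len(password) < 12:
        problems.append("needs at least 12 characters")
    if sum(ch.isupper() for ch in password) < 2:
        problems.append("needs at least 2 uppercase characters")
    if sum(ch.isdigit() for ch in password) < 2:
        problems.append("needs at least 2 digits")
    if not any(ch in string.punctuation for ch in password):
        problems.append("needs at least 1 special character")
    return problems
```

An empty list means the password at least clears the rules listed above; SDDC Manager may still enforce something I didn’t discover.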

After you’ve entered all the information, the SAVE button becomes active and you can save your configuration. SDDC Manager then pops up a confirmation window; once you click CONFIRM, it configures itself, as well as all the NSX Managers (handy if you have multiple WLDs), with the backup settings.

You can track the progress in the Tasks panel by expanding the Configure Backup of VCF Components task. Mine took less than a minute to run, even with a workload domain deployed.

Once that is complete, navigate to the Administration -> Backup -> SDDC Manager Configurations tab. Here you can set up the schedule and retention for SDDC Manager, along with an option to automatically take an ad hoc backup whenever a task completes in SDDC Manager. You can select either Hourly or Weekly backups and then set the time/days accordingly.

Retention is simply how many backups of each type are retained. Zero is a valid value for any of these retention fields; I have 5 set for my Retain Last Backups field and 0 for both the hourly and daily fields. With those settings, it keeps only the 5 most recent backups, regardless of type.
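To reason about how those three fields interact, here’s a small Python model. It is my own simplified assumption of the behavior (keep the newest N backups, plus the newest backup from each of the most recent hours and days that contain a backup), not SDDC Manager’s actual code, but it reproduces the 5/0/0 result described above.

```python
from datetime import datetime, timedelta


def backups_to_keep(timestamps, retain_last, retain_hourly, retain_daily):
    """Hypothetical model of 'Retain Last Backups' plus hourly/daily retention.

    Keeps the newest retain_last backups, plus the newest backup from each of
    the most recent retain_hourly hours and retain_daily days that contain one.
    """
    newest_first = sorted(timestamps, reverse=True)
    keep = set(newest_first[:retain_last])

    def keep_newest_per_bucket(bucket_of, limit):
        seen = set()
        for ts in newest_first:
            if len(seen) >= limit:
                break
            bucket = bucket_of(ts)
            if bucket not in seen:
                seen.add(bucket)
                keep.add(ts)  # first hit is the newest backup in this bucket

    keep_newest_per_bucket(lambda ts: ts.replace(minute=0, second=0, microsecond=0), retain_hourly)
    keep_newest_per_bucket(lambda ts: ts.date(), retain_daily)
    return sorted(keep, reverse=True)


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    backups = [start + timedelta(minutes=30 * i) for i in range(10)]
    # The settings from this post: Retain Last Backups = 5, hourly = 0, daily = 0.
    print(len(backups_to_keep(backups, 5, 0, 0)))
```

Playing with the model is a cheap way to see how much extra storage non-zero hourly or daily values might add before changing them in the UI.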

Once your settings are configured and saved, click the BACKUP NOW button to test it out. You will see a task in the SDDC Manager Tasks panel, and after a minute or so it should show a green Successful status. You can then log in to your backup server and check that the directory was created and that it contains backup files.
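If you check the backup server often, a quick script beats clicking around. This Python sketch assumes the backup directory is reachable from wherever you run it (for example, on the SFTP server itself) and looks for the sddc-manager-backup directory mentioned earlier. The function name and the example path are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def newest_backup_file(backup_root, subdir="sddc-manager-backup"):
    """Return (path, modified_time) of the newest file under backup_root/subdir.

    Raises FileNotFoundError if the directory is missing or holds no files.
    """
    target = Path(backup_root) / subdir
    if not target.is_dir():
        raise FileNotFoundError(f"{target} does not exist; check the SFTP user's permissions")
    files = [p for p in target.rglob("*") if p.is_file()]
    if not files:
        raise FileNotFoundError(f"{target} exists but contains no backup files")
    newest = max(files, key=lambda p: p.stat().st_mtime)
    return newest, datetime.fromtimestamp(newest.stat().st_mtime)
```

For example, newest_backup_file("/home/admin") should return a file whose timestamp matches the BACKUP NOW run you just triggered.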

What about NSX Manager scheduling and retention, though? SDDC Manager configures the NSX backups with a default interval of 1 hour. You can change this in the NSX Manager UI, but if you ever reconfigure the backups through SDDC Manager, any changes you made in NSX Manager will be reverted to the default 1-hour schedule. Strive to have all your backups run in the same window of time so that you have a consistent state across all the components. Once your settings are complete and saved, be sure to test them by clicking Backup Now.
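One way to sanity-check that everything lands in the same window is to compute the smallest stretch of the clock that covers every component’s start time. The helper below is a hypothetical Python sketch that takes HH:MM start times and handles windows that cross midnight; the example schedule is made up for illustration.

```python
def schedule_spread_minutes(start_times):
    """Smallest stretch of the 24-hour clock (in minutes) covering every start time.

    start_times are "HH:MM" strings; the calculation wraps around midnight.
    """
    minutes = sorted({int(h) * 60 + int(m) for h, m in (t.split(":") for t in start_times)})
    if not minutes:
        raise ValueError("no start times given")
    day = 24 * 60
    gaps = [later - earlier for earlier, later in zip(minutes, minutes[1:])]
    gaps.append(minutes[0] + day - minutes[-1])  # gap that wraps past midnight
    # Everything outside the largest idle gap is the window the backups occupy.
    return day - max(gaps)


if __name__ == "__main__":
    # Hypothetical start times for illustration only.
    schedule = {"SDDC Manager": "02:00", "NSX Manager": "02:00", "vCenter": "02:15"}
    print(schedule_spread_minutes(schedule.values()), "minutes")
```

Measuring the largest idle gap, rather than just subtracting the earliest time from the latest, keeps a window like 23:50 to 00:10 from looking like it spans almost a full day.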


There is no retention setting in the NSX UI, so you will need a way to age out old backup files. There is a script on NSX Manager that you can copy to your backup server and schedule with the backup server’s crontab or Task Scheduler to manage retention. However, the script requires a backup directory that contains only the folders NSX creates, so it should not share a directory with the SDDC Manager backups.
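Before pointing a deletion script at a directory, it’s worth confirming that nothing else lives there. This Python sketch lists anything in a directory that isn’t in an allowed set of names. The allowed set is an assumption you supply: cluster-node-backups is the only NSX folder name mentioned in this post, and your NSX version may create others.

```python
from pathlib import Path


def unexpected_entries(nsx_backup_dir, allowed_names):
    """Return the sorted names in nsx_backup_dir that are not in allowed_names."""
    allowed = set(allowed_names)
    return sorted(p.name for p in Path(nsx_backup_dir).iterdir() if p.name not in allowed)
```

For example, unexpected_entries("/home/admin/nsx_backups", {"cluster-node-backups"}) returning ["sddc-manager-backup"] would tell you the cleanup script is about to share a directory with SDDC Manager’s backups.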

You can enter the shell on NSX Manager and SCP the script over to your backup server. Once it’s copied over, keep in mind that it’s a Python script, so you’ll need Python on your backup server if it isn’t already there. Its header describes how to use it:

# The purpose of this script is to remove old NSX backup files. Typically, this script
# will be placed on the SFTP server where the NSX Manager is uploading backup files,
# and included into a scheduler, for example cron.  Before running this script, you
# should update the BACKUP_ROOT variable.  This script works on Linux and Windows with
# both Python 2 and Python 3.
#
# On Linux SFTP server:
# You can add this script in the crontab to automatically run this script once daily
# Edit the anacron at /etc/cron.d or use crontab -e and add following line to execute the script at 10am everyday
# 00 10 * * * /sbin/nsx_backup_cleaner.py
#
# On Windows SFTP server:
# schtasks /Create /SC DAILY /TN PythonTask /TR "PATH_TO_PYTHON_EXE PATH_TO_PYTHON_SCRIPT"
# or you can add the same in TaskScheduler

At this point you’ll need to create a subdirectory in the backup directory you configured in SDDC Manager. I added an nsx_backups directory to mine, then reconfigured my NSX Managers to point at that new directory. Lastly, I added nsx_backup_cleaner.py to my crontab per the instructions above, pointing it at the newly created directory with retention settings matching what I’d configured in SDDC Manager.

# Manages retention for all NSX Manager backups

00 10 * * * /sbin/nsx_backup_cleaner.py --dir /home/admin/nsx_backups --retention-period 1 --min-count 5
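To make the two flags concrete, here’s how I read an age-based policy with a minimum count: delete backups older than the retention period (in days), but never touch the newest min-count backups. This Python sketch illustrates that reading only; it is not VMware’s nsx_backup_cleaner.py, and the real script may differ in details such as how it groups backup files.

```python
from datetime import datetime, timedelta


def backups_to_delete(backup_times, now, retention_days, min_count):
    """Pick backups older than retention_days, always sparing the newest min_count.

    Illustrates the intent of --retention-period / --min-count; not VMware's code.
    """
    newest_first = sorted(backup_times, reverse=True)
    cutoff = now - timedelta(days=retention_days)
    # The newest min_count backups are never candidates, whatever their age.
    return [ts for ts in newest_first[min_count:] if ts < cutoff]


if __name__ == "__main__":
    now = datetime(2024, 1, 10)
    nightly = [datetime(2024, 1, day) for day in range(1, 11)]
    # Same values as the crontab line: --retention-period 1 --min-count 5
    for ts in backups_to_delete(nightly, now, retention_days=1, min_count=5):
        print("would delete", ts.date())
```

The minimum count is the safety net: if backups stop arriving for a while, an age-only policy would eventually delete everything, while this one always leaves the last few in place.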

Next, we’ll configure the vCenter Server(s) to back up to the same location we told SDDC Manager to use. Log in to the appliance management interface on each vCenter Server (https://vC-FQDN:5480) with the root credentials and click the Backup link.

Fill in the necessary information, making sure you keep the same backup window and retention settings you used for SDDC Manager and NSX Manager. After you click CREATE, click the BACKUP NOW link above the Activity section to ensure everything is working as it should.

Finally, on to the last step: exporting your VDS configs. Start by logging in to the vCenter UI, then navigate to Inventory -> Networking -> VDS and expand the Management Networks folder to see the VDS(s). Right-click a VDS and select Settings -> Export Configuration. Select the Distributed Switch and all port groups radio button and save the zip file locally, adding the current date to its name, or, if you are able, save it directly to the backup server. Repeat for each VDS. If you can’t save directly to the backup server, copy the zip file there as soon as possible. Keeping all backups in the same place will make your job easier during a restore.
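The date-stamping and copying step is easy to script if the backup directory is mounted or otherwise reachable from your workstation. The Python sketch below assumes exactly that (it does a local file copy, not SFTP); the function name, the naming scheme VDSNAME_YYYY-MM-DD.zip, and the example paths are my own.

```python
import re
import shutil
from datetime import date
from pathlib import Path


def archive_vds_export(export_zip, vds_name, backup_dir, on=None):
    """Copy a VDS export zip into backup_dir as <vds_name>_<YYYY-MM-DD>.zip."""
    on = on or date.today()
    # Replace anything outside a conservative character set so the name is shell-friendly.
    safe_name = re.sub(r"[^A-Za-z0-9._-]+", "_", vds_name)
    destination = Path(backup_dir) / f"{safe_name}_{on:%Y-%m-%d}.zip"
    if destination.exists():
        raise FileExistsError(f"{destination} already exists; refusing to overwrite")
    shutil.copy2(export_zip, destination)  # copy2 also preserves the file timestamps
    return destination
```

Refusing to overwrite an existing file means a second export on the same day has to be named deliberately, which keeps an earlier good export from being silently replaced.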

Conclusion

At this point we have configured backups for our critical infrastructure pieces, including SDDC Manager, our vCenters and our NSX Managers, along with backup retention across all of them, and we have exported our VDS configs. I realize this was quite a long post, and there are several improvements that can be made to this process. Truth be told, there are APIs and/or PowerShell cmdlets for almost every step in this post, and you could use them to automate the majority of the process. The only piece that needs manual intervention is copying the NSX backup cleanup script. Automating the rest would also be a great way to keep everything in sync with regard to date/time and locations!