VCF 4.5 – Adding an Edge Cluster to a workload domain

Adding an NSX Edge Cluster to a VCF workload domain brings a huge amount of versatility to the workloads living there: software-defined networks that can be provisioned with full routing when you need them, and security for workloads on segments as well as those on traditional portgroups through both the Distributed and Gateway Firewalls. With VCF it’s easy to get started.

For this post we’ll focus on an automated deployment done with VLC. This will allow us a standard known configuration, but the concepts and information can apply to almost any deployment.

SDDC Manager has both a UI and an API to deploy an Edge Cluster. We’ll cover the UI in this post, but once you have all the settings it won’t be hard to populate the JSON for an API call. Since we are focused on deploying to a workload domain, the assumption is that you already have VCF up and running with both a management and a workload domain. With VLC this can be as easy as running a PowerShell script and checking a few boxes; in a couple of hours you’ll be at the starting place for this config. Enough of that, let’s go!

Things you’ll need

Convenient list in the UI!

While this might seem like a daunting list, it’s not too bad. Let’s take it piece by piece, then we’ll fill out the form in the UI:

  • Separate VLANs and subnets for host TEP VLAN and Edge TEP VLAN use
    • Good news! We already have separate subnets for host TEPs and Edge TEPs. In the default automated configuration of VCF with VLC these are configured as follows:
      • Host TEPs: VLAN 10 – Subnet 172.16.254.x/24 (DHCP)
      • Edge TEPs: VLAN 13 – Subnet 172.27.13.x/24
  • Host TEP VLAN and Edge TEP VLAN need to be routed
    • VLAN 10 and 13 are able to route to each other through Cloudbuilder, which VLC set up as the router
  • If dynamic routing is desired, please set up two BGP peers (on TORs or infra ESX) with an interface IP, ASN and BGP password
    • Either of these is possible. While we have GoBGP running on Cloudbuilder, it’s only a route “server”, meaning it will advertise routes but won’t add learned routes to the kernel routing table. This means we’ll also need to add a static route to Cloudbuilder for the segments we build in NSX.
    • For this exercise we’ll set up BGP peers in GoBGP and add a summary route for our SDNs
  • Reserve an ASN to use for the NSX Edge Cluster’s Tier-0 interfaces
    • ASNs for private use are between 64512 and 65534. The ASNs VLC currently uses are:
      • 65001 – This is for GoBGP on Cloudbuilder
      • 65003 – This is the NSX Tier-0 router deployed for the edge cluster in the management domain
    • Let’s choose 65005 for the new workload domain edge cluster
  • DNS Entries for NSX Edge Components should be populated in customer managed DNS Server
    • During Cloudbuilder deployment customization, VLC pre-populated the following
      • edge1-wld.vcf.sddc.lab. FQDN4
      • edge2-wld.vcf.sddc.lab. FQDN4
    • Note: The DNS entries above are *only* the management IP, these edges will have several other IP addresses that do not need to be in DNS.
  • The vSphere clusters hosting the Edge clusters should be L2 Uniform. All host nodes in a hosting vSphere cluster need to have identical management, uplink, Edge and host TEP networks
    • This is the way that VLC deploys the workload domain by default
  • The vSphere clusters hosting the NSX Edge node VMs must have the same pNIC speed for NSX enabled VDS uplinks chosen for Edge overlay (e.g., either 10G or 25G but not both)
    • VLC deploys all 10Gb links via the VMXNET3 NICs on the nested ESXi hosts
  • All nodes of an NSX Edge cluster must use the same set of NSX enabled VDS uplinks. The selected uplinks must be prepared for overlay use
    • These are prepared by VCF
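Two of the items above are easy to sanity check before touching the UI: the ASN choice and the TEP subnets. Here is a minimal Python sketch of those checks. The function names are my own for illustration, and I'm writing the "x/24" subnets from the list as 172.16.254.0/24 and 172.27.13.0/24.

```python
import ipaddress

# Private 2-byte ASN range quoted above: 64512 through 65534 inclusive.
PRIVATE_ASN_RANGE = range(64512, 65535)

# ASNs already used in the VLC lab: GoBGP on Cloudbuilder and the
# management domain Tier-0.
ASNS_IN_USE = {65001, 65003}

HOST_TEP_NET = ipaddress.ip_network("172.16.254.0/24")  # VLAN 10
EDGE_TEP_NET = ipaddress.ip_network("172.27.13.0/24")   # VLAN 13


def check_asn(asn: int) -> list[str]:
    """Return a list of problems with the candidate ASN (empty means OK)."""
    problems = []
    if asn not in PRIVATE_ASN_RANGE:
        problems.append(f"ASN {asn} is outside the private range 64512-65534")
    if asn in ASNS_IN_USE:
        problems.append(f"ASN {asn} is already used in the lab")
    return problems


def tep_networks_are_separate(a, b) -> bool:
    """Host and Edge TEP subnets must not overlap."""
    return not a.overlaps(b)


if __name__ == "__main__":
    print(check_asn(65005) or "ASN 65005 looks fine")
    print("TEP subnets separate:",
          tep_networks_are_separate(HOST_TEP_NET, EDGE_TEP_NET))
```

This only checks the numbers. Routing between VLAN 10 and VLAN 13 still has to be verified on Cloudbuilder itself.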

Let’s put together a quick reference chart for the settings our edges will have. This will help populate the UI form and the API JSON.

Setting          Edge node 1    Edge node 2
Mgmt IP          10.0.0.55      10.0.0.56
Uplink 1 VLAN    11             11
Uplink 1 IP      172.27.11.6    172.27.11.7
Uplink 2 VLAN    12             12
Uplink 2 IP      172.27.12.6    172.27.12.7
Edge TEP VLAN    13             13
Edge TEP 1 IP    172.27.13.6    172.27.13.8
Edge TEP 2 IP    172.27.13.7    172.27.13.9
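If you like having this kind of chart as data, here is a rough Python sketch that holds the same values and checks for two easy mistakes: a duplicated IP, and a TEP IP outside the Edge TEP subnet. The dictionary layout and key names are my own and are not the SDDC Manager API schema.

```python
import ipaddress

EDGE_TEP_NET = ipaddress.ip_network("172.27.13.0/24")

# Values copied from the reference chart; key names are illustrative only.
EDGE_NODES = {
    "edge node 1": {
        "mgmt_ip": "10.0.0.55",
        "uplink_ips": {11: "172.27.11.6", 12: "172.27.12.6"},  # VLAN -> IP
        "tep_vlan": 13,
        "tep_ips": ["172.27.13.6", "172.27.13.7"],
    },
    "edge node 2": {
        "mgmt_ip": "10.0.0.56",
        "uplink_ips": {11: "172.27.11.7", 12: "172.27.12.7"},
        "tep_vlan": 13,
        "tep_ips": ["172.27.13.8", "172.27.13.9"],
    },
}


def all_addresses(nodes: dict) -> list[str]:
    """Flatten every IP address assigned to every node."""
    addresses = []
    for node in nodes.values():
        addresses.append(node["mgmt_ip"])
        addresses.extend(node["uplink_ips"].values())
        addresses.extend(node["tep_ips"])
    return addresses


def duplicate_addresses(nodes: dict) -> list[str]:
    """Return addresses that appear more than once across the nodes."""
    seen, dupes = set(), set()
    for addr in all_addresses(nodes):
        (dupes if addr in seen else seen).add(addr)
    return sorted(dupes)


def teps_outside_subnet(nodes: dict, subnet=EDGE_TEP_NET) -> list[str]:
    """Return TEP IPs that do not fall inside the Edge TEP subnet."""
    return [
        ip
        for node in nodes.values()
        for ip in node["tep_ips"]
        if ipaddress.ip_address(ip) not in subnet
    ]


if __name__ == "__main__":
    print("duplicates:", duplicate_addresses(EDGE_NODES))
    print("TEPs outside subnet:", teps_outside_subnet(EDGE_NODES))
```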

You can see in the list above there are only a couple of things we need to do: set up our BGP neighbors and add a summary route on Cloudbuilder. To set up the BGP neighbors, use ssh to log in to Cloudbuilder with the user admin and the password set during deployment. Then enter “sudo su -” to become root.

Next you’ll edit the /usr/bin/gobgpd.conf file and add the neighbor entries for our new Edge cluster nodes. Typically I find it easy to duplicate everything starting with [[neighbors]] through the end of the file. Then you can change the ASN and IP addresses for the duplicated entries to match the table above.
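To give a feel for what the new entries look like, here is a small Python sketch that prints neighbor stanzas in GoBGP's TOML layout for the four new edge uplink IPs. It only shows the two keys we are changing (neighbor-address and peer-as). The entries VLC generated may carry additional keys, so treat the existing entries in your own file as the template rather than this output.

```python
# ASN chosen for the workload domain edge cluster and the uplink IPs
# from the reference chart.
NEW_EDGE_ASN = 65005
NEW_EDGE_UPLINK_IPS = [
    "172.27.11.6",
    "172.27.12.6",
    "172.27.11.7",
    "172.27.12.7",
]


def neighbor_stanza(address: str, peer_as: int) -> str:
    """Render one minimal [[neighbors]] entry in GoBGP's TOML format."""
    return (
        "[[neighbors]]\n"
        "  [neighbors.config]\n"
        f'    neighbor-address = "{address}"\n'
        f"    peer-as = {peer_as}\n"
    )


def render_neighbors(addresses: list[str], peer_as: int) -> str:
    """Render one stanza per uplink IP, separated by blank lines."""
    return "\n".join(neighbor_stanza(a, peer_as) for a in addresses)


if __name__ == "__main__":
    print(render_neighbors(NEW_EDGE_UPLINK_IPS, NEW_EDGE_ASN))
```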

You can see the duplicated and modified neighbors in the red box above. Once you have saved this file the next step is to restart gobgpd and then check to make sure it knows about the new neighbors.

Enter "systemctl restart gobgpd"

Then “gobgp neighbor”, and you should see the following

Next, it’s time to enter our summary route. A summary route is just an easy way to say “this large block of networks is this way!” We already have one by default with VLC, a network that can be divided into thousands of smaller subnets inside NSX. For our workload domain edge cluster we’ll choose a second summary network. Due to the way the edges are configured, we’ll need to add this route to Cloudbuilder in a special way so that it can use multiple paths to get there. We’ll do this by editing the /etc/systemd/system/ethMTU.service file. This file was created by VLC and runs at startup to perform custom network settings for the nested environment.

When you open the file, find the line with the existing summary subnet and duplicate it, then change the subnet to the new summary network and the edge uplink IPs to our new edge cluster IPs.
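As a rough illustration of what a multipath route looks like, the Python sketch below builds an iproute2 command with one “nexthop via” per edge uplink IP. I'm assuming the line in ethMTU.service uses this general “ip route add ... nexthop via ...” form, so copy the real line from your file rather than this output. The prefix 198.51.100.0/24 is only a placeholder from the documentation address range; substitute the summary network you chose.

```python
import ipaddress

# Placeholder only: replace with the summary network chosen for the
# workload domain edge cluster.
PLACEHOLDER_SUMMARY = "198.51.100.0/24"

# New edge cluster uplink IPs from the reference chart.
EDGE_UPLINK_IPS = ["172.27.11.6", "172.27.11.7", "172.27.12.6", "172.27.12.7"]


def multipath_route_command(prefix: str, next_hops: list[str]) -> str:
    """Build an 'ip route add' command with one nexthop per edge uplink.

    ipaddress validates both the prefix and each next hop, so a typo
    raises ValueError instead of producing a bad command.
    """
    network = ipaddress.ip_network(prefix)
    hops = " ".join(
        f"nexthop via {ipaddress.ip_address(hop)}" for hop in next_hops
    )
    return f"ip route add {network} {hops}"


if __name__ == "__main__":
    print(multipath_route_command(PLACEHOLDER_SUMMARY, EDGE_UPLINK_IPS))
```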

After you save this file, reboot Cloudbuilder by entering “reboot”. Once Cloudbuilder comes back online, log in using admin, enter “sudo su -” to become root, and enter “ip route”, looking for the new summary route.

At this point we’re ready to begin going through the SDDC Manager Add Edge Cluster workflow. Log in to SDDC Manager and select Workload Domains under Inventory. Then click on the (3) dots to the left of the WLD-01 domain and select Add Edge Cluster.

Since we’ve already been through the list, go ahead and check the “Select All” box and click Begin

Now we fill in the form with basic information. One thing to note on this first tab is that MTU should be 1600 or greater due to the overhead of encapsulation.
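Where does 1600 come from? Here is a back-of-the-envelope Python sketch, assuming a standard 1500-byte inner MTU and Geneve encapsulation over IPv4 with no Geneve options. The header sizes are the standard ones; the function name is just for illustration.

```python
# Bytes added around a workload frame by Geneve-over-IPv4 encapsulation.
INNER_IP_MTU = 1500          # what the workload VM sends
INNER_ETHERNET_HEADER = 14   # the inner frame's own L2 header is carried too
GENEVE_BASE_HEADER = 8       # fixed part; options can add more
OUTER_UDP_HEADER = 8
OUTER_IPV4_HEADER = 20


def required_underlay_mtu(inner_mtu: int = INNER_IP_MTU,
                          geneve_options: int = 0) -> int:
    """MTU the TEP network needs to carry one full-size inner packet."""
    return (inner_mtu
            + INNER_ETHERNET_HEADER
            + GENEVE_BASE_HEADER
            + geneve_options
            + OUTER_UDP_HEADER
            + OUTER_IPV4_HEADER)


if __name__ == "__main__":
    needed = required_underlay_mtu()
    print("minimum underlay MTU:", needed)
    print("headroom left at 1600:", 1600 - needed)
```

With no options this comes to 1550 bytes, so 1600 leaves about 50 bytes of headroom for Geneve options and things like an inner VLAN tag.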

For this next screen I’m going to use a custom config. My Edge Form Factor will be small, as I just want to route traffic and do some security in the nested lab; this is fine for proof of concept. If you are going to use more than one load balancer you’d need to select a large edge (this is the case in the management domain when vRealize is deployed). We’ll select Active-Active, as that’s how our routing is set up, and put in the ASN that we chose above.

The Edge Node tab can be a bit tricky, at least for me, with all the different IP addresses so be extra careful when putting them in to ensure success! You will need to fill this out twice, once for each node. It starts off easy enough with the FQDN, Management IP and Edge TEP IP information, this is all located in our table above

Scrolling down, we need to enter the uplink IP information for both uplinks as well as our BGP peer information. The uplink info is located in the table above, and the BGP peer information will be the IP address of Cloudbuilder on the VLANs 11-13. You might have seen the BGP Peer ASN in the GoBGP config file, but if not it’s 65001.

Once everything is entered for this node, click the ADD EDGE NODE button and it will populate in the table below the button. Then you’ll need to click the Add more edge nodes button below the table, or the easy way is to just edit the FQDN and IP addresses and click ADD EDGE NODE again to populate the second edge node in the table.

Once the second edge node has been added you’re able to click next.

This will display a summary screen, however there has been a problem identified with Chrome browsers and the Clarity UI framework, so you might get a black screen as seen below. The workaround is to use Firefox or another browser until this can be addressed.

Click next and then finish. Once validation is complete (and there is *quite a bit* of validation done), the workflow will kick off deploying your NSX Edge Cluster in the workload domain.

While the workflow is running you can monitor the progress using the tasks pane in SDDC Manager. This typically takes ~45 minutes on my system.

Once it completes you’ll be able to see it under the Edge Clusters tab in the WLD-01 details

Now you’re ready to create segments in NSX manager and secure and route traffic in and out of your new SDNs in your workload domain! Thank you for taking the time to go through this, while there wasn’t much to do, there was plenty of information to consume!

Addendum – Yea, sometimes it fails…

To provide complete transparency, and take advantage of a learning moment: the task showed up as failed the first time I ran it. Digging in, I could see that it was unable to peer one of the edges (edge2-wld in this case) with Cloudbuilder.

Looking at the subtasks we can see more info

Then clicking anywhere near “Progress messages” we get even more info

I logged in to the NSX Manager for the workload domain to see if I could find more information. However, when looking at the Tier-0 BGP sessions, everything looked fine, including the one above, so perhaps I fat-fingered an IP somewhere...

I decided to retry the operation and clicked Restart Task in the SDDC Manager Tasks window, and then quickly realized this was my fault: I had entered the incorrect uplink IP addresses for edge2-wld in the GoBGP neighbor config! After correcting this and restarting gobgpd the workflow completed successfully! Below are the incorrect settings; the uplink IPs should have been ending with .6 and .7 per our table!
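A mistake like this is easy to catch before kicking off the workflow. Here is a small Python sketch that pulls every neighbor-address out of the GoBGP config text and reports which expected edge uplink IPs are missing. The sample config below is made up for illustration (it is not my actual broken file), and the function names are my own.

```python
import re

# Uplink IPs from the reference chart that GoBGP should peer with.
EXPECTED_NEIGHBORS = {
    "172.27.11.6", "172.27.11.7", "172.27.12.6", "172.27.12.7",
}

NEIGHBOR_RE = re.compile(r'neighbor-address\s*=\s*"([^"]+)"')

# Hypothetical config with a typo in two addresses, for illustration only.
SAMPLE_CONF = '''
[[neighbors]]
  [neighbors.config]
    neighbor-address = "172.27.11.6"
    peer-as = 65005
[[neighbors]]
  [neighbors.config]
    neighbor-address = "172.27.12.6"
    peer-as = 65005
[[neighbors]]
  [neighbors.config]
    neighbor-address = "172.27.11.8"
    peer-as = 65005
[[neighbors]]
  [neighbors.config]
    neighbor-address = "172.27.12.8"
    peer-as = 65005
'''


def configured_neighbors(conf_text: str) -> set[str]:
    """Collect every neighbor-address value found in the config text."""
    return set(NEIGHBOR_RE.findall(conf_text))


def missing_neighbors(conf_text: str, expected: set[str]) -> list[str]:
    """Return expected neighbor IPs that the config does not define."""
    return sorted(expected - configured_neighbors(conf_text))


if __name__ == "__main__":
    print("missing:", missing_neighbors(SAMPLE_CONF, EXPECTED_NEIGHBORS))
```

To run it against the real file, read /usr/bin/gobgpd.conf into a string and pass that in place of the sample. Neighbors for the management domain edges will also be in that file, which is why the sketch only reports what is missing rather than what is unexpected.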

I hope this helps and thank you for reading!