Switches, we don’t need no stinkin’ switches…. Well, yes, we still need switches. That being said, we no longer require specific switches in VMware Cloud Foundation 3.0. This article is not a replacement for the User Guide or the Release Notes.
WHAT DOES THAT MEAN???
In VMware Cloud Foundation 3.0 (VCF 3.0), we have moved to a Bring Your Own Network (BYON) model to allow customers to choose their favorite network vendor and switch model and provide the underlay network that VCF will use. This allows customers the flexibility to use the fabric manager of their choice to control their underlay network.
THIS SOUNDS GREAT! MY NETWORKING GUY WILL BE SO HAPPY! WHAT DO I NEED TO ASK HIM NICELY TO CONFIGURE?
Now that BYON is in place and you have selected your switching vendor of choice, or are incorporating VCF into your existing network topology, there are a few prerequisites required from your networking team. Those prerequisites are as follows:
- Each port to be used by a physical server will need to be a trunk port; however, that trunk port can be limited to the specific VLANs required by VCF and your vSphere environment
- Creation of a management network to support the SDDC Manager, PSCs, vCenters, NSX Managers, NSX Controllers, and the vRealize Log Insight cluster, plus the additional vCenters, NSX Managers, and NSX Controllers for any additional workload domains, as well as any additional services to be deployed. This will be the default network used for all of these services when Cloud Builder brings up the environment. (more on that in a later post)
- While not explicitly required, an ESXi management network is also nice to have, assuming it has a route to the above management network. This way your physical hosts are logically separated on another VLAN in your environment. Your hosts’ management (in-band) IPs can also reside on the management network above (optional)
- VCF 3.0 introduces a new construct called a Network Pool. A Network Pool simply consists of two networks, a vMotion network and a vSAN network, ensuring isolation of that traffic in your environment. VCF 3.0 also supports multi-cluster workload domains, so plan accordingly for whether those clusters should have shared or segregated vMotion and vSAN traffic.
- A vMotion network per workload domain, to ensure isolation between workload domains or clusters. Please keep in mind this is not a requirement but rather a best practice in VCF.
- A vSAN network per workload domain, to ensure data isolation between workload domains or clusters. Please keep in mind this is not a requirement but rather a best practice in VCF.
- A VTEP network with DHCP provided. Please keep in mind that in VCF 3.0 we configure a Load Based Teaming NIC policy, which in turn requires two (2) IP addresses per physical host in the environment (a quick DHCP sketch follows after the example below)
- While not explicitly required, a network allowing out-of-band access to the BMC ports of the servers is also nice to have, assuming it can route to the above management network. This allows you to remotely access the Baseboard Management Controller (iLO, iDRAC, IMC, etc.) as needed. (optional)
- When planning all of your IP requirements, allocate a subnet with ample capacity and then use inclusion ranges to limit the use of that subnet. Keep in mind that when using network pools, overlapping subnets cannot be reused (e.g. 192.168.5.0/24 and 192.168.5.128/25 overlap, so only one of them can be defined)
- Ensure that your networking guy is not utilizing any link aggregation technology (LAG/vPC/LACP) on the ports connecting to the ESXi servers.
- Example of this schema in action:
As you can see in this example, I would be sharing the management network for the ESXi servers as well as the PSCs, vCenters, NSX Managers, NSX Controllers, and vRealize Log Insight. My additional workload domain, though, would receive its own dedicated vMotion and vSAN network pool. Because the Load Based Teaming policy consumes two VTEP IP addresses per host, the 192.168.5.0/24 VXLAN/VTEP DHCP network could in theory support up to 126 ESXi hosts: 253 usable addresses remain after .1 is reserved for the gateway and .255 for broadcast, and dividing by two gives 126. One other thing to note is that the management network can have an MTU of 1500 or higher; jumbo frames (MTU 1600 or greater) are required for all other networks, and 9000 is recommended, as illustrated in this graphic. All of the above networks also require gateways, and these need to be routable for the bring-up portion of the environment. Lastly, VLAN tagging through 802.1Q should be utilized, jumbo frames enabled, and an IGMP snooping querier configured; a couple of quick sketches for the VTEP DHCP scope and for validating jumbo frames follow below.
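To make the VTEP DHCP bullet above concrete, here is a minimal sketch of a lab-style DHCP scope using dnsmasq. The interface name, address range, and lease time are assumptions purely for illustration; in production you would normally carve this scope out of your existing DHCP infrastructure instead:

```
# Hypothetical DHCP scope for the VXLAN/VTEP VLAN (192.168.5.0/24 from the example above).
# Load Based Teaming consumes two VTEP addresses per ESXi host, so size the range accordingly.
# --port=0 disables dnsmasq's DNS function; DHCP option 26 advertises the interface MTU (jumbo frames).
dnsmasq --port=0 --interface=eth1 --bind-interfaces \
        --dhcp-range=192.168.5.10,192.168.5.250,12h \
        --dhcp-option=26,9000
```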
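And once bring-up has created the vMotion and vSAN vmkernel ports, it is worth validating jumbo frames and gateway reachability from an ESXi host. A minimal sketch using vmkping is shown below; the vmkernel interface names and gateway addresses are hypothetical:

```
# -I selects the vmkernel interface, -d sets "don't fragment", -s 8972 fills a 9000-byte MTU
# once the ICMP/IP headers are added
vmkping -I vmk1 -d -s 8972 192.168.3.1   # vMotion gateway (hypothetical vmk1)
vmkping -I vmk2 -d -s 8972 192.168.4.1   # vSAN gateway (hypothetical vmk2)
vmkping -I vmk0 10.0.0.1                 # management gateway at standard MTU (hypothetical)
```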
I HAVE COMPLETED ALL THAT AND GIVEN MY NETWORKING GUY COPIOUS AMOUNTS OF COFFEE, WHAT’S NEXT?
Now that the networking has been plumbed to the environment and the host ports are set appropriately, the next topic is DNS. VCF now supports custom naming of its components; in fact, it manages ZERO of the DNS going forward. To ensure your environment is ready for VCF, forward and reverse DNS entries need to be established for every item to be spun up in VCF. One note here: the DNS server to be used needs to be an authoritative DNS server, so services like unbound will not allow this to function properly. Below is a table illustrating what is required, in addition to the ESXi hosts having appropriate forward and reverse DNS:
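The required records themselves are listed in the table, but the check is simple: every component must resolve in both directions against the authoritative DNS server before bring-up. A minimal sketch, using a hypothetical SDDC Manager FQDN, IP, and DNS server:

```
# Forward lookup - the FQDN must return the IP you plan to assign
nslookup sddc-manager.vcf.example.com 10.0.0.250
# Reverse lookup - the IP must return the same FQDN
nslookup 10.0.0.4 10.0.0.250
# Repeat for every PSC, vCenter, NSX Manager, NSX Controller, Log Insight node, and ESXi host
```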
OK, WE HAVE COMPLETED DNS AND ARE READY TO PROCEED. WHAT IS NEXT?
In VCF 3.0 the support matrix for physical servers has been dramatically expanded. In fact, we support almost every vSAN ReadyNode from this list. A couple of things to keep in mind:
- Minimum cache disk size requirement is 200 GB per disk
- Minimum capacity disk size requirement is 1 TB per disk
- A minimum of two 10GbE NICs; however, VCF 3.0 supports 25GbE and 40GbE, or any other NIC that has been qualified for ESXi and is IOVP certified. This also allows for the use of 100GbE IOVP NICs.
- Configure BIOS and firmware levels to VMware and manufacturer recommendations for vSAN (the NICs and HBAs certified against the qualifying ESXi build are the most important things to confirm and set before proceeding)
- Disable all ethernet ports except the two that will be used in VCF (currently VCF only supports two physical NICs, in addition to BMC, and they must be presented in vSphere as vmnic0 and vmnic1 respectively)
- Install a fresh copy of ESXi 6.5U2c/EP8 with a password defined. Use the vendor build if possible; if not, patch the VIBs for the NICs and disk controllers as stated above.
* If you are using Dell servers you are in luck, as 6.5U2c/EP8 is directly installable with Dell’s branded ISO located here
** If you are using HP servers you are in luck, as 6.5U2c/EP8 is directly installable with HP’s branded ISO located here
*** Here is a link to additional custom ISO’s
- Enable SSH and NTP to start/stop with the host (see the command sketch after this list)
- Configure the DNS server IP on the host.
- What you Can (and Cannot) Change in a vSAN Ready Node
- In an all-flash configuration, your hosts’ capacity SSDs need to be tagged as capacity devices. Chris Mutchler (@chrismutchler) wrote the script below to automate this; it would be run on each ESXi host participating in VCF:
esxcli storage core device list | grep -B 3 -e "Size: 3662830" | grep ^naa > /tmp/capacitydisks; for i in `cat /tmp/capacitydisks`; do esxcli vsan storage tag add -d $i -t capacityFlash; vdq -q -d $i; done
** Please note the "Size:" value would need to be updated to match the reported size of your capacity disks
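To round out the host preparation steps above, here is a minimal sketch of the corresponding ESXi shell commands, run on each host. The DNS server, search domain, and hostname are placeholder values:

```
# Confirm only two NICs are presented, and that they appear as vmnic0 and vmnic1
esxcli network nic list

# Point the host at your DNS server and search domain (placeholder values)
esxcli network ip dns server add --server=10.0.0.250
esxcli network ip dns search add --domain=vcf.example.com

# Set the FQDN to match the forward/reverse records created earlier (placeholder value)
esxcli system hostname set --fqdn=esxi01.vcf.example.com

# Enable and start SSH so it is available with the host
# (the NTP service policy can be set similarly through the Host Client or vSphere APIs)
vim-cmd hostsvc/enable_ssh
vim-cmd hostsvc/start_ssh
```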
ONE LAST STEP: SHOULD I CREATE A NEW WORKLOAD DOMAIN OR USE THE NEW MULTI-CLUSTER SUPPORT IN VCF 3.0?
As mentioned above, VCF 3.0 has the ability to instantiate multiple clusters in a single workload domain. To make this easier in the planning phase, one should think about whether multi-cluster or another workload domain makes more sense. To discuss this, I think we need to lay out the features and use cases for each one to compare and contrast.
Multi Cluster:
- Hardware
  - GPUs for VDI
  - Storage-dense cluster
  - High-memory cluster
  - Machine learning is a great example that may need GPUs for processing
- Licensing
  - Segregation of web vs. SQL components
  - Another licensing-related component could be OS related, for example a Windows cluster and a Linux cluster (insert your distro of choice).
- Failover Zones
  - Ability to create disparate storage for a separate failover zone for an application.
- Patch Consistency
  - A multi-cluster environment allows for patch conformity amongst all participating hosts.
- Vendor Requirements
  - Pivotal actually requires multiple clusters
New Workload Domain:
- Patching
  - The ability to run patches in test/dev before running them in production
- Multi-Tenancy
  - Role-based access control to a separate vCenter
- Isolation
  - Full isolation of NSX networks and policies
  - Full isolation of vSAN data, controlled by a distinct vCenter Server and storage policy based management
- Licensing
  - Oracle… would be a great use case to give them their own vCenter to appease their licensing requests.