Minimalistic VCF 4.0 Deployments with Kubernetes

No doubt, you’ve heard about the recent release of VMware Cloud Foundation 4.0 and its support for Kubernetes (K8s). Although you would like to play with the product, its resource requirements may exceed what you have available. Today, I’m going to talk about how you can get a VCF 4.0 environment up and running with the least amount of resources…

Before beginning, it’s important to emphasize that much of what I’ll be talking about is unsupported and should not be performed in a production deployment. This is intended to help people who want to play with VCF in a lab environment with limited resources. With that disclaimer out of the way, let’s get down to business…

Let’s talk about the high-level design. Given that we are talking about deploying in a lab environment, odds are good that you are willing to trade a bit of performance for functionality. It’s also safe to assume that you have limited hardware available to you, but you want to get the most functionality you can from the resources you have.

Given this, deploying VCF into a nested environment will provide the most functionality with the least amount of physical resources. To help you with this, you can leverage a tool called VMware Lab Constructor (VLC). This tool provides an ‘Automated’ mode of deployment, specifically for people who want to get a VCF environment up and running as fast as possible. In this mode, VLC will automate the creation of a sandboxed environment that includes essential services like DNS, NTP, and routing. I will use VLC 4.0 as the basis for the rest of this article.

VMware Lab Constructor includes some sample configuration files that you can use to deploy your environment. These files (and VLC itself) have been modified to produce an environment that favors low resource consumption over factors like availability or performance.

There are two such configuration files included, the main difference between them being whether you want to deploy Application Virtual Networks (AVNs) at bringup. In VCF 4.0, the deployment of an AVN at bringup has been made optional. If you just want to play with Kubernetes, you can choose not to implement this. This saves you from having to deploy multiple edge clusters (you would need one for the AVN and another to support K8s), which can consume quite a bit of resources.

How VCF is deployed will also impact the amount of resources required. VCF can be deployed using two different models: Standard and Consolidated.

In a Standard deployment model, three or more hosts are combined to make a workload domain. This is in addition to the hosts that support the management domain. Each domain will also require a deployed vCenter Server and associated infrastructure.

In a Consolidated deployment model, everything is put into the management domain. Since there is already a vCenter Server deployed there, we can save a bit of resources.

Our high-level plan, then, is to use VLC in the Automated mode to deploy VCF using the Consolidated model without AVNs at bringup. We will then deploy an edge cluster in the management domain and leverage that to deploy K8s.

Just one small problem…

VCF 4.0 didn’t support the use of the consolidated deployment model in conjunction with Kubernetes when it GA’d. Late last week, it was announced that this is a supported configuration. However, if you install VCF 4.0 now, you will notice that the management domain is not selectable when you try to use it for K8s.

This will be addressed in a future version of VCF, but for now, we will have to use the workaround explained in the VMware whitepaper entitled VMware vSphere with Kubernetes Support on the VMware Cloud Foundation Management Domain.

To kick things off, first we need to get VCF deployed. There are several paths you could take to do this, but a few considerations:

  • Use the VLC configuration file that specifically excludes the deployment of AVNs at bringup. In the latest VLC 4.0 bundle, that file is called ‘AUTOMATED_NO_AVN_VLAN_10-13_NOLIC_v4.json’.
  • Edit the configuration file before you begin to add the license keys you intend to use. In particular, ensure you are using a vSphere with Kubernetes license key. VLC will assign this key to all the hosts that will be deployed for the management domain as it is building the environment. If you forget to do this, you can always go into the deployed vCenter Server instance for the management domain after the deployment finishes and add it in manually.
  • There is a file called ‘additional_DNS_Entries.txt’ that you can use to have VLC configure the DNS with any entries you need when it is getting everything configured. You’ll need two DNS entries added. I’ll talk about how to do this manually later in this article, but you can save yourself some time if you add them to this file before you tell VLC to do the deployment.
  • Select the option to add additional hosts at the VLC screen. VLC provides a json file to add three additional large hosts, called ‘add_3_BIG_hosts.json’. Selecting this during the initial deployment will just save you some time. Of course, if you forget to do this at bringup, you can always use the Expansion Pack option of VLC to add in the hosts.
  • Once you get VCF deployed, use the included json file ‘add_3_BIG_hosts_buld_commission VSAN.json’ to perform a bulk commissioning of the additional hosts you added. Again, you don’t have to do a bulk commissioning, but it’s a lot easier than doing it one at a time.

Before you can continue, you need to make sure that you have a valid vSphere with Kubernetes license key entered within the SDDC Manager. When you expand the cluster in the management domain, you will need to specify a license key to use from the list of license keys the SDDC Manager knows of. If you do not have a selectable key available, you’ll have to quit the wizard and enter one. Best to just do it now…

While you’re at it, if you forgot to use a vSphere with Kubernetes license key when you had VLC do bringup, you should take this time and manually assign a valid key to all of the hosts within the management domain.

You are now ready to expand the management domain by adding the newly commissioned hosts to the cluster within the management domain.

The next major task is to deploy an edge cluster. As always, there are several different methods to do this. Because I’m lazy, I’m going to use the script-based method that I described here. However, I’m going to use a modified json file for this. This json matches what VLC deploys and will save us some typing:

{
        "edgeClusterName": "Mgmt-edge-cluster",
        "edgeClusterType": "NSX-T",
        "edgeRootPassword": "VMware123!VMware123!",
        "edgeAdminPassword": "VMware123!VMware123!",
        "edgeAuditPassword": "VMware123!VMware123!",
        "edgeFormFactor": "LARGE",
        "tier0ServicesHighAvailability": "ACTIVE_ACTIVE",
        "mtu": 8940,
        "asn": 65003,
        "tier0RoutingType": "EBGP",
        "tier0Name": "mgmt-T0",
        "tier1Name": "mgmt-T1",
        "edgeClusterProfileType": "DEFAULT",
        "edgeNodeSpecs": [{
                        "edgeNodeName": "edge1-mgmt.vcf.sddc.lab",
                        "managementIP": "10.0.0.60/24",
                        "managementGateway": "10.0.0.221",
                        "edgeTepGateway": "172.27.13.253",
                        "edgeTep1IP": "172.27.13.2/24",
                        "edgeTep2IP": "172.27.13.3/24",
                        "edgeTepVlan": 13,
                        "clusterId": "CLUSTERID",
                        "interRackCluster": "false",
                        "uplinkNetwork": [{
                                        "uplinkVlan": 11,
                                        "uplinkInterfaceIP": "172.27.11.2/24",
                                        "peerIP": "172.27.11.253/24",
                                        "asnPeer": 65001,
                                        "bgpPeerPassword": "VMware1!"
                                },
                                {
                                        "uplinkVlan": 12,
                                        "uplinkInterfaceIP": "172.27.12.2/24",
                                        "peerIP": "172.27.12.253/24",
                                        "asnPeer": 65001,
                                        "bgpPeerPassword": "VMware1!"
                                }
                        ]
                },
                {
                        "edgeNodeName": "edge2-mgmt.vcf.sddc.lab",
                        "managementIP": "10.0.0.61/24",
                        "managementGateway": "10.0.0.221",
                        "edgeTepGateway": "172.27.13.1",
                        "edgeTep1IP": "172.27.13.4/24",
                        "edgeTep2IP": "172.27.13.5/24",
                        "edgeTepVlan": 13,
                        "clusterId": "CLUSTERID",
                        "interRackCluster": "false",
                        "uplinkNetwork": [{
                                        "uplinkVlan": 11,
                                        "uplinkInterfaceIP": "172.27.11.3/24",
                                        "peerIP": "172.27.11.253/24",
                                        "asnPeer": 65001,
                                        "bgpPeerPassword": "VMware1!"
                                },
                                {
                                        "uplinkVlan": 12,
                                        "uplinkInterfaceIP": "172.27.12.3/24",
                                        "peerIP": "172.27.12.253/24",
                                        "asnPeer": 65001,
                                        "bgpPeerPassword": "VMware1!"
                                }
                        ]
                }
        ]
}
 

Before we can use this script, however, we need to ensure that the appropriate DNS entries are available for the two NSX edges that will get deployed. Since we used VLC in the Automated mode, VLC went ahead and configured the Cloud Builder appliance to provide DNS, NTP, and routing services. If you look at the json contents provided above, you’ll see two edges are defined:

  • edge1-mgmt.vcf.sddc.lab
    • 10.0.0.60
  • edge2-mgmt.vcf.sddc.lab
    • 10.0.0.61

We need to ensure these can be resolved by the SDDC Manager. These names are not included in the DNS entries by default. Earlier, I mentioned how you could have added these to the additional_DNS_Entries.txt file included with VLC to have VLC add the entries automatically for you. If you didn’t do this before, adding the needed entries manually is pretty easy:

  • SSH into the Cloud Builder appliance
  • Edit the /etc/maradns/db.vcf.sddc.lab file and add the two lines as needed
  • After this, restart the maradns service
  • Test and ensure you can resolve both the names, forward and reverse
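As a sketch, the two records might look something like the following in the zone file. The Cloud Builder appliance runs MaraDNS, whose csv2 zone format has an FQDN4 record type that creates both the forward (A) and reverse (PTR) entries in one line. Double-check against the existing entries in the file before copying this, as the record style used there may differ:

```
# Hypothetical additions to /etc/maradns/db.vcf.sddc.lab
# FQDN4 creates both the A and PTR records for each edge node.
edge1-mgmt.vcf.sddc.lab. FQDN4 10.0.0.60
edge2-mgmt.vcf.sddc.lab. FQDN4 10.0.0.61
```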

Now we are ready to run the script to make the edge cluster. Of course, you can also do this through the SDDC Manager and I’d invite you to do this at least once. Just use the same values and you should have no issues.

The script I’ll use is a slight variation of the script I showed previously. The key difference is a small change on the line that defines the clus_id variable, where a 1 is changed to a 0. This means it will pick up the cluster ID for the management domain instead of a workload domain that is created in addition to the management domain.
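To make that difference concrete, here is a stand-alone sketch of the jq filter run against a mocked-up /v1/clusters response (the IDs are made up; the real response contains full cluster objects with many more fields). In a consolidated deployment, the management cluster is the first element, so index 0 is the one you want:

```shell
# Mock of the 'elements' array returned by the SDDC Manager /v1/clusters API.
resp='{"elements":[{"id":"mgmt-cluster-id"},{"id":"wld-cluster-id"}]}'

# Consolidated deployments: pick the first (management) cluster.
echo "$resp" | jq '.elements | .[0].id' | tr -d '"'   # mgmt-cluster-id

# Standard deployments: pick the second element, i.e. the workload cluster.
echo "$resp" | jq '.elements | .[1].id' | tr -d '"'   # wld-cluster-id
```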

#!/bin/bash

echo -e "Starting magic...\n\n"
token=`curl -X POST -H "Content-Type: application/json" -d '{"username": "administrator@vsphere.local","password": "VMware123!"}' --insecure https://10.0.0.4/v1/tokens | awk -F "\"" '{ print $4}'`

#echo -e "Token:\n$token\n"

# Use the following line for standard deployments
#clus_id=`curl -X GET --insecure -H 'Content-Type: application/json' -H "Authorization: Bearer $token" 'https://localhost/v1/clusters' -k | jq '.elements | .[1].id'|tr -d '"'`
# Use the following line for consolidated deployments
clus_id=`curl -X GET --insecure -H 'Content-Type: application/json' -H "Authorization: Bearer $token" 'https://localhost/v1/clusters' -k | jq '.elements | .[0].id'|tr -d '"'`

#echo -e "Cluster ID: $clus_id\n"

sed "s/CLUSTERID/$clus_id/g" edge_cluster.json > modified_edge_cluster.json

echo -e "\nDoing Validation...\n"
validate_id=`curl --insecure -H 'Content-Type: application/json' -H "Authorization: Bearer $token" 'https://localhost/v1/edge-clusters/validations' -d @modified_edge_cluster.json -k | jq '.id'|tr -d '"'`

#echo -e "\nValidation ID: $validate_id\n"

echo -e "\nPlease wait. Validation takes a minute.\n"
sleep 60
status=`curl -X GET --insecure -H 'Content-Type: application/json' -H "Authorization: Bearer $token" "https://localhost/v1/edge-clusters/validations/$validate_id" -k | jq '.resultStatus'|tr -d '"'`

#echo -e "Status: $status\n\n"

# Quote $status so the test doesn't break if the validation call returned nothing
if [ "$status" == "SUCCEEDED" ]
then
  echo -e "\nShake 'n bake!\n"
  curl -X POST --insecure -H 'Content-Type: application/json' -H "Authorization: Bearer $token" 'https://localhost/v1/edge-clusters' -d @modified_edge_cluster.json -k | json_pp
else
  echo -e "\nFailure detected. Go check things and try again.\n"
fi
 
echo -e "\nAll done.\n\n"
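One line in the script worth calling out is the token line: it extracts the accessToken value by splitting the login response on double quotes, where field 4 happens to be the token. A quick sketch with a fabricated response (the token value is made up):

```shell
# Fabricated /v1/tokens response; the real one also includes more fields.
resp='{"accessToken":"example-token-value","refreshToken":{"id":"abc"}}'

# Split on double quotes: field 4 is the accessToken's value.
echo "$resp" | awk -F "\"" '{ print $4 }'   # example-token-value

# A more robust alternative, since jq is already used elsewhere in the script:
echo "$resp" | jq -r '.accessToken'         # example-token-value
```

Note that the awk approach depends on accessToken being the first key in the response, which is why jq is the safer choice if you modify the script.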

Make sure that you save the json spec listed above as ‘edge_cluster.json’. Run the script; you should see the validation succeed, followed by the edge cluster creation request being submitted.
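As an aside, the CLUSTERID substitution can be sanity-checked on its own. This minimal sketch shows what the sed line in the script does to the spec file (the UUID here is a made-up placeholder):

```shell
# Demonstrate the placeholder substitution the script performs:
# every CLUSTERID token in the spec is replaced with the real cluster UUID.
clus_id="11111111-2222-3333-4444-555555555555"   # made-up example UUID
printf '{"clusterId": "CLUSTERID"}\n' > demo_spec.json
sed "s/CLUSTERID/$clus_id/g" demo_spec.json
# -> {"clusterId": "11111111-2222-3333-4444-555555555555"}
```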

After you do this, you will notice that a task to create the edge cluster has started on the SDDC Manager. Wait for the process to complete.

You’re almost done.

Now, if you tried to enable the Workload Control Plane (WCP), you’d find that you still have an issue: you wouldn’t be able to select the management domain. To work around this, we need to log in to the NSX Manager (‘admin’ and ‘VMware123!VMware123!’ should be the username and password if you used the defaults provided with VLC).

Once logged into the NSX Manager, navigate to System->Fabric->Compute Managers and select the vCenter for the management domain. Notice where it says that trust is not enabled? We need to enable that. Edit the configuration and simply enable the trust.

Log out and then log back into the NSX Manager.

Navigate to System->Fabric->Nodes->Edge Clusters and click on the edge cluster named mgmt-edge-cluster. What you want to see is that there are two tags present on it. If either is missing, add it.

Now that you have this done, you can go ahead and enable the Workload Management feature. One caveat though…

Remember how I said that enabling K8s in the management domain wasn’t supported at the GA of VCF 4.0? Because of this, the SDDC Manager UI will filter out the management domain when you use the wizard there, to help you avoid selecting an unsupported configuration. So even after adding the trust and the tags mentioned above, you still will not be able to use the SDDC Manager to enable WCP.

You can, however, now go to the vCenter Server deployed for the management domain and enable WCP from there. With the tags and the trust set, you will now be able to see the management cluster and follow the rest of the wizard to enable WCP.

In closing, just a few tidbits:

  • With VLC and the provided configurations, VLAN 11 and 12 are for your uplinks. VLAN 13 will be for your edge TEPs.
  • Yes, you need the VLANs.
  • You should be able to ping the IPs on the edges that you create. If you are not able to, fix that problem first!
  • You will find that routes are not being advertised through BGP once you enable WCP. This is due to a bug in VCF 4.0. Refer to the VMware whitepaper mentioned earlier for steps on how to enable the route advertisements.
  • If you decide to deploy VCF using the standard architecture, realize that this will consume more resources. VLC automatically adds some settings to the SDDC Manager in order to facilitate the creation of additional workload domains with the least amount of resources possible. If you haven’t used VLC, simply add the following to /opt/vmware/vcf/domainmanager/config/application-prod.properties:
    • nsxt.manager.formfactor=medium
    • nsxt.management.resources.validation.skip=true
    • vc7.deployment.option:tiny
    • nsxt.manager.cluster.size=1
  • After doing this, make sure you restart the domain manager service by executing the following command as the root user:
    • # systemctl restart domainmanager

Enjoy!