Manually Modifying the VCF Network Pools

Ever had a host in your VMware Cloud Foundation (VCF) environment die suddenly? Ever had an issue adding a new host back into your VCF environment because you have no available IP addresses left in your Network Pool? Let’s talk about this and I’ll show you some ways to address it!

Network Pools are a construct in VCF intended to save you the hassle of manually assigning IP addresses for the various networks (vSAN, NFS, vMotion, etc.) every time you add a host to the VCF environment. Like defining a DHCP scope, each network defined in a Network Pool is given a range of IP addresses that the SDDC Manager can pull from as needed. The SDDC Manager keeps track of which addresses are currently in use and how many addresses are available to be assigned.

When you add a host to VCF, it goes through a process known as commissioning. This process basically gets everything sorted on the host and adds it to the VCF environment. The process to remove a host is known as decommissioning.

There are typically two different situations that can arise when decommissioning a host, based on the ability of the SDDC Manager to communicate with the host. In the case where communication is possible, the SDDC Manager can cleanly remove the host. However, in a case where the host is ‘dead’ or otherwise unreachable, you typically must force the removal of the host from any Workload Domain it might be a part of and then decommission the host. Once a replacement host is available, you would commission it, and during that process the SDDC Manager would simply assign any needed IP addresses from the respective Network Pool.

It is a best practice to create the Network Pools with a range of IP addresses that exceeds your current requirements. This not only allows you to scale easily, but also allows you to deal with situations where you have a dead host and need IP addresses for the replacement host.

But what if your network ranges in your Network Pool do not have any free IP addresses?

To put it simply, the commission process will fail, as there aren’t enough IP addresses for the SDDC Manager to assign to the host. Solving this is typically just a matter of adding another range of IP addresses to the Network Pool.

But what if you don’t have the ability to add in any more IP addresses? Perhaps you don’t have any more free IP addresses in the environment or perhaps you have a regulation that you must abide by? Well now you might find yourself with a problem, especially when dealing with a dead host.

Remember that in the dead host scenario, the SDDC Manager doesn’t have the ability to communicate with the failed host. That means that if the host were ever to come back, it would have the same IP addresses that it did when it failed. If the SDDC Manager had reassigned those IP addresses while the host was dead, you could end up with an IP address conflict. This could make for a bad day.

Luckily, the SDDC Manager prevents this. As the SDDC Manager tracks the IP addresses used and available, it simply does not remove the IP addresses that were assigned to the dead host from the list of consumed IP addresses. As a result, the SDDC Manager cannot assign those IP addresses to a new host, thus preventing these types of IP address conflicts.

However, if your Network Pool doesn’t have any IP addresses available and you’re unable to add another range of IP addresses to it, then you will have issues when you try to replace that dead host. This is because even though the dead host is no longer in use, the IP addresses that were assigned to it are not released by the SDDC Manager. Hence, the commissioning of your replacement host will fail, as there are no free IP addresses to assign to it.
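To make that behavior concrete, here is a tiny Python model of it. This is only a sketch of the bookkeeping described above, not SDDC Manager code: the NetworkPool class, its method names, and the host names are all made up for illustration.

```python
class NetworkPool:
    """Toy model of one network's address tracking (hypothetical, not VCF code)."""

    def __init__(self, addresses):
        self.free = list(addresses)
        self.used = {}  # IP address -> host name

    def commission(self, host):
        # Commissioning needs a free address; with none left, it fails.
        if not self.free:
            raise RuntimeError(f"no free IP addresses for {host}")
        ip = self.free.pop(0)
        self.used[ip] = host
        return ip

    def decommission(self, host, reachable):
        # A reachable host is cleaned up, so its address returns to the pool.
        # A dead host keeps its address marked as used, which avoids a
        # conflict if the host ever comes back.
        if not reachable:
            return
        for ip in [ip for ip, owner in self.used.items() if owner == host]:
            del self.used[ip]
            self.free.append(ip)


pool = NetworkPool(["10.0.4.101", "10.0.4.102"])
pool.commission("esxi-1")
pool.commission("esxi-2")
pool.decommission("esxi-2", reachable=False)  # forced removal of a dead host

try:
    pool.commission("esxi-3")  # the replacement host
    result = "commissioned"
except RuntimeError as err:
    result = str(err)
print(result)
```

The pool is full, the dead host’s address is never released, and so the replacement host has nothing to take.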

If this is the case, then one option is to perform a bit of brain surgery on the SDDC Manager. We must go in and erase its memory of the IP addresses it thinks are in use by the dead host and add them back to the pool of available IP addresses.

This is a good time to talk about all the bad things that can happen, should you continue.

For starters, you must be absolutely positive that the IP addresses for the dead host are not in use and that the dead host will never come back and try to use them. Ever.

Next, to handle this situation, we will be messing with the database maintained by the SDDC Manager. This can cause serious issues with the functioning of your environment and may also impact its supportability. If you are doing this in a home lab (like one built with the VMware Lab Constructor (VLC)), then that’s one thing. But if you’re even thinking about doing this in any form of a production environment, you seriously need to engage with VMware Customer Support before attempting anything here!

If you understand the potential ramifications and wish to continue, please read on…

As I mentioned earlier, the process needed here is pretty simple: we just need to remove the IP addresses that were used by the dead host from the list of used IP addresses maintained by the SDDC Manager and add them to the list of available IP addresses. You can actually find some hints as to what is needed by looking at this KB article (https://kb.vmware.com/s/article/94984). However, this KB article is missing a few key steps, so let’s go through it, step-by-step…

To start, use your favorite terminal program to open two SSH sessions to the SDDC Manager in your environment. We will use one for manipulating the SDDC Manager database and we’ll use the second one to use the API to verify our work. Become root in both sessions.

In the terminal you will use for the API calls, get a token to use for the other API calls using a command like:

TOKEN=`curl -X POST -H "Content-Type: application/json" -d '{"username": "administrator@vsphere.local","password": "VMware123!"}' --insecure https://10.0.0.4/v1/tokens | awk -F "\"" '{ print $4}'`

Of course, you’ll need to replace the information in these example commands so that it matches with your environment!
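The awk command above grabs the token by its position in the response, which is a little fragile. If you would rather pull it out by name, something like the Python sketch below should work. I am assuming here that the token comes back in a JSON field called accessToken; the sample response body is made up, so check it against what your SDDC Manager actually returns.

```python
import json


def extract_access_token(response_body):
    """Return the access token from the JSON body returned by /v1/tokens."""
    data = json.loads(response_body)
    # "accessToken" is an assumed field name; verify it in your own output.
    token = data.get("accessToken")
    if not token:
        raise ValueError("no accessToken in the response; check your credentials")
    return token


# Made-up response body, for illustration only.
sample = '{"accessToken": "eyJhbGciOi.example", "refreshToken": {"id": "1234"}}'
token = extract_access_token(sample)
print(token)
```

Looking the token up by name means a failed login shows up as a clear error instead of an empty or wrong TOKEN value.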

Now that you have the token, let’s use the API to list all the Network Pools. We can do that with a command like:

curl -k -X GET -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" https://sddc-manager.vcf.sddc.lab/v1/network-pools | jq

In this example, you’ll see the ID for the only network pool (mgmt-networkpool) that exists within this lab environment. In this pool, you will see the IDs for two different networks.

We can use a command like the one below to get more details about the networks:

curl -k -X GET -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" https://sddc-manager.vcf.sddc.lab/v1/network-pools/40eb0f17-2797-42b5-9cb5-7b3169142a7d/networks | jq

In this lab environment, you’ll notice that there are several IP addresses listed as being free. For the sake of demonstration, however, I’m going to destroy one of the hosts that has an IP address ending in .104 and pretend that this is the IP address that we need to make available again.

Also note that there are two networks – one for vMotion and one for vSAN.  The vMotion network is 10.0.4.x and the vSAN network is 10.0.8.x.

If you scroll through the output of that command, you will see 10.0.4.104 in the list of used IP addresses for the vMotion network. Although it isn’t shown here, 10.0.8.104 is likewise in the used list for the vSAN network.

So our goal is to pull these IPs out of their respective used IP lists and put them back into the free lists.
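If you have more than a couple of networks, scrolling through the output gets old quickly. Here is a rough Python sketch that finds every network listing a given IP address as used. It assumes the response wraps the networks in an elements array and that each network carries id, type, usedIps and freeIps fields; treat those names as assumptions and adjust them to match your own output. The sample data, including the vSAN network ID, is made up.

```python
import json


def networks_using(ip, response_body):
    """Return (network id, type) for each network that lists ip as used."""
    data = json.loads(response_body)
    return [
        (net.get("id"), net.get("type"))
        for net in data.get("elements", [])  # assumed wrapper key
        if ip in net.get("usedIps", [])      # assumed field name
    ]


# Made-up sample shaped like the output described above.
sample = json.dumps({"elements": [
    {"id": "065317df-6b89-4c95-9d47-bab78dcc906c", "type": "VMOTION",
     "usedIps": ["10.0.4.101", "10.0.4.104"], "freeIps": ["10.0.4.105"]},
    {"id": "hypothetical-vsan-network-id", "type": "VSAN",
     "usedIps": ["10.0.8.101", "10.0.8.104"], "freeIps": ["10.0.8.105"]},
]})

matches = networks_using("10.0.4.104", sample)
print(matches)
```

Run it once per address (10.0.4.104 and 10.0.8.104 in this example) and note the network IDs, since you will need them for the database updates.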

Use the other terminal window you have open and connect to the SDDC Manager’s database using the following command:

psql -h localhost -U postgres

From here, you should be able to list all the databases by using the \l command.

We want to connect to the platform database. You can do this by using the command:

\c platform

Once connected to that database, you can list all the tables using the \dt command.

What we are looking for is the vcf_network table.

Use the following command to show the contents of that table:

select * from vcf_network;

Notice how all the information you saw previously by using the API commands is shown here?

Now we just need to modify the records. First, let’s remove the .104 IP address from the list of used IP addresses for the vMotion network. We can do this with the following command:

update vcf_network SET used_ip_address='["10.0.4.101","10.0.4.102","10.0.4.103"]' where id='065317df-6b89-4c95-9d47-bab78dcc906c';

Now if we display the contents of the table again, you can see that the 10.0.4.104 IP address is no longer listed in the used_ip_address field for the vMotion network.

But note that removing the address from the used_ip_address list does not add it to the free_ip_addresses list. To do that, we’ll have to use another command, such as:

update vcf_network SET free_ip_addresses='["10.0.4.104", "10.0.4.105", "10.0.4.106", "10.0.4.107", "10.0.4.108", "10.0.4.109", "10.0.4.110", "10.0.4.111", "10.0.4.112", "10.0.4.113", "10.0.4.114", "10.0.4.115", "10.0.4.116", "10.0.4.117", "10.0.4.118", "10.0.4.119", "10.0.4.120"]' where id='065317df-6b89-4c95-9d47-bab78dcc906c';

Again, this lab has several IP addresses in the free_ip_addresses field, which I’m not removing. That’s why the command above includes all those other IP addresses. If you were actually having the issue discussed earlier, the free list would be empty, so the only address in the above command would be the one you are releasing (10.0.4.104 in this case).
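Typing those JSON lists by hand is where most mistakes creep in, so you may prefer to generate the two statements. The Python sketch below does that from the current used and free lists. The release_ip_sql helper is my own invention; the table and column names are simply the ones used in the commands above, so confirm them against your own database (\d vcf_network) before running anything it prints.

```python
import ipaddress
import json
import uuid


def release_ip_sql(network_id, ip, used, free):
    """Build the two UPDATE statements that move ip from used to free."""
    network_id = str(uuid.UUID(network_id))  # rejects a malformed network ID
    for addr in [ip] + used + free:
        ipaddress.ip_address(addr)           # rejects typos such as 10.0.4,104
    if ip not in used:
        raise ValueError(f"{ip} is not in the used list")
    new_used = [addr for addr in used if addr != ip]
    new_free = free if ip in free else [ip] + free
    # Column names are copied from the commands in this post.
    return [
        f"update vcf_network SET used_ip_address='{json.dumps(new_used)}' "
        f"where id='{network_id}';",
        f"update vcf_network SET free_ip_addresses='{json.dumps(new_free)}' "
        f"where id='{network_id}';",
    ]


statements = release_ip_sql(
    "065317df-6b89-4c95-9d47-bab78dcc906c",
    "10.0.4.104",
    used=["10.0.4.101", "10.0.4.102", "10.0.4.103", "10.0.4.104"],
    free=["10.0.4.105"],
)
for statement in statements:
    print(statement)
```

Read the generated statements carefully before pasting them into psql; the script only saves you from quoting and punctuation slips, not from picking the wrong network.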

You’ll want to follow this same process with the other networks in the Network Pool. In this example, that would be the vSAN network, which is on 10.0.8.x. Be careful here, especially if you repeat the commands: double-check that you’ve changed both the network ID and the IP addresses to match the network you are working on. It’s far too easy to forget to change something.

At this point, you should go back to the API terminal and get the network details again. If you see an error here (and it’s not because your token expired…), you need to go back and examine your commands. Ensure you didn’t accidentally include a hidden character or substitute a period for a comma.
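If you want something a bit more systematic than eyeballing the output, here is a small Python sketch of the checks I would make on each network after the change. It works on one network entry from the API output and assumes the used and free lists appear under usedIps and freeIps; those field names and the check_network helper are assumptions for illustration.

```python
import ipaddress


def check_network(net, released_ip):
    """Return a list of problems found in one network's used/free lists."""
    used = net.get("usedIps", [])  # assumed field name
    free = net.get("freeIps", [])  # assumed field name
    problems = []
    for addr in used + free:
        try:
            ipaddress.ip_address(addr)
        except ValueError:
            problems.append(f"not a valid IP address: {addr!r}")
    overlap = sorted(set(used) & set(free))
    if overlap:
        problems.append(f"listed as both used and free: {overlap}")
    if released_ip in used:
        problems.append(f"{released_ip} is still in the used list")
    if released_ip not in free:
        problems.append(f"{released_ip} is not in the free list")
    return problems


good = {"usedIps": ["10.0.4.101"], "freeIps": ["10.0.4.104"]}
bad = {"usedIps": ["10.0.4.101"], "freeIps": ["10.0.4,104"]}  # comma typo

good_problems = check_network(good, "10.0.4.104")
bad_problems = check_network(bad, "10.0.4.104")
print(good_problems)
print(bad_problems)
```

An empty list means the network looks sane; anything else tells you which update to go back and fix.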

That’s all you have to do! Now you should be able to commission a new server, using the same IP address(es) that you had used for the server that died!