Unable to Remove Workload Domain

Sometimes, we get ahead of ourselves… In this case I may have prematurely removed some ESXi hosts from existence! Because of this misjudgment, my Remove Workload Domain task is failing. Let’s explore how to move past it.

There I was, running out of memory in my management cluster… My workload domain wasn’t in use yet, so I thought it would be easy enough to repurpose its ESXi hosts to add memory to the management domain, so I could finish deploying Aria Operations.
Well, there is an order to these things, and I should have:

  1. Deleted the Workload Domain in SDDC Manager
  2. Decommissioned the hosts
  3. Reimaged / reinstalled the ESXi hosts
  4. Recommissioned the hosts
  5. Added the hosts to the Management Domain

I did not, and I ended up with a failed Remove Workload Domain task.

To get here, what I did was:

  1. Reimaged / reinstalled the ESXi hosts
  2. Deleted the Workload Domain in SDDC Manager
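To make the dependency explicit, here is a small Python sketch that checks a sequence of steps against the order I should have followed. The step labels are my own shorthand, not SDDC Manager task names, and `out_of_order` is a hypothetical helper: it flags any step that ran before one of its prerequisites, or whose prerequisite never ran at all.

```python
"""Sketch: check host-reuse steps against the intended order.

The step labels below are hypothetical shorthand, not SDDC Manager task names.
"""
from typing import List, Tuple

# Intended order: each step depends on every step listed before it.
CORRECT_ORDER = [
    "delete workload domain",
    "decommission hosts",
    "reimage hosts",
    "commission hosts",
    "add hosts to management domain",
]


def out_of_order(steps: List[str]) -> List[Tuple[str, str]]:
    """Return (step, prerequisite) pairs where the prerequisite ran later or never ran.

    Raises ValueError if a step label is not in CORRECT_ORDER.
    """
    position = {s: i for i, s in enumerate(steps)}
    problems = []
    for step in steps:
        idx = CORRECT_ORDER.index(step)
        for prereq in CORRECT_ORDER[:idx]:
            # A missing prerequisite is treated as "ran after everything".
            if position.get(prereq, len(steps)) > position[step]:
                problems.append((step, prereq))
    return problems
```

Fed the sequence I actually ran, `["reimage hosts", "delete workload domain"]`, it flags the reimage as happening before both the domain deletion and the decommission, which is exactly where things went sideways.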

Opening up the failed task revealed that SDDC Manager was trying to delete some VMs on one of the now non-existent hosts.

Logging in to the management vCenter, I could see that the workload domain vCenter was still present. As expected, there were plenty of red exclamation points, but no VMs were listed…

A bit more exploration led me to the VMs tab, and there I found my quarry!

It was the vSphere Cluster Services (vCLS) VMs, which are hidden from the inventory view and which you normally don’t (and can’t) interact with directly. Since I knew the hosts had been obliterated, these VMs no longer existed anywhere except as entries in the vCenter inventory. I figured I’d select them all and remove them from inventory.

Hrm, some solid permissions there, so that was a no-go. Next, I moved to the Hosts tab, selected all the dead hosts, and removed them from inventory.

After a few ARE YOU SURE screens, the hosts were successfully removed, and the VMs went along with them, leaving just the failed cluster.
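If you’d rather script this cleanup than click through the ARE YOU SURE screens, here is a rough pyVmomi sketch of the same idea. It has not been tested against a real VCF deployment: the vCenter address, credentials, and cluster name are placeholders, `dead_hosts` and `remove_dead_hosts` are hypothetical helper names, and it assumes vCenter will accept `Destroy_Task()` on a host whose connection state is `disconnected` or `notResponding`. If vCenter refuses, disconnecting the host first with `DisconnectHost_Task()` may be needed.

```python
"""Sketch: remove unreachable ESXi hosts from a vCenter cluster with pyVmomi.

Assumptions (not from the original post): pyVmomi is installed, the arguments
are placeholders, and vCenter accepts Destroy_Task() on unreachable hosts.
"""
from typing import Iterable, List

# Connection states treated as "dead" for this sketch.
DEAD_STATES = {"disconnected", "notResponding"}


def dead_hosts(hosts: Iterable) -> List:
    """Return hosts whose runtime.connectionState marks them as unreachable."""
    return [h for h in hosts if h.runtime.connectionState in DEAD_STATES]


def remove_dead_hosts(vcenter: str, user: str, pwd: str, cluster_name: str) -> None:
    # Imported lazily so dead_hosts() works even without pyVmomi installed.
    from pyVim.connect import SmartConnect, Disconnect
    from pyVim.task import WaitForTask
    from pyVmomi import vim

    # Lab-only: skips TLS certificate validation.
    si = SmartConnect(host=vcenter, user=user, pwd=pwd,
                      disableSslCertValidation=True)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.ClusterComputeResource], True)
        clusters = [c for c in view.view if c.name == cluster_name]
        view.Destroy()
        if not clusters:
            raise ValueError(f"cluster {cluster_name!r} not found")
        for host in dead_hosts(clusters[0].host):
            print(f"Removing {host.name} from inventory")
            # Assumption: removing the host also drops VMs registered only on it
            # (such as stale vCLS VMs), as the UI removal did in this post.
            WaitForTask(host.Destroy_Task())
    finally:
        Disconnect(si)
```

The host filter is kept separate from the vCenter calls so you can check which hosts would be removed before actually removing anything.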

Back to SDDC Manager to retry the task and see whether we had caused any additional kerfuffle… I have to admit I was a bit nervous, as unwinding everything involves lots of remove-host operations in NSX and in networking in general. None of those subtasks had an issue, though. It took some time (I assume some tasks had to time out because they couldn’t talk to the hosts), but the rest of the removal completed without issue.

The last task was to decommission the hosts so that I could reimage/reinstall them, commission them, and add them to my management domain to get that extra memory! This process was quite easy; I made sure to leave the “Skip failed hosts during decommissioning” switch on.

I guess the moral of the story here is to ensure you do things in the correct order! Back to working with Aria!