VMware vSAN Witness Host Not Found


Continuing to work with the 2-node VMware vSAN stretched cluster in the home lab, I ran into an issue where the network showed as partitioned and the witness host reported that it was in STANDALONE mode. This was puzzling, because I could ping between the witness appliance and both hosts, and I could also ping from the vSAN hosts' vmkernel interfaces back to the witness appliance. As described in a recent post, I had deployed the Witness Appliance inside VMware Workstation as a proof of concept (POC) for running the witness outside of my vSAN cluster in the home lab. As it turns out, I had not run into a connectivity issue but rather an MTU (packet size) issue, as I will explain. Read on if you run into the VMware vSAN "Witness host not found" error.

VMware vSAN Witness Host Not Found

In my case, network connectivity between the witness node and the other two hosts appeared to be good, or so I thought. However, the vSAN cluster health checks were reporting errors. As you can see below, the error "Witness host not found" is presented under the Stretched cluster section.


VMware vSAN Witness Host Not Found Error

Clicking the "Witness host not found" message gives a bit more detail on the error: "Found 0 witness hosts on stretched cluster. The number of witness host on stretched cluster is not 1."


Found 0 witness hosts on stretched cluster error

A helpful command for verifying the state of the witness host is esxcli vsan cluster get. Note below the Local Node State: STANDALONE, which indicates that the witness host is isolated, or in a "network partition". In other words, it can't properly see the other vSAN hosts.
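If you are checking several hosts, it can help to pull the node state out of captured command output rather than reading it by eye. The sketch below is a hypothetical helper, not a VMware tool: it assumes the output contains a line of the form "Local Node State: VALUE", as in the screenshot, and the sample text is an abbreviated, illustrative excerpt rather than real output from my lab. The names local_node_state and is_partitioned are made up for this example.

```python
import re
from typing import Optional


def local_node_state(output: str) -> Optional[str]:
    """Return the value of the 'Local Node State' field, or None if absent."""
    match = re.search(r"^\s*Local Node State:\s*(\S+)", output, re.MULTILINE)
    return match.group(1) if match else None


def is_partitioned(output: str) -> bool:
    # STANDALONE means this node cannot see the other vSAN hosts.
    return local_node_state(output) == "STANDALONE"


# Hypothetical, abbreviated excerpt of `esxcli vsan cluster get` output.
sample = """Cluster Information
   Enabled: true
   Local Node State: STANDALONE
"""

print(local_node_state(sample))  # STANDALONE
print(is_partitioned(sample))    # True
```

The output itself would still come from running esxcli vsan cluster get on the witness host; the Python only interprets the captured text.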


Running the esxcli vsan cluster get command

A few VMware KB articles helped point me in the right direction:

vSAN Health Service – Witness host not found (2130585)
vSAN Health Service – Network Health – vSAN Cluster Partition (2108011)
vSAN Health Service – Network Health – Hosts small ping test (connectivity check) and Hosts large ping test (MTU check) (2108285)

VMware Workstation Limitation for VMware vSAN Witness Host

As cool as it is to be able to use VMware Workstation to host the VMware vSAN Witness Host, it does have a limitation. Putting the errors together (the stretched cluster's inability to see the witness host and the failing "large ping" test), I looked at how the witness host portgroup was configured. I had set the portgroup to jumbo frames; however, the VMware Workstation NIC I was using had not been set to jumbo frames, so the large frames could not get through.


VMware vSAN Witness Appliance port group configuration


NIC settings MTU set to jumbo frames

The problem, as far as I have found, is that VMware Workstation doesn't support jumbo frames, especially outside of the vSwitch, i.e. for traffic bridged out to the LAN. The resolution that got me past the network partition was to set the MTU for the vSAN Witness portgroup back to 1500. This resolved both the partition and the "Witness host not found" error; however, the large ping test is still failing. This seems to be more of a soft error, since the cluster is now up and running and able to talk to the vSAN Witness host.
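It helps to spell out the arithmetic behind a "large ping", because the MTU counts the IP and ICMP headers while a ping's size option counts only the payload. A commonly used manual check from an ESXi host is vmkping with -d (don't fragment) and -s (payload size), for example `vmkping -I vmkX -d -s 8972 <witness IP>`, where vmkX and the IP are placeholders. The Python below only illustrates the arithmetic under stated assumptions (IPv4 without options, so a 20-byte IP header, plus an 8-byte ICMP echo header). It is not part of any VMware tool, and the helper names are hypothetical.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options (assumption)
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def fits(payload: int, path_mtu: int) -> bool:
    """True if a don't-fragment ping with this payload can cross the path."""
    return payload + IPV4_HEADER + ICMP_HEADER <= path_mtu


jumbo_payload = max_ping_payload(9000)     # 8972
standard_payload = max_ping_payload(1500)  # 1472

print(jumbo_payload, standard_payload)  # 8972 1472
print(fits(jumbo_payload, 1500))        # False: dropped, cannot fragment
print(fits(standard_payload, 1500))     # True
```

With don't-fragment set, a jumbo-sized ping is dropped on any 1500-byte hop, whereas an ordinary small ping passes, which matches the symptom of basic pings succeeding while the cluster still reports a partition.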

Thoughts

If you run into a network partition issue, be sure the MTU value matches across the port group, the physical NIC, and the switchport. If you set the port group to jumbo frames but the physical layer isn't configured for them, you will have issues. In my case, the limitation of VMware Workstation is evident: I am not able to pass jumbo frames bridged to the LAN from the VMware Workstation vSwitch. With the port group set back to an MTU of 1500, the VMware vSAN Witness Host is found and the cluster is no longer partitioned.
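The rule of thumb above comes down to the fact that a path's effective MTU is the smallest MTU of any hop along it. Here is a minimal sketch of that check, assuming you have collected the MTU configured at each layer by hand. The hop names and the switchport value are hypothetical, loosely modeled on this post's setup, and the helper functions are made up for illustration. A hop is flagged only if its MTU is below what the vmkernel port sends; a switchport with a larger MTU is harmless.

```python
from typing import Dict, List


def path_mtu(hops: Dict[str, int]) -> int:
    """Effective MTU of a path: the smallest MTU of any hop along it."""
    return min(hops.values())


def mismatched_hops(hops: Dict[str, int], sending_mtu: int) -> List[str]:
    """Names of hops whose MTU is smaller than the MTU being sent."""
    return [name for name, mtu in hops.items() if mtu < sending_mtu]


# Hypothetical values: portgroup set to jumbo frames, Workstation NIC is not.
hops = {
    "witness portgroup": 9000,
    "Workstation bridged NIC": 1500,
    "physical switchport": 9000,
}

print(path_mtu(hops))               # 1500
print(mismatched_hops(hops, 9000))  # ['Workstation bridged NIC']

# After setting the portgroup back to 1500, nothing is undersized.
fixed = dict(hops, **{"witness portgroup": 1500})
print(mismatched_hops(fixed, 1500))  # []
```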