While working recently in a nested VMware environment running three clusters and NSX, I ran into a jumbo frames issue with VXLAN connectivity across the clusters. I was simulating different WAN/subnet environments using an Untangle appliance as a router at the “remote” cluster site. Interestingly, I was able to narrow the problem down to the configuration of the Untangle virtual router appliance. Let’s take a look at troubleshooting NSX VXLAN jumbo frames in a nested lab and the issue I ran into.
Troubleshooting NSX VXLAN jumbo frames in nested lab
First, a little background on the troubleshooting I did before diving into the Untangle appliance. Before doing any further testing, I made sure that, at least at the vCenter level, everything looked healthy on the NSX “Installation” page, which shows the current status of the NSX Manager as well as the controller VMs.
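Next, I moved on to actually pinging the vmkernel NICs that are set up for VXLAN traffic. You can check connectivity between the VXLAN vmkernel interfaces from the ESXi host shell with the following command, where -d sets the don't-fragment bit, -s sets the ICMP payload size in bytes, and -I selects the VXLAN vmkernel interface (vmk1 in my lab):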
ping ++netstack=vxlan -d -s 1572 -I vmk1 <vmkernel IP address>
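The 1572 value is not arbitrary. NSX expects an MTU of at least 1600 on the VXLAN transport network, and the -s value covers only the ICMP payload, so the 8-byte ICMP header and the 20-byte IPv4 header are added on top of it. A minimal Python sketch of that arithmetic, assuming IPv4 without IP options:

```python
# Header sizes added on top of the ping -s payload (IPv4, no IP options), in bytes.
ICMP_HEADER = 8
IPV4_HEADER = 20


def ping_payload_for_mtu(mtu: int) -> int:
    """Largest -s value whose IPv4 packet still fits in `mtu` bytes unfragmented."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def ip_packet_size(payload: int) -> int:
    """Total IPv4 packet size produced by a ping with the given -s payload."""
    return payload + ICMP_HEADER + IPV4_HEADER


if __name__ == "__main__":
    for mtu in (1500, 1600, 9000):
        print(f"MTU {mtu}: ping -d -s {ping_payload_for_mtu(mtu)}")
```

With -d set, a 1572-byte payload therefore only gets through if every hop can carry a full 1600-byte packet, while the 1472-byte equivalent for a standard 1500 MTU does not exercise the larger MTU at all.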
As you can see above, pinging the VXLAN vmkernel IP address with a jumbo frame gets no response. However, with a regular-sized packet, we do receive a response, as you can see below.
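When small pings succeed and large ones fail, it can help to find the largest payload that still gets a reply, since that points to the smallest MTU on the path. The sketch below is a plain binary search over payload sizes. The `probe` callback is hypothetical: it stands in for running the ping command with -d and a given -s value and reporting whether a reply came back, and the search assumes that if a size passes, every smaller size passes too.

```python
from typing import Callable, Optional


def largest_passing_payload(
    probe: Callable[[int], bool], low: int, high: int
) -> Optional[int]:
    """Binary-search the largest don't-fragment ping payload in [low, high] that gets a reply."""
    if not probe(low):
        return None  # even the smallest size fails, so this is not just an MTU problem
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid  # mid passed: the answer is mid or larger
        else:
            high = mid - 1  # mid failed: the answer is below mid
    return low


def fake_probe(size: int) -> bool:
    """Hypothetical stand-in for a real ping: drops IP packets larger than 1500 bytes."""
    return size + 28 <= 1500  # 8-byte ICMP header + 20-byte IPv4 header


if __name__ == "__main__":
    print(largest_passing_payload(fake_probe, 56, 1572))
```

Against the fake 1500-byte path this settles on 1472, which is the signature of a hop that is not passing anything larger than a standard frame.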
So we know there is a problem with jumbo frames somewhere in the chain. I have been using Untangle virtual routers in lab environments lately, as they do everything I need and are basically point-and-click to set up, which makes them easy to use in a lab. However, after checking the MTUs on the vSwitches and other components, I started to suspect that the router was not passing jumbo frames through, even though it was configured to pass them, as you can see below.
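Checking the MTUs along the chain comes down to one number. VXLAN wraps each guest frame in outer IPv4, UDP, and VXLAN headers, so a standard 1500-byte guest packet plus its 14-byte Ethernet header grows to 1550 bytes on the transport network; the 1600 that NSX uses leaves some headroom on top of that. Every hop has to carry it, so the effective limit is the smallest MTU on the path. A small sketch of that check, assuming an IPv4 underlay and no 802.1Q tag on the inner frame; the hop names and MTU values are hypothetical:

```python
from typing import Dict, List

# Bytes VXLAN adds around each guest IP packet (IPv4 underlay, untagged inner frame).
INNER_ETHERNET = 14  # the guest's own Ethernet header, carried inside the tunnel
VXLAN_HEADER = 8
OUTER_UDP = 8
OUTER_IPV4 = 20


def required_transport_mtu(guest_mtu: int = 1500) -> int:
    """Smallest IP MTU the VXLAN transport path must carry for a given guest MTU."""
    return guest_mtu + INNER_ETHERNET + VXLAN_HEADER + OUTER_UDP + OUTER_IPV4


def undersized_hops(hops: Dict[str, int], guest_mtu: int = 1500) -> List[str]:
    """Names of hops whose MTU is too small to carry the encapsulated traffic."""
    needed = required_transport_mtu(guest_mtu)
    return [name for name, mtu in hops.items() if mtu < needed]


if __name__ == "__main__":
    # Hypothetical transport path between two sites; names and MTUs are illustrative.
    path = {
        "site A vDS": 1600,
        "router NIC (site A side)": 1600,
        "router NIC (site B side)": 1500,
        "site B vDS": 1600,
    }
    print(required_transport_mtu())  # 1550
    print(min(path.values()))  # effective path MTU: 1500
    print(undersized_hops(path))  # ['router NIC (site B side)']
```

A hop can also be configured for a large MTU and still drop large frames in practice, which is why the end-to-end ping remains the real test.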
One thing I noticed with the Untangle virtual appliance deployed from the OVA downloaded from Untangle is that the network adapter type is set to Flexible. I found a post describing a similar issue with jumbo frames not being passed with certain NIC configurations. Ironically, though, the fix for me was to change the adapter from Flexible to E1000. The E1000e wasn’t an option for me, even after upgrading the virtual machine hardware to version 11, most likely due to the guest operating system type. The author of that post noted that he had issues with the E1000 as opposed to the E1000e, but so far the E1000 has worked for me.
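The adapter type ends up as the `ethernetN.virtualDev` key in the VM's .vmx file, where E1000 is stored as "e1000" and, as far as I know, the Flexible adapter typically appears as "vlance" or as no key at all. The sketch below only illustrates that change on a copy of the .vmx text; it is not how I made the change (the vSphere client is the usual route), and it assumes the VM is powered off. The function name and sample file contents are hypothetical.

```python
import re


def set_nic_type(vmx_text: str, nic: int, device: str = "e1000") -> str:
    """Return vmx_text with ethernet<nic>.virtualDev set to `device`.

    Replaces an existing virtualDev line for that NIC, or appends one if the
    key is absent (a missing key means the VM uses its default adapter type).
    """
    key = f"ethernet{nic}.virtualDev"
    line = f'{key} = "{device}"'
    pattern = re.compile(rf"^{re.escape(key)}\s*=.*$", re.MULTILINE | re.IGNORECASE)
    if pattern.search(vmx_text):
        # A function replacement avoids backslash handling in the new line.
        return pattern.sub(lambda _match: line, vmx_text, count=1)
    return vmx_text.rstrip("\n") + "\n" + line + "\n"


if __name__ == "__main__":
    # Hypothetical excerpt from a router VM's .vmx file.
    before = 'ethernet0.present = "TRUE"\nethernet0.virtualDev = "vlance"\n'
    print(set_nic_type(before, 0))
```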
As you can see below, pings to the vmkernel NIC with the large packet now return successfully.
Your mileage may vary depending on how you set up your lab with NSX. In my case, however, troubleshooting NSX VXLAN jumbo frames in a nested lab came down to the type of network adapter used on the Untangle appliance.