Many of us have been there. You have a host that is not responding correctly, or some other odd issue is affecting a specific ESXi host. You may hit a VMware KB that points you toward restarting the management agents on the ESXi host to resolve the issue. It is the middle of the day, and you are wondering whether you should restart the management agents on the ESXi host. Let’s look at the impact of restarting the ESXi management agents and explore what behavior you will see when you do this.
Why restart ESXi management agents?
Why would you want or need to restart the ESXi management agents? There are various issues I have seen in the past that prompted a restart of the management agents on an ESXi host. For example, I have seen a file for a particular VM stuck in use, or some other “in use” situation that left resources locked. Nine times out of ten, restarting the management agents will resolve these types of issues. Always check the VMware KBs that pertain to the issue you are experiencing, as restarting the management agents may not get you past it.
How to restart ESXi management agents
How do you restart the ESXi management agents? There are a couple of primary ways to do it: from the DCUI menus or from the command line over an SSH connection. Below, I am SSH’ed into an ESXi host. Navigate to Troubleshooting Mode Options > Restart Management Agents.
Confirming the restart of the management agents. There is also a checkbox of sorts that allows you to Collect extra troubleshooting information.
Once you confirm the restart of the management agents, you will see the messages Stopping management agents. Done. and then Starting management agents.
As mentioned, you can also restart the ESXi management agents via the SSH command line. You can do this with the command:
services.sh restart & tail -f /var/log/jumpstart-stdout.log
Restart Management Agents ESXi Impact
The question is, does restarting the management agents adversely impact workloads running on the ESXi host? Let’s take a look at a quick lab of this scenario, where we have a VM running and we need to restart the management agents on the ESXi host.
Let’s run a fun little experiment. I am starting a ping on a virtual machine running on a particular ESXi host in the vSphere cluster. Let’s ensure the connectivity to the VM is not affected when the management agents are restarted.
Here I have started the restart of the management agents. As you can see on the right, the pings are still returning fine.
The ESXi management agents have been stopped and started again, with no loss of connectivity to the virtual machine running on the ESXi host.
One behavior you will see that can make your heart skip a beat, especially in production, is that the ESXi host and the VMs running on it will show as disconnected for a few moments. However, generally after a few seconds, you can manually refresh the vSphere client and the host and VMs will show as healthy again.
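The ping experiment above can be reproduced with a small script that logs one timestamped probe per second and counts the losses. This is only a sketch under a few assumptions: it runs on a Linux workstation with the iputils `ping` (where `-W` is a timeout in seconds), and the function names `probe` and `watch_vm` are made up for illustration.

```shell
#!/bin/sh
# Sketch: probe a VM once per second while the management agents restart.
# Assumes a Linux workstation with iputils ping (-c count, -W timeout in s).

# Send one echo request; print "ok" on a reply, "LOST" otherwise.
probe() {
  if ping -c 1 -W 1 "$1" >/dev/null 2>&1; then
    echo ok
  else
    echo LOST
  fi
}

# watch_vm ADDRESS [COUNT]: log COUNT probes, then summarize the losses.
watch_vm() {
  addr=$1
  count=${2:-60}
  lost=0
  i=0
  while [ "$i" -lt "$count" ]; do
    status=$(probe "$addr")
    if [ "$status" = "LOST" ]; then
      lost=$((lost + 1))
    fi
    echo "$(date '+%H:%M:%S') $addr $status"
    i=$((i + 1))
    sleep 1
  done
  echo "lost $lost of $count probes"
}
```

Running something like `watch_vm 192.0.2.10 120` in one terminal (the address is a placeholder for your VM) while restarting the agents in another gives you a timestamped record instead of having to eyeball the ping window.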
While it is safe in many cases to restart your ESXi management agents while running production workloads, it is always best to play it safe and evacuate the host of all workloads before performing maintenance. Also, VMware makes the following notes of caution with restarting ESXi management agents in the VMware KB here:
- If LACP is enabled and configured, do not restart management services using services.sh command. Instead, restart independent services using the /etc/init.d/module restart command.
- If the issue is not resolved, and you are restarting all the services that are a part of the services.sh script, schedule downtime before running the script.
- If NSX is configured in the environment, do not run the /sbin/services.sh restart command because this will restart all services on the ESXi host. If you need to restart the management agents on the ESXi host, restart vpxa, hostd, and fdm individually. If you also need to run the /sbin/services.sh restart command because restarting each management agent does not work, then migrate all the VMs off the ESXi host and put the host in maintenance mode if possible.
- If you are unsure that NSX for vSphere is installed on an ESXi host, run this command to verify:
esxcli software vib list --rebooting-image | grep esx-*
Look for the following VIBs to determine if NSX is installed on the ESX host:
- If using shared graphics in a View environment (VGPU, vDGA, vSGA), do not use services.sh. This will shut down the xorg service, which is responsible for graphics at the guest OS level. By ripping the graphics out of the guest OS, you will in turn crash the VDI workloads that use the shared graphics. If you are using shared graphics and the host is not in maintenance mode, restart only hostd and vpxa.
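Several of the cautions above come down to restarting individual agents rather than running services.sh. A minimal sketch of that, run from the ESXi shell, might look like the following. The `restart_agents` function and the `DRY_RUN` and `INIT_DIR` variables are conveniences invented for this sketch, not part of ESXi; only the /etc/init.d scripts for hostd and vpxa are assumed to exist on the host. By default it only prints what it would run.

```shell
#!/bin/sh
# Sketch: restart selected ESXi management agents one at a time instead of
# running services.sh. INIT_DIR and DRY_RUN are conveniences of this sketch.
INIT_DIR="${INIT_DIR:-/etc/init.d}"

# restart_agents AGENT...: restart each named init script in order.
restart_agents() {
  for agent in "$@"; do
    script="$INIT_DIR/$agent"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Default: only print what would be executed.
      echo "would run: $script restart"
    elif [ -x "$script" ]; then
      # Stop at the first agent that fails to restart.
      "$script" restart || return 1
    else
      echo "skipping $agent: $script not found" >&2
    fi
  done
}
```

Calling `restart_agents hostd vpxa` prints the two commands; setting `DRY_RUN=0` first actually runs them. Keeping the dry run as the default is a deliberate choice so that pasting the function into a production shell does nothing until you opt in.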
Hopefully, this post helps to shed light on the impact of restarting the ESXi management agents on production workloads. In general, VMs are unaffected by restarting the management agents on your ESXi host. It is always a good idea to play it safe and evacuate a host before performing any kind of maintenance, including restarting management agents. Also, be sure to note the cautions listed from VMware around restarting management agents in specific environment configurations such as those where NSX is installed.