Greetings, faithful reader,
This one is as much a reminder for me as it is for you!
When you put a host into Maintenance Mode, the VMware ESX server will always try to VMotion all of its hosted virtual machines away. Problems occur when those virtual machines fail to migrate: the Enter Maintenance Mode task stops at two percent and then times out. Regrettably, there is no notification that an individual virtual machine’s VMotion event has failed.
However, there is another cause as well: if your cluster is both HA (High Availability) and DRS (Distributed Resource Scheduler) enabled, DRS will generate a five-star recommendation when you put an ESX server into Maintenance Mode. If DRS is in manual mode, however, nothing moves until you act on that recommendation: you have to tell the cluster to initiate the VMotion events. Once I accept the recommendations, VMotion starts and the target ESX server is placed in Maintenance Mode.
There is one other case where it will fail. This one falls into the category of “undocumented system feature”. The summary is:
From: http://kb.vmware.com/selfservice/viewContent.do?externalId=1007156&sliceId=1
- An ESX host fails to enter maintenance mode in a VMware High Availability (HA) or DRS cluster
- Hosts fail to migrate when attempting to enter maintenance mode
- The progress indicator remains at 2% indefinitely
- Trying to remediate a host and getting a time out error when trying to enter the maintenance mode
Cause:
This is normal behavior for a VMware HA/DRS cluster that is using strict admission control. Disabling strict admission control (allowing virtual machines to power on even if they violate constraints) should allow a host to enter maintenance mode in this situation, but a bug was discovered whereby it did not.
Resolution:
For a permanent solution, upgrade to VirtualCenter 2.5 Update 3.
To work around the issue, temporarily disable VMware HA in the cluster settings. You will then be able to put the ESX Server host into Maintenance Mode and do the work required. You can then re-enable HA on your cluster.
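If you ever script this workaround instead of clicking through it, the important part is the order of the steps and making sure HA gets switched back on even when something goes wrong. Here is a rough sketch in Python of how I would structure it. The three callables (set_ha_enabled, enter_maintenance_mode and do_work) are hypothetical placeholders that I made up for illustration; they are not from the KB article or any VMware toolkit, and you would have to wire them up to whatever you actually use to talk to VirtualCenter.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def ha_temporarily_disabled(set_ha_enabled: Callable[[bool], None]) -> Iterator[None]:
    """Switch HA off for the duration of the block, then switch it back on.

    set_ha_enabled is a hypothetical callable supplied by the caller; it is
    assumed to change the HA setting on the cluster.
    """
    set_ha_enabled(False)
    try:
        yield
    finally:
        # Runs even if the work inside the block raises, so the cluster
        # is not left with HA switched off.
        set_ha_enabled(True)


def run_host_maintenance(
    set_ha_enabled: Callable[[bool], None],
    enter_maintenance_mode: Callable[[], None],
    do_work: Callable[[], None],
) -> None:
    """Apply the workaround in order: HA off, maintenance mode, work, HA on.

    All three arguments are hypothetical placeholders for real operations.
    """
    with ha_temporarily_disabled(set_ha_enabled):
        enter_maintenance_mode()
        do_work()
```

Calling run_host_maintenance with your own three functions runs them in the order described above. If entering Maintenance Mode still times out, the exception propagates to you, but HA is re-enabled first.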