I’m seeing a lot of posts on different forums about VMware HA.
Some people, including myself, have received error messages like this one when trying to power on a VM in an HA cluster:
“Insufficient resources to satisfy HA failover level on cluster x in y”
Some fellow bloggers have preceded me in providing clear explanations of how HA actually calculates this sort of thing. For instance:
Still, I see a lot of replies from people who have difficulty working out exactly what happens in their own situation. For those people, I have created a PowerShell / VI Toolkit (beta) script that does the math and shows you exactly what is going on in your cluster.
So, download it, try it, and please send me some feedback if you have questions, remarks or problems.
Hope this helps.
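If you want a feel for the arithmetic before running the script, here is a rough sketch of it. Note that this is not my script and it is Python rather than PowerShell. It assumes a simplified slot model: the slot size is the largest CPU and memory reservation among powered-on VMs, each host holds as many slots as its scarcer resource allows, and the worst case is losing the hosts with the most slots. The class names, the function names and the two default values are all made up for illustration, and your HA version may calculate things differently.

```python
from dataclasses import dataclass

# Assumed fallback slot dimensions when no VM has a reservation.
# These values are illustrative, not taken from any VMware document.
DEFAULT_CPU_SLOT_MHZ = 256
DEFAULT_MEM_SLOT_MB = 256


@dataclass
class Host:
    name: str
    cpu_mhz: int
    mem_mb: int


@dataclass
class VM:
    name: str
    cpu_reservation_mhz: int
    mem_reservation_mb: int


def slot_size(vms):
    """Largest reservation among powered-on VMs, per resource."""
    cpu = max([vm.cpu_reservation_mhz for vm in vms] + [DEFAULT_CPU_SLOT_MHZ])
    mem = max([vm.mem_reservation_mb for vm in vms] + [DEFAULT_MEM_SLOT_MB])
    return cpu, mem


def slots_per_host(host, cpu_slot, mem_slot):
    """A host holds as many slots as its scarcer resource allows."""
    return min(host.cpu_mhz // cpu_slot, host.mem_mb // mem_slot)


def failover_report(hosts, vms, host_failures):
    """Summarise slot usage if `host_failures` hosts were lost (worst case)."""
    cpu_slot, mem_slot = slot_size(vms)
    slots = sorted(slots_per_host(h, cpu_slot, mem_slot) for h in hosts)
    # Worst case (assumption): the hosts with the most slots fail first.
    keep = max(0, len(slots) - host_failures)
    remaining = sum(slots[:keep])
    used = len(vms)  # one slot per powered-on VM
    return {
        "slot_size": (cpu_slot, mem_slot),
        "slots_after_failures": remaining,
        "slots_used": used,
        "can_power_on_another_vm": used + 1 <= remaining,
    }


if __name__ == "__main__":
    hosts = [Host("esx1", 8000, 16384), Host("esx2", 8000, 16384)]
    vms = [VM("db", 1000, 2048), VM("web", 0, 0)]
    print(failover_report(hosts, vms, host_failures=1))
```

The thing to notice is that a single VM with a large reservation inflates the slot size for the whole cluster, which is typically how a cluster that looks half empty still ends up refusing to power on a VM.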
“When you add a host to a cluster, all virtual machines in the cluster default to the cluster’s default restart priority (Medium, if unspecified) and default isolation response (Power off, if unspecified).”
Resource Management Guide, page 116.
So avoid using per-VM settings for HA, because those are lost when you add a host to your cluster.
“As a result, if the network connection is restored in this window between 12 and 14 seconds after the host has lost connectivity, the virtual machines are powered off but not failed over.”
Resource Management Guide, page 80.
So it is theoretically possible that, in the case of an intermittent network problem, some of your virtual machines will be powered off but not restarted (neither on the original server nor on any other server).
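To make that window concrete, here is a tiny Python sketch that classifies a network outage by its length. The 12 and 14 second boundaries come straight from the quote above. The function name, the constant names and the treatment of the exact boundary values are my own assumptions, and the timings may well differ if the cluster's defaults have been changed.

```python
# Boundaries taken from the quoted passage (seconds since connectivity was lost).
ISOLATION_RESPONSE_AFTER_S = 12   # isolated host powers off its VMs
WINDOW_END_S = 14                 # end of the "powered off, no failover" window


def isolation_outcome(outage_seconds):
    """Classify what happens to the VMs for an outage of the given length.

    Simplified model: boundary values are treated as inside the window,
    which is an assumption rather than documented behaviour.
    """
    if outage_seconds < ISOLATION_RESPONSE_AFTER_S:
        return "no action"
    if outage_seconds <= WINDOW_END_S:
        return "powered off, not failed over"
    return "powered off and failed over"


if __name__ == "__main__":
    for seconds in (5, 13, 20):
        print(seconds, "->", isolation_outcome(seconds))
```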
“In a cluster using DRS and HA with HA admission control turned on, virtual machines might not be evacuated from hosts entering maintenance mode. This is because of the resources reserved to maintain the failover level. In this case, you must manually migrate the virtual machines off the hosts using VMotion.”
Resource Management Guide, page 80.
In practice, the host will not enter maintenance mode until you have manually powered off or VMotioned the VMs. Note that this will probably be an issue when you try to schedule ESX updates, as Update Manager tries to put each ESX server into maintenance mode before installing the updates.