The HA “split brain” problem
This ISN’T something that occurs frequently, but it is something I would want to test in HA (High Availability) before going to production!
Sometimes when you set the DAS Isolation addresses, they fail to work, i.e. when you take vswif0 offline, the virtual machines get shut down anyway. Given that this is emulating an “isolation response”, where ONLY the service console fails to respond on the network, the virtual machines should only shut down if the other cards in the server also fail to respond.
Ideally, I want my DAS Isolation addresses in a totally separate subnet from my service console. When I set up another Service Console, it’s always a good idea to remove HA from the cluster, wait for everything to settle down and then re-enable it on the cluster (a couple of minutes’ work). This makes sure both service consoles are sending heartbeats.
We can view the heartbeats by using tcpdump to monitor port 8044 on each service console interface:
# tcpdump -i vswifX port 8044
# tcpdump -i vswifY port 8044
As a final check, have a look at /var/log/vmware/aam/aam_config_util.def for the isolation addresses. It should look something like:
Start Object esx5
nodefd esx5 {
    nodeAddrs = {
        {
            sourceType = isolation
            source.addr =
            destination.addr = 172.16.0.36
        }
        {
            sourceType = isolation
            source.addr =
            destination.addr = 10.0.0.52
        }
        {
            sourceType = domain
            source.addr = 172.16.0.105
            destination.addr = 224.0.6.127
        }
        {
            sourceType = domain
            source.addr = 10.0.0.5
            destination.addr = 224.0.6.127
        }
    }
}
End Object esx5
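If you would rather not eyeball the file, a small shell function can pull out just the isolation addresses. This is only a sketch: the function name list_isolation_addrs is made up for illustration, and it assumes awk is available on the service console and that the file follows the "sourceType = ..." / "destination.addr = ..." layout shown in the listing above.

```shell
#!/bin/sh
# list_isolation_addrs FILE  (hypothetical helper, not a VMware tool)
# Prints the destination.addr of every block whose sourceType is "isolation".
# Assumes the "key = value" layout shown in the sample listing.
list_isolation_addrs() {
    awk '
        # Remember the type of the block we are currently inside.
        $1 == "sourceType" { type = $3 }
        # Print the address only for isolation blocks.
        $1 == "destination.addr" && type == "isolation" { print $3 }
        # A closing brace ends the block, so forget its type.
        $1 == "}" { type = "" }
    ' "$1"
}
```

Run it as list_isolation_addrs /var/log/vmware/aam/aam_config_util.def. For the sample listing it should print the two isolation addresses, one per line, and skip the multicast "domain" entries.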
Also, if das.usedefaultisolationaddress is not set to false, the service console’s default gateway appears as an isolation address in /var/log/vmware/aam/aam_config_util.def, in addition to any IP addresses configured with das.isolationaddress and das.isolationaddress2.
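As an illustration, advanced option values matching the sample file above might look like the following. The addresses are simply the two isolation addresses from the listing, and the "name = value" layout is only shorthand for however you enter them in the cluster’s HA advanced options:

```text
das.usedefaultisolationaddress = false
das.isolationaddress = 172.16.0.36
das.isolationaddress2 = 10.0.0.52
```

With these in place, only the two listed addresses should show up as "isolation" entries in the file, and the default gateway should not.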