Description:
Crash test using the killVM method. killVM simulates unplugging the power cable.
Steps:
I have 6 groups and 3 shards (the global group is my_group4).
I killed the VM hosting the masters of my_group1 and my_group4.
I restarted the VM and all servers.
Then I ran lookup_servers and saw that no new master had been elected.
my_group4 lookup_servers:
Command :
{ success = True
return = [{'status': 'SECONDARY', 'server_uuid': '2830361c-a926-11e3-91da-0021f6fab222', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'kven07:18016'}, {'status': 'SECONDARY', 'server_uuid': '29664d9b-a926-11e3-91da-0021f6fab223', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'kven08:18017'}, {'status': 'FAULTY', 'server_uuid': '2af70228-a926-11e3-91da-0021f6fab221', 'mode': 'READ_WRITE', 'weight': 1.0, 'address': 'kven06:18015'}, {'status': 'SECONDARY', 'server_uuid': '2b0a43a0-a926-11e3-91da-0021f6fab224', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'kven09:18018'}]
activities =
}
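To make the state above easier to check, here is a minimal sketch that scans the `lookup_servers` return value for a master. It assumes a master is reported with status `'PRIMARY'`. The helper names `find_primary` and `faulty_addresses` are hypothetical, not part of the Fabric API. The data is the my_group4 output copied from above.

```python
from typing import Dict, List, Optional

# my_group4 lookup_servers return value after the restart, copied from the report.
MY_GROUP4: List[Dict] = [
    {'status': 'SECONDARY', 'server_uuid': '2830361c-a926-11e3-91da-0021f6fab222', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'kven07:18016'},
    {'status': 'SECONDARY', 'server_uuid': '29664d9b-a926-11e3-91da-0021f6fab223', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'kven08:18017'},
    {'status': 'FAULTY', 'server_uuid': '2af70228-a926-11e3-91da-0021f6fab221', 'mode': 'READ_WRITE', 'weight': 1.0, 'address': 'kven06:18015'},
    {'status': 'SECONDARY', 'server_uuid': '2b0a43a0-a926-11e3-91da-0021f6fab224', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'kven09:18018'},
]


def find_primary(servers: List[Dict]) -> Optional[Dict]:
    """Hypothetical helper: return the entry whose status is PRIMARY (assumed
    to mark the master), or None if the group currently has no master."""
    for server in servers:
        if server['status'] == 'PRIMARY':
            return server
    return None


def faulty_addresses(servers: List[Dict]) -> List[str]:
    """Hypothetical helper: addresses of servers marked FAULTY."""
    return [s['address'] for s in servers if s['status'] == 'FAULTY']


print(find_primary(MY_GROUP4))
print(faulty_addresses(MY_GROUP4))
```

Run against the output above, this finds no master, and the only FAULTY server is the old master kven06:18015, which still reports READ_WRITE mode.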
Running activate on my_group4 succeeded:
Procedure :
{ uuid = 0b742ac9-6dac-44d9-ae75-9659d2ff94e1,
finished = True,
success = True,
return = True,
activities =
}
I removed the faulty server from my_group1 and added it back. When I ran promote, I got the error below.
my_group1 lookup_servers:
Command :
{ success = True
return = [{'status': 'SECONDARY', 'server_uuid': '274da6ba-a926-11e3-91da-0021f6fab222', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'kven07:18004'}, {'status': 'SECONDARY', 'server_uuid': '2936cfb4-a926-11e3-91da-0021f6fab223', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'kven08:18005'}, {'status': 'SECONDARY', 'server_uuid': '2a6c28e8-a926-11e3-91da-0021f6fab221', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'kven06:18003'}, {'status': 'SECONDARY', 'server_uuid': '2a8270bf-a926-11e3-91da-0021f6fab224', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'kven09:18006'}]
activities =
}
promote:
Procedure :
{ uuid = 0ca4727f-11a0-411c-948f-c3a8c8bf0248,
finished = True,
success = False,
return = GroupError: Group master not running my_group4,
activities =
}
Promoting my_group1 fails because its global group, my_group4, has no running master. The failover solution does not handle the case where both the global group and a shard group have a failed master.
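The condition described above can be sketched as a check over both groups' `lookup_servers` output. This is a minimal sketch, assuming `'PRIMARY'` marks a master. The entries are abbreviated to address and status, copied from the outputs above. `groups_without_primary` and `recovery_order` are hypothetical helpers. The ordering rule (global group first) is an assumption inferred only from the "Group master not running my_group4" error, not confirmed Fabric behaviour.

```python
from typing import Dict, List

# lookup_servers results from the report, reduced to address and status.
LOOKUPS: Dict[str, List[Dict]] = {
    'my_group4': [  # global group, after the VM restart
        {'address': 'kven07:18016', 'status': 'SECONDARY'},
        {'address': 'kven08:18017', 'status': 'SECONDARY'},
        {'address': 'kven06:18015', 'status': 'FAULTY'},
        {'address': 'kven09:18018', 'status': 'SECONDARY'},
    ],
    'my_group1': [  # shard group, after removing and re-adding the faulty server
        {'address': 'kven07:18004', 'status': 'SECONDARY'},
        {'address': 'kven08:18005', 'status': 'SECONDARY'},
        {'address': 'kven06:18003', 'status': 'SECONDARY'},
        {'address': 'kven09:18006', 'status': 'SECONDARY'},
    ],
}
GLOBAL_GROUP = 'my_group4'


def groups_without_primary(lookups: Dict[str, List[Dict]]) -> List[str]:
    """Hypothetical helper: sorted names of groups with no PRIMARY server."""
    return sorted(
        name for name, servers in lookups.items()
        if not any(s['status'] == 'PRIMARY' for s in servers)
    )


def recovery_order(groups: List[str], global_group: str) -> List[str]:
    """Hypothetical helper: put the global group first, on the ASSUMPTION that
    a shard group cannot be promoted while the global group has no master."""
    return sorted(groups, key=lambda g: (g != global_group, g))


missing = groups_without_primary(LOOKUPS)
print(missing)
print(recovery_order(missing, GLOBAL_GROUP))
```

With the data above, both my_group1 and my_group4 lack a master, which is the combination the report says failover does not handle.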
How to repeat:
See above.