Bug #19039 multi node stop causes node failure handling not to complete
Submitted: 12 Apr 2006 12:12 Modified: 28 Apr 2006 11:07
Reporter: Tomas Ulin
Status: Closed
Category:Server: Cluster Severity:S2 (Serious)
Version:4.1-> OS:
Assigned to: Target Version:

[12 Apr 2006 12:12] Tomas Ulin
Description:
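Data nodes: 3, 4, 5, 6

Node 5 is the master and is killed.
Node 3 (the newly elected master, which has not yet completed the takeover) is then killed.

As a result, failure handling for node 5 never completes, and the surviving nodes keep logging "Failure handling of node 5 has not completed..." warnings.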

How to repeat:
2006-04-06 14:53:49 [MgmSrvr] INFO     -- Node 5: Local checkpoint 37 started. Keep GCI = 80121 oldest restorable GCI = 80422
2006-04-06 15:36:04 [MgmSrvr] ALERT    -- Node 3: Node 5 Disconnected
2006-04-06 15:36:04 [MgmSrvr] INFO     -- Node 3: Communication to Node 5 closed
2006-04-06 15:36:04 [MgmSrvr] INFO     -- Node 1: Node 5 Connected
2006-04-06 15:36:04 [MgmSrvr] ALERT    -- Node 3: Arbitration check won - node group majority
2006-04-06 15:36:04 [MgmSrvr] INFO     -- Node 3: President restarts arbitration thread [state=6]
2006-04-06 15:36:04 [MgmSrvr] INFO     -- Node 3: GCP Take over started

....

2006-04-06 15:36:04 [MgmSrvr] INFO     -- Node 5: Node shutdown completed.
2006-04-06 15:36:05 [MgmSrvr] INFO     -- Node 1: Node 3 Connected
2006-04-06 15:36:05 [MgmSrvr] ALERT    -- Node 4: Node 3 Disconnected
2006-04-06 15:36:05 [MgmSrvr] INFO     -- Node 4: Communication to Node 3 closed
2006-04-06 15:36:05 [MgmSrvr] ALERT    -- Node 4: Network partitioning - arbitration required
2006-04-06 15:36:05 [MgmSrvr] INFO     -- Node 4: President restarts arbitration thread [state=7]
2006-04-06 15:36:05 [MgmSrvr] ALERT    -- Node 6: Node 3 Disconnected
2006-04-06 15:36:05 [MgmSrvr] INFO     -- Node 6: Communication to Node 3 closed
2006-04-06 15:36:05 [MgmSrvr] ALERT    -- Node 4: Arbitration won - positive reply from node 1
2006-04-06 15:36:05 [MgmSrvr] INFO     -- Node 4: GCP Take over started

....

2006-04-06 15:37:00 [MgmSrvr] INFO     -- Node 4: Communication to Node 2 opened
2006-04-06 15:37:00 [MgmSrvr] INFO     -- Node 6: Communication to Node 2 opened
2006-04-06 15:37:03 [MgmSrvr] WARNING  -- Node 4: Failure handling of node 5 has not completed in 1 min. - state = 3
2006-04-06 15:37:03 [MgmSrvr] WARNING  -- Node 6: Failure handling of node 5 has not completed in 1 min. - state = 3
2006-04-06 15:38:03 [MgmSrvr] WARNING  -- Node 4: Failure handling of node 5 has not completed in 2 min. - state = 3
2006-04-06 15:38:04 [MgmSrvr] WARNING  -- Node 6: Failure handling of node 5 has not completed in 2 min. - state = 3
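To spot this symptom in a longer cluster log, the stall warnings above can be picked out mechanically. The following is a minimal sketch, not an official tool: the helper name `find_stalled_failure_handling` is hypothetical, and it assumes each log entry sits on a single line (as in the cluster log file itself; the entries pasted above were line-wrapped). It does not interpret the meaning of the reported `state` value.

```python
import re

# Hypothetical helper (not part of MySQL Cluster): matches MgmSrvr warnings such as
# "Node 4: Failure handling of node 5 has not completed in 1 min. - state = 3".
# Assumes each log entry is on one line, as in the cluster log file.
STALL_RE = re.compile(
    r"Node (?P<reporter>\d+): Failure handling of node (?P<failed>\d+) "
    r"has not completed in (?P<minutes>\d+) min\. - state = (?P<state>\d+)"
)


def find_stalled_failure_handling(log_lines):
    """Map (reporting node, failed node) -> (longest stall in minutes, state reported then)."""
    stalls = {}
    for line in log_lines:
        m = STALL_RE.search(line)
        if m is None:
            continue  # not a failure-handling stall warning
        key = (int(m["reporter"]), int(m["failed"]))
        minutes, state = int(m["minutes"]), int(m["state"])
        # Keep the longest stall reported for each (reporter, failed) pair.
        if key not in stalls or minutes >= stalls[key][0]:
            stalls[key] = (minutes, state)
    return stalls


if __name__ == "__main__":
    sample = [
        "2006-04-06 15:37:03 [MgmSrvr] WARNING  -- Node 4: Failure handling of node 5 "
        "has not completed in 1 min. - state = 3",
        "2006-04-06 15:38:03 [MgmSrvr] WARNING  -- Node 4: Failure handling of node 5 "
        "has not completed in 2 min. - state = 3",
        "2006-04-06 15:38:04 [MgmSrvr] WARNING  -- Node 6: Failure handling of node 5 "
        "has not completed in 2 min. - state = 3",
    ]
    print(find_stalled_failure_handling(sample))
    # {(4, 5): (2, 3), (6, 5): (2, 3)}
```

Applied to the log above, both surviving nodes (4 and 6) report that failure handling of node 5 is still stuck after 2 minutes.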
[26 Apr 2006 14:12] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/5565
[26 Apr 2006 14:12] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/5566
[27 Apr 2006 7:29] Tomas Ulin
Changed the bug report to address only the issue that this may happen even if the nodes are shut down using the management server.

A fix for this has been pushed to 5.0.22 and 5.1.10.
[27 Apr 2006 7:30] Tomas Ulin
Patch reviewed by Jonas.
[28 Apr 2006 11:07] Jon Stephens
Thank you for your bug report. The fix for this issue has been committed to our
source repository for this product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

Documented bugfix in 5.0.22/5.1.10 changelogs; closed.