Bug #19039 multi node stop causes node failure handling not to complete
Submitted: 12 Apr 2006 10:12
Modified: 28 Apr 2006 9:07
Reporter: Tomas Ulin
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: 4.1->
OS:
Assigned to:
CPU Architecture: Any

[12 Apr 2006 10:12] Tomas Ulin
Description:
Cluster with nodes 3, 4, 5 and 6.

Node 5 is master and is killed.
Node 3 (the newly elected master, which has not yet completed takeover) is then killed.

This results in "node failure handling not completed..." for node 5.

How to repeat:
2006-04-06 14:53:49 [MgmSrvr] INFO     -- Node 5: Local checkpoint 37 started. Keep GCI = 80121 oldest restorable GCI = 80422
2006-04-06 15:36:04 [MgmSrvr] ALERT    -- Node 3: Node 5 Disconnected
2006-04-06 15:36:04 [MgmSrvr] INFO     -- Node 3: Communication to Node 5 closed
2006-04-06 15:36:04 [MgmSrvr] INFO     -- Node 1: Node 5 Connected
2006-04-06 15:36:04 [MgmSrvr] ALERT    -- Node 3: Arbitration check won - node group majority
2006-04-06 15:36:04 [MgmSrvr] INFO     -- Node 3: President restarts arbitration thread [state=6]
2006-04-06 15:36:04 [MgmSrvr] INFO     -- Node 3: GCP Take over started

....

2006-04-06 15:36:04 [MgmSrvr] INFO     -- Node 5: Node shutdown completed.
2006-04-06 15:36:05 [MgmSrvr] INFO     -- Node 1: Node 3 Connected
2006-04-06 15:36:05 [MgmSrvr] ALERT    -- Node 4: Node 3 Disconnected
2006-04-06 15:36:05 [MgmSrvr] INFO     -- Node 4: Communication to Node 3 closed
2006-04-06 15:36:05 [MgmSrvr] ALERT    -- Node 4: Network partitioning - arbitration required
2006-04-06 15:36:05 [MgmSrvr] INFO     -- Node 4: President restarts arbitration thread [state=7]
2006-04-06 15:36:05 [MgmSrvr] ALERT    -- Node 6: Node 3 Disconnected
2006-04-06 15:36:05 [MgmSrvr] INFO     -- Node 6: Communication to Node 3 closed
2006-04-06 15:36:05 [MgmSrvr] ALERT    -- Node 4: Arbitration won - positive reply from node 1
2006-04-06 15:36:05 [MgmSrvr] INFO     -- Node 4: GCP Take over started

....

2006-04-06 15:37:00 [MgmSrvr] INFO     -- Node 4: Communication to Node 2 opened
2006-04-06 15:37:00 [MgmSrvr] INFO     -- Node 6: Communication to Node 2 opened
2006-04-06 15:37:03 [MgmSrvr] WARNING  -- Node 4: Failure handling of node 5 has not completed in 1 min. - state = 3
2006-04-06 15:37:03 [MgmSrvr] WARNING  -- Node 6: Failure handling of node 5 has not completed in 1 min. - state = 3
2006-04-06 15:38:03 [MgmSrvr] WARNING  -- Node 4: Failure handling of node 5 has not completed in 2 min. - state = 3
2006-04-06 15:38:04 [MgmSrvr] WARNING  -- Node 6: Failure handling of node 5 has not completed in 2 min. - state = 3
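
To check another cluster log for the same symptom, the warnings above can be picked out with a small script. This is only a sketch in Python, assuming the log line format shown in this report; the function name, the regular expression and the sample text are illustrative and are not part of any MySQL tool.

```python
import re
from collections import defaultdict

# Assumed format, copied from the warnings in this report:
# <date> <time> [MgmSrvr] WARNING  -- Node N: Failure handling of node M
# has not completed in K min. - state = S
STALL_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[MgmSrvr\] WARNING\s+-- "
    r"Node (?P<reporter>\d+): Failure handling of node (?P<failed>\d+) "
    r"has not completed in (?P<minutes>\d+) min\. - state = (?P<state>\d+)"
)


def find_stalled_failure_handling(lines):
    """Return {failed_node: {reporting_node: longest_minutes_reported}}."""
    stalled = defaultdict(dict)
    for line in lines:
        m = STALL_RE.match(line.strip())
        if m is None:
            continue  # not a "failure handling has not completed" warning
        failed = int(m["failed"])
        reporter = int(m["reporter"])
        minutes = int(m["minutes"])
        # Keep the longest stall each reporting node has announced so far.
        previous = stalled[failed].get(reporter, 0)
        stalled[failed][reporter] = max(previous, minutes)
    return {node: dict(reporters) for node, reporters in stalled.items()}


# Sample taken from the log excerpt in this report.
SAMPLE = """\
2006-04-06 15:37:00 [MgmSrvr] INFO     -- Node 6: Communication to Node 2 opened
2006-04-06 15:37:03 [MgmSrvr] WARNING  -- Node 4: Failure handling of node 5 has not completed in 1 min. - state = 3
2006-04-06 15:37:03 [MgmSrvr] WARNING  -- Node 6: Failure handling of node 5 has not completed in 1 min. - state = 3
2006-04-06 15:38:03 [MgmSrvr] WARNING  -- Node 4: Failure handling of node 5 has not completed in 2 min. - state = 3
2006-04-06 15:38:04 [MgmSrvr] WARNING  -- Node 6: Failure handling of node 5 has not completed in 2 min. - state = 3
"""

if __name__ == "__main__":
    for failed, reporters in find_stalled_failure_handling(SAMPLE.splitlines()).items():
        for reporter, minutes in sorted(reporters.items()):
            print(f"node {reporter}: failure handling of node {failed} stalled for {minutes} min")
```

On the excerpt above this reports nodes 4 and 6 both still waiting on node 5 after 2 minutes. A warning that keeps repeating with a growing minute count is the sign that failure handling is stuck rather than just slow.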
[26 Apr 2006 12:12] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/5565
[26 Apr 2006 12:12] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/5566
[27 Apr 2006 5:29] Tomas Ulin
Changed bug report to address only the issue that this may happen even if nodes are shut down using the management server.

A fix for this has been pushed to 5.0.22 and 5.1.10.
[27 Apr 2006 5:30] Tomas Ulin
patch reviewed by Jonas
[28 Apr 2006 9:07] Jon Stephens
Thank you for your bug report. A fix for this issue has been committed to
our source repository for that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

Documented bugfix in 5.0.22/5.1.10 changelogs; closed.