Bug #13985 Cluster: ndb_mgm "status" command can return incorrect data node status
Submitted: 12 Oct 2005 22:01 Modified: 2 Sep 2006 6:04
Reporter: Jonathan Miller
Status: Closed
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:5.0 OS:Linux (Linux)
Assigned to: Stewart Smith CPU Architecture:Any

[12 Oct 2005 22:01] Jonathan Miller
Description:
While setting up a test with a six-data-node cluster, I started the cluster and watched it come up, issuing the "all status" command repeatedly. One invocation reported the data nodes as both started and starting; the next reported all of them back in a starting state.

ndb_mgm> all status
Node 2: starting (Phase 4) (Version 5.0.15)
Node 3: starting (Phase 4) (Version 5.0.15)
Node 4: starting (Phase 4) (Version 5.0.15)
Node 5: starting (Phase 4) (Version 5.0.15)
Node 6: starting (Phase 4) (Version 5.0.15)
Node 7: starting (Phase 5) (Version 5.0.15)

ndb_mgm> all status
Node 2: starting (Phase 4) (Version 5.0.15)
Node 3: starting (Phase 4) (Version 5.0.15)
Node 4: starting (Phase 4) (Version 5.0.15)
Node 5: starting (Phase 4) (Version 5.0.15)
Node 6: starting (Phase 4) (Version 5.0.15)
Node 7: starting (Phase 5) (Version 5.0.15)

ndb_mgm> all status
Node 2: Started (version 5.0.15)
Node 5: Started (version 5.0.15)
Node 6: Started (version 5.0.15)
Node 7: Started (version 5.0.15)
Node 3: Started (version 5.0.15)
Node 4: Started (version 5.0.15)
Node 2: starting (Phase 5) (Version 5.0.15)
Node 3: starting (Phase 5) (Version 5.0.15)
Node 4: starting (Phase 5) (Version 5.0.15)
Node 5: starting (Phase 5) (Version 5.0.15)
Node 6: starting (Phase 5) (Version 5.0.15)
Node 7: starting (Phase 5) (Version 5.0.15)

ndb_mgm> all status
Node 2: starting (Phase 5) (Version 5.0.15)
Node 3: starting (Phase 5) (Version 5.0.15)
Node 4: starting (Phase 5) (Version 5.0.15)
Node 5: starting (Phase 5) (Version 5.0.15)
Node 6: starting (Phase 5) (Version 5.0.15)
Node 7: started (Version 5.0.15)

ndb_mgm> all status
Node 2: started (Version 5.0.15)
Node 3: started (Version 5.0.15)
Node 4: started (Version 5.0.15)
Node 5: started (Version 5.0.15)
Node 6: started (Version 5.0.15)
Node 7: started (Version 5.0.15)

How to repeat:
Using six systems, each running a data node and one of them also running an ndb_mgmd process, start the cluster and repeatedly issue the "all status" command.
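A small polling harness can make this repetition less dependent on typing speed. The sketch below is hypothetical: the function names, the rank ordering (any "starting (Phase N)" ranks below "started"), the 0.2 s interval and the connect string are assumptions, not part of this report. It runs `ndb_mgm -c <connectstring> -e "all status"` repeatedly and flags any node whose reported state moves backwards, e.g. from started to starting. It assumes no node is deliberately restarted during the run, and separate non-interactive invocations may not reproduce the event interleaving seen in an interactive session.

```python
import re
import subprocess
import time

# Hypothetical rank for "started"; any value above the highest start phase works.
STARTED_RANK = 1000

# Matches status/event lines such as
#   "Node 2: starting (Phase 4) (Version 5.0.15)"
#   "Node 7: started (Version 5.0.15)"
#   "Node 2: Started (version 5.0.15)"   (event line, different capitalisation)
STATUS_RE = re.compile(
    r"^Node (\d+): (started|starting)(?: \(Phase (\d+)\))? \(Version [^)]*\)",
    re.IGNORECASE,
)


def parse_status(text):
    """Return (node_id, rank) pairs in the order they appear in one output."""
    pairs = []
    for line in text.splitlines():
        m = STATUS_RE.match(line.strip())
        if m is None:
            continue  # prompt lines, blank lines, anything else
        state = m.group(2).lower()
        rank = STARTED_RANK if state == "started" else int(m.group(3) or 0)
        pairs.append((int(m.group(1)), rank))
    return pairs


def regressions(history, output):
    """Update history {node_id: highest rank seen} from one output and return
    the sorted ids of nodes whose reported state went backwards."""
    bad = set()
    for node, rank in parse_status(output):
        best = history.get(node, -1)
        if rank < best:
            bad.add(node)
        history[node] = max(rank, best)
    return sorted(bad)


def poll(connectstring, rounds=50, interval=0.2):
    """Run 'all status' repeatedly via ndb_mgm and print any backwards transition.

    Assumes no node is restarted on purpose while polling."""
    history = {}
    for i in range(rounds):
        out = subprocess.run(
            ["ndb_mgm", "-c", connectstring, "-e", "all status"],
            capture_output=True, text=True, check=True,
        ).stdout
        bad = regressions(history, out)
        if bad:
            print(f"round {i}: nodes {bad} went backwards")
            print(out)
        time.sleep(interval)
```

For example, `poll("mgmhost:1186")` (hostname hypothetical; 1186 is the default management port) would print the offending output whenever a node that was already reported as started is later reported as starting.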

Suggested fix:
Show the true data node state.
[23 May 2006 8:13] Stewart Smith
About to post a patch for a partial fix. With it, the output looks something like:

ndb_mgm> all status
Node 1: started (Version 5.0.21)
Node 2: starting (Phase 5) (Version 5.0.21)

Node 2: Started (version 5.0.21)

(The patch forcibly prevents events from being interleaved with the status output.)
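The idea behind this partial fix can be illustrated by holding a single lock while a status report is written, so an event line cannot land between two lines of the report. This is a minimal Python sketch of that idea only; the `Console` class and its methods are hypothetical and are not the ndb_mgm code from the patch.

```python
import threading


class Console:
    """Hypothetical console that never interleaves event lines with a status report."""

    def __init__(self):
        self._lock = threading.Lock()
        self.lines = []  # captured output, in the order it was written

    def print_status(self, report_lines):
        # Hold the lock for the whole multi-line report so an event cannot
        # be written between two of its lines.
        with self._lock:
            for line in report_lines:
                self.lines.append(line)
            self.lines.append("")  # blank line separates the report from later events

    def print_event(self, event_line):
        # An event waits until any in-progress report has been written completely.
        with self._lock:
            self.lines.append(event_line)
```

Events that arrive while a report is being printed simply wait and appear after it, as in the output above. This only fixes the ordering of the output; it does not make the reported state any fresher.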

However, even with that, if you type really quickly you could still get:

ndb_mgm> all status
Node 1: started (Version 5.0.21)
Node 2: starting (Phase 5) (Version 5.0.21)

ndb_mgm> all status
Node 1: started (Version 5.0.21)
Node 2: starting (Phase 5) (Version 5.0.21)

ndb_mgm> all status
Node 1: started (Version 5.0.21)
Node 2: starting (Phase 5) (Version 5.0.21)

ndb_mgm> all status
Node 1: started (Version 5.0.21)
Node 2: started (Version 5.0.21)

Fixing this requires a different approach. IIRC, we should force a heartbeat and then send the event, although I would have to look into it further.
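A toy model of the suggested ordering follows. It rests on an assumption that this report does not confirm: that "all status" is answered from per-node state refreshed by heartbeats, while events are forwarded to clients immediately. Everything here (`MgmServer`, `heartbeat`, `node_started`, the `force_heartbeat` flag) is hypothetical. With the flag off, a client can receive "Node 2: Started" and still read "starting (Phase 5)" from `all_status()`; forcing the heartbeat before sending the event closes that window.

```python
class MgmServer:
    """Toy model (assumption): 'all status' is served from a cache that is
    normally refreshed only by heartbeats, while events go out immediately."""

    def __init__(self, node_ids):
        self.actual = {n: "starting (Phase 5)" for n in node_ids}  # true node state
        self.cache = dict(self.actual)  # what 'all status' reports
        self.delivered = []  # events already sent to clients

    def heartbeat(self, node):
        # A heartbeat copies the node's current state into the cache.
        self.cache[node] = self.actual[node]

    def all_status(self):
        return dict(self.cache)

    def node_started(self, node, force_heartbeat=True):
        self.actual[node] = "started"
        if force_heartbeat:
            # Proposed ordering: refresh the cached state first ...
            self.heartbeat(node)
        # ... then notify clients, so a status query issued after the event
        # can no longer report the older state.
        self.delivered.append(f"Node {node}: Started")
```

The design choice is purely one of ordering: the event is only a trigger for clients to look again, so it must not reach them before the state they will look at has caught up.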
[23 May 2006 8:16] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/6752
[3 Jul 2006 5:38] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/8629
[7 Jul 2006 7:40] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/8888
[7 Jul 2006 8:39] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/8892
[7 Jul 2006 10:10] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/8897
[9 Aug 2006 7:04] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/10183

ChangeSet@1.2244, 2006-08-09 15:03:55+08:00, stewart@willster.(none) +6 -0
  BUG#13985
  
  fixups after review by jonas
[9 Aug 2006 8:11] Stewart Smith
Pushed to the mysql-5.0-ndb tree. Waiting for the merge queue to die down before merging to 5.1.
[1 Sep 2006 7:59] Jonas Oreland
pushed to 5.1.12
[1 Sep 2006 19:28] Jonas Oreland
pushed to 5.0.25
[2 Sep 2006 6:04] Jon Stephens
Documented bugfix in 5.0.25 and 5.1.12 changelogs.