Bug #49843 ndb_mgm_get_status() returning invalid node_group information for starting node
Submitted: 21 Dec 2009 9:59 Modified: 9 Jan 2015 14:42
Reporter: Hartmut Holzgraefe
Status: Won't fix
Category: MySQL Cluster: NDB API Severity: S3 (Non-critical)
Version: mysql-cluster-7.0.6 OS: Linux
Assigned to: Magnus Blåudd CPU Architecture: Any

[21 Dec 2009 9:59] Hartmut Holzgraefe
Description:
ndb_mgm_get_status() returning invalid node_group information for starting node

Tested here with a node from node group 1 in a cluster with four data nodes.

First, the node is up and running:

Node Id 4 Group 1 (0x1 ) Start phase 0 State: The node is running

Now I stop it with "4 STOP":

Node Id 4 Group 1 (0x1 ) Start phase 1 State: The node is shutting down
Node Id 4 Group 1 (0x1 ) Start phase 2 State: The node is shutting down
Node Id 4 Group 1 (0x1 ) Start phase 3 State: The node is shutting down
Node Id 4 Group 1 (0x1 ) Start phase 4 State: The node is shutting down
Node Id 4 Group 0 (0x0 ) Start phase 0 State: The node cannot be contacted

So during the entire shutdown procedure the group id is still shown correctly;
only once the node is fully gone does it change to 0.

Now I start the node with "ndbd --nostart":

Node Id 4 Group 1 (0x1 ) Start phase 0 State: The node's status is not known
Node Id 4 Group -1 (0xFFFFFFFF ) Start phase 0 State: The node has not yet executed the startup protocol

Oddly enough, the node seems to show the correct node group briefly
before it changes to -1.

Now I issue "4 START":

Node Id 4 Group -202116109 (0xF3F3F3F3 ) Start phase 0 State: The node is executing the startup protocol
Node Id 4 Group -202116109 (0xF3F3F3F3 ) Start phase 2 State: The node is executing the startup protocol
Node Id 4 Group -202116109 (0xF3F3F3F3 ) Start phase 4 State: The node is executing the startup protocol
Node Id 4 Group 1 (0x1 ) Start phase 100 State: The node is executing the startup protocol
Node Id 4 Group 1 (0x1 ) Start phase 0 State: The node is running

So the group first changes from -1 to a large negative value (-202116109,
0xF3F3F3F3) before it changes to the actual group id somewhere between start
phases 4 and 100. As I was running this on an empty test installation,
my mgmapi program was not fast enough to capture all start phases.
I'll give it another try on a larger installation later to check
the exact phase in which the transition happens.
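
For reference, the decimal and hex columns in the output above show the same 32-bit value, read once as signed and once as unsigned. The following minimal sketch reproduces that reading. The helper names group_as_signed() and is_repeated_byte_pattern() are made up for this illustration and are not part of the mgmapi. The second helper is only a heuristic a monitoring client could use to flag values such as 0xF3F3F3F3, where one byte is repeated four times; that such a value may come from uninitialized memory is an assumption this report does not confirm.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reinterpret a raw 32-bit value as signed, matching how the decimal
 * column in the output above relates to the hex column. */
static int32_t group_as_signed(uint32_t raw)
{
    if (raw <= (uint32_t)INT32_MAX)
        return (int32_t)raw;
    /* Subtract 2^32 in 64-bit arithmetic to avoid an out-of-range cast. */
    return (int32_t)((int64_t)raw - 4294967296LL);
}

/* Heuristic (an assumption, not documented NDB behaviour): true when all
 * four bytes are identical and nonzero, e.g. 0xF3F3F3F3 or 0xFFFFFFFF. */
static bool is_repeated_byte_pattern(uint32_t raw)
{
    uint32_t b = raw & 0xFFu;
    return b != 0 && raw == b * 0x01010101u;
}
```

With these helpers, 0xF3F3F3F3 maps to -202116109 and 0xFFFFFFFF maps to -1, while a real group id such as 0x1 is not flagged by the pattern check.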

How to repeat:
Check the ndb_mgm_get_status() results for node_group on a starting node,
e.g. using the attached simple mgmapi monitoring app.

Suggested fix:
Either consistently return -1 for node_group while a node is not an active member of any node group, or document node_group as only being defined for fully started nodes (node_status being either NDB_MGM_NODE_STATUS_STARTED or
NDB_MGM_NODE_STATUS_SINGLEUSER).
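
Until one of these is in place, a client could apply the same rule on its own side. The sketch below is only an illustration of that idea, not the actual fix: the enum sketch_node_status and the function effective_node_group() are hypothetical stand-ins so that the example compiles without the mgmapi headers, and the enum values are placeholders rather than the real NDB_MGM_NODE_STATUS_* constants. A real client would test node_status from ndb_mgm_get_status() against NDB_MGM_NODE_STATUS_STARTED and NDB_MGM_NODE_STATUS_SINGLEUSER instead.

```c
/* Hypothetical stand-in for the mgmapi node status enum; the values
 * here are placeholders, not the real NDB_MGM_NODE_STATUS_* constants. */
enum sketch_node_status {
    SKETCH_STATUS_UNKNOWN,
    SKETCH_STATUS_NO_CONTACT,
    SKETCH_STATUS_NOT_STARTED,
    SKETCH_STATUS_STARTING,
    SKETCH_STATUS_STARTED,
    SKETCH_STATUS_SHUTTING_DOWN,
    SKETCH_STATUS_SINGLEUSER
};

/* Return the reported node group only for fully started nodes;
 * for every other status return -1, as proposed above. */
static int effective_node_group(enum sketch_node_status status,
                                int reported_group)
{
    if (status == SKETCH_STATUS_STARTED ||
        status == SKETCH_STATUS_SINGLEUSER)
        return reported_group;
    return -1;
}
```

Applied to the output in the description, the starting node would then report -1 instead of -202116109 until it is fully started. Note that this rule would also report -1 during shutdown, where the group is currently still shown as 1.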