Description:
ndb_mgm_get_status() returns invalid node_group information for a node that is starting.
Tested here with a node from node group 1 in a cluster with four data nodes.
First, the node is up and running:
Node Id 4 Group 1 (0x1 ) Start phase 0 State: The node is running
Now I stop it with "4 STOP":
Node Id 4 Group 1 (0x1 ) Start phase 1 State: The node is shutting down
Node Id 4 Group 1 (0x1 ) Start phase 2 State: The node is shutting down
Node Id 4 Group 1 (0x1 ) Start phase 3 State: The node is shutting down
Node Id 4 Group 1 (0x1 ) Start phase 4 State: The node is shutting down
Node Id 4 Group 0 (0x0 ) Start phase 0 State: The node cannot be contacted
So the group id is still shown correctly during the entire shutdown procedure;
it changes to 0 only once the node is fully gone.
Now I start the node with "ndbd --nostart":
Node Id 4 Group 1 (0x1 ) Start phase 0 State: The node's status is not known
Node Id 4 Group -1 (0xFFFFFFFF ) Start phase 0 State: The node has not yet executed the startup protocol
Oddly, the node briefly shows the correct node group
before it changes to -1.
Now I issue "4 START":
Node Id 4 Group -202116109 (0xF3F3F3F3 ) Start phase 0 State: The node is executing the startup protocol
Node Id 4 Group -202116109 (0xF3F3F3F3 ) Start phase 2 State: The node is executing the startup protocol
Node Id 4 Group -202116109 (0xF3F3F3F3 ) Start phase 4 State: The node is executing the startup protocol
Node Id 4 Group 1 (0x1 ) Start phase 100 State: The node is executing the startup protocol
Node Id 4 Group 1 (0x1 ) Start phase 0 State: The node is running
So the group first changes from -1 to a negative value of much larger magnitude
(0xF3F3F3F3 read as a signed 32-bit integer) before it changes to the actual
group id somewhere between start phases 4 and 100. Since I was running this on
an empty test installation, my mgmapi program was not fast enough to capture
all start phases. I'll try again on a larger installation later to pin down
the exact phase in which the transition happens.
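
To make the relationship between the decimal and hexadecimal columns in the output above explicit, here is a small sketch of how a raw 32-bit pattern maps to the signed value printed for node_group. The helper name as_signed32 is made up for this illustration and is not part of the mgmapi.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of mgmapi): reinterpret a raw 32-bit
 * pattern as a signed 32-bit integer, which is how the Group column
 * above ends up negative for patterns with the top bit set. */
static int32_t as_signed32(uint32_t raw)
{
    int32_t value;
    /* memcpy avoids relying on implementation-defined conversion
     * of out-of-range unsigned values to a signed type. */
    memcpy(&value, &raw, sizeof value);
    return value;
}
```

With this, as_signed32(0xFFFFFFFFu) gives -1 and as_signed32(0xF3F3F3F3u) gives -202116109, matching the two unexpected values seen during startup. Note that all four bytes of the second pattern are identical (0xF3).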
How to repeat:
Check the node_group values returned by ndb_mgm_get_status() for a starting node,
e.g. using the attached simple mgmapi monitoring app.
Suggested fix:
Either consistently return -1 for node_group while a node is not an active member of any node group, or document node_group as being defined only for fully started nodes (node_status being either NDB_MGM_NODE_STATUS_STARTED or
NDB_MGM_NODE_STATUS_SINGLEUSER).
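
Until one of these is in place, a monitoring program could apply the second rule itself. The following is only a sketch of that idea: the enum below is a local stand-in so that the example compiles on its own, and the function name effective_node_group is invented here. A real program would test the node_status field of the node state against NDB_MGM_NODE_STATUS_STARTED and NDB_MGM_NODE_STATUS_SINGLEUSER from mgmapi.h instead.

```c
/* Local stand-ins for the mgmapi node status values (assumption:
 * only the distinction "fully started or not" matters here). */
enum sketch_node_status {
    SKETCH_STATUS_UNKNOWN,
    SKETCH_STATUS_NO_CONTACT,
    SKETCH_STATUS_NOT_STARTED,
    SKETCH_STATUS_STARTING,
    SKETCH_STATUS_STARTED,
    SKETCH_STATUS_SHUTTING_DOWN,
    SKETCH_STATUS_SINGLEUSER
};

/* Hypothetical helper: trust the reported node group only for fully
 * started nodes, and report -1 in every other state. */
static int effective_node_group(enum sketch_node_status status,
                                int reported_group)
{
    if (status == SKETCH_STATUS_STARTED ||
        status == SKETCH_STATUS_SINGLEUSER) {
        return reported_group;
    }
    return -1; /* node_group treated as undefined in this state */
}
```

This also discards the group id during shutdown, where the output above shows it is still correct, so it is deliberately conservative.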