Bug #48301 ndb_mgmd 'get status' shows confusing status for API and MGM nodes
Submitted: 26 Oct 2009 10:40 Modified: 21 Jun 2011 16:30
Reporter: Magnus Blåudd
Status: Closed
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:mysql-5.1-telco-6.3 OS:Any
Assigned to: Magnus Blåudd CPU Architecture:Any
Tags: 6.3.28

[26 Oct 2009 10:40] Magnus Blåudd
The output from ndb_mgmd's 'get status' command is confusing for API and MGM nodes.

How to repeat:
node status
nodes: 5
node.3.type: NDB
node.3.status: STARTED
node.3.version: 458759
node.3.mysql_version: 327971
node.3.startphase: 0
node.3.dynamic_id: -1
node.3.node_group: 0
node.3.connect_count: 0
node.4.type: NDB
node.4.status: STARTED
node.4.version: 458759
node.4.mysql_version: 327971
node.4.startphase: 0
node.4.dynamic_id: 2
node.4.node_group: 0
node.4.connect_count: 0
node.1.type: MGM
node.1.status: NO_CONTACT
node.1.version: 458759
node.1.mysql_version: 327971
node.1.startphase: 0
node.1.dynamic_id: 0
node.1.node_group: 0
node.1.connect_count: 0
node.2.type: MGM
node.2.status: UNKNOWN
node.2.version: 458759
node.2.mysql_version: 327971
node.2.startphase: 0
node.2.dynamic_id: -1
node.2.node_group: -1
node.2.connect_count: 0
node.5.type: API
node.5.status: NO_CONTACT
node.5.version: 458759
node.5.mysql_version: 327971
node.5.startphase: 0
node.5.dynamic_id: 0
node.5.node_group: 0
node.5.connect_count: 0
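The reply above is a flat list of 'node.N.key: value' lines, so comparing nodes means grouping the lines by node id first. The Python sketch below does that. parse_status and decode_version are hypothetical helpers, not part of ndb_mgmd. decode_version assumes the NDB version packing (major << 16) | (minor << 8) | build. Under that assumption, 458759 reads as 7.0.7 and 327971 as 5.1.35.

```python
import re
from collections import defaultdict

# Matches lines such as "node.3.status: STARTED"; header lines are skipped.
LINE_RE = re.compile(r"^node\.(\d+)\.(\w+):\s*(.*)$")


def decode_version(v: int) -> str:
    """Decode a packed version, assuming (major << 16) | (minor << 8) | build."""
    return f"{(v >> 16) & 0xFF}.{(v >> 8) & 0xFF}.{v & 0xFF}"


def parse_status(text: str) -> dict[int, dict[str, str]]:
    """Group the flat 'node.N.key: value' reply into one dict per node id."""
    nodes: dict[int, dict[str, str]] = defaultdict(dict)
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            node_id, key, value = m.groups()
            nodes[int(node_id)][key] = value.strip()
    return dict(nodes)
```

For example, running the reply above through parse_status gives node 1 a status of NO_CONTACT. Decoding that node's version and mysql_version fields gives 7.0.7 and 5.1.35.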

Expected:

node.1.status: STARTED
   I know for sure I'm here...

node.2.status: STARTED
node.5.status: STARTED
   I have these two nodes' addresses and version info, so shouldn't they be here too?

Suggested fix:
* NDB nodes are connected directly via the transporter
  -> use the status as seen by the local ClusterMgr; get it by locking TTFM
     (the transporter facade mutex) and copying the nodeinfo to MgmtSrvr.
  -> use the "connect_address" from the transporter. It will be cached
     in MgmtSrvr when 'handleStatus' is called (with TTFM locked) to notify
     that the node has connected, and cleared when the node disconnects.
     Thus we should have an address whenever ClusterMgr says connected.
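The NDB node part of the proposal has two rules. First, read status from the ClusterMgr's view while holding TTFM. Second, keep connect_address in step with connect and disconnect notifications. The Python sketch below models both rules, with a threading.Lock standing in for TTFM. The names MgmtSrvrSketch, handle_status and ndb_node_status are hypothetical; they only mirror MgmtSrvr and handleStatus. Incrementing connect_count on every connect is also an assumption.

```python
import threading
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class NodeInfo:
    """Per-node state as the local ClusterMgr would see it (simplified)."""
    connected: bool = False
    connect_count: int = 0


class MgmtSrvrSketch:
    """Hypothetical model of the proposed status lookup for NDB nodes."""

    def __init__(self) -> None:
        self.ttfm = threading.Lock()                 # stands in for the transporter facade mutex
        self.cluster_mgr: dict[int, NodeInfo] = {}   # ClusterMgr's view of each node
        self.connect_address: dict[int, str] = {}    # cached on connect, cleared on disconnect

    def handle_status(self, node_id: int, connected: bool, address: str = "") -> None:
        """Connect/disconnect notification; the lock mirrors 'TTFM locked'."""
        with self.ttfm:
            info = self.cluster_mgr.setdefault(node_id, NodeInfo())
            info.connected = connected
            if connected:
                info.connect_count += 1              # assumption: counted per connect
                self.connect_address[node_id] = address
            else:
                self.connect_address.pop(node_id, None)

    def ndb_node_status(self, node_id: int) -> tuple[NodeInfo, Optional[str]]:
        """Copy the node info and the address together, under the same lock."""
        with self.ttfm:
            info = replace(self.cluster_mgr.get(node_id, NodeInfo()))
            return info, self.connect_address.get(node_id)
```

The design depends on reading the address and the node info under the same lock. A reader then cannot see "connected" without an address, or an address for a node that has already disconnected.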

* API nodes are not connected to MGM
  -> ask any connected NDB node which version and address is being used
     (preferably the NDB node that reports it has a connection to
     the node - NodeState::m_connected_nodes, if API nodes are listed there).
  -> return status UNKNOWN if there is no connection with an NDB node.
  -> return status NO_CONTACT or CONNECTED when the reply from the NDB node
     is received.
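The API node rules reduce to a small decision procedure. The Python sketch below is one reading of it. NdbReply, api_node_status and the ndb_views mapping (connected NDB node id to the node ids it lists as connected, in the spirit of NodeState::m_connected_nodes) are all hypothetical. Treating a missing reply like a missing NDB connection (UNKNOWN) is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional

UNKNOWN, NO_CONTACT, CONNECTED = "UNKNOWN", "NO_CONTACT", "CONNECTED"


@dataclass
class NdbReply:
    """What an NDB node reports about an API node (hypothetical shape)."""
    connected: bool
    version: int = 0
    address: str = ""


def api_node_status(
    api_id: int,
    ndb_views: dict[int, set[int]],
    query: Callable[[int, int], Optional[NdbReply]],
) -> tuple[str, Optional[NdbReply]]:
    """Resolve an API node's status by asking a connected NDB node."""
    if not ndb_views:
        return UNKNOWN, None                 # no connection with any NDB node
    # Prefer an NDB node that already reports a connection to the API node.
    preferred = [n for n in sorted(ndb_views) if api_id in ndb_views[n]]
    ndb_id = preferred[0] if preferred else min(ndb_views)
    reply = query(ndb_id, api_id)
    if reply is None:
        return UNKNOWN, None                 # assumption: no reply counts as no connection
    return (CONNECTED if reply.connected else NO_CONTACT), reply
```

Asking the NDB node that already lists the API node makes a CONNECTED answer, with a usable version and address, more likely on the first query.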

node.5.type: API
node.5.version: 458759                       /* Set when STARTED */
node.5.mysql_version: 327971                 /* Set when STARTED */
node.5.startphase: 0                         /* Always 0 */
node.5.dynamic_id: 0                         /* Always 0 */
node.5.node_group: 0                         /* Always 0 */
node.5.connect_count: 0                      /* Always 0 */
node.5.address:              /* Set when STARTED */

* MGM nodes
  -> use a special case when asking about the own node. Ask any connected NDB
     node which address the MGM has connected from. Use HostName from the
     config if no NDB node is available. Hardcode the version (maybe check
     that it's the same as expected).

  -> for other MGM nodes
    - 6.3: not connected directly to MGM, use the same method as for API
      nodes (see above).
    - 7.0: connected directly with the transporter, use the same method as
      for NDB nodes (see above). We will thus also get connect_count.
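The MGM node rules select one of three lookup methods. The selection depends on whether the node asked about is this node and, if not, on the release series. The Python sketch below encodes that choice and the address fallback for the own node. Method, mgm_lookup_method and own_address are hypothetical names, and the release is passed as a (major, minor) tuple for illustration.

```python
from enum import Enum
from typing import Optional


class Method(Enum):
    OWN_NODE = "own node: address from an NDB node, else HostName from config"
    LIKE_API = "ask a connected NDB node, as for API nodes"
    LIKE_NDB = "local ClusterMgr view under TTFM, as for NDB nodes"


def mgm_lookup_method(node_id: int, own_id: int, release: tuple[int, int]) -> Method:
    """Pick how to resolve the status of MGM node node_id."""
    if node_id == own_id:
        return Method.OWN_NODE
    # In 7.0 management servers are connected to each other by the transporter;
    # in 6.3 they are not.
    return Method.LIKE_NDB if release >= (7, 0) else Method.LIKE_API


def own_address(reported_by_ndb: Optional[str], config_hostname: str) -> str:
    """Address of this MGM node: as seen by an NDB node, else the configured HostName."""
    return reported_by_ndb or config_hostname
```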

node.1.type: MGM
node.1.status: UNKNOWN/NO_CONTACT/CONNECTED  /* No UNKNOWN in 7.0 */
node.1.version: 458759                       /* Set when STARTED */
node.1.mysql_version: 327971                 /* Set when STARTED */
node.1.startphase: 0                         /* Always 0 */
node.1.dynamic_id: 0                         /* Always 0 */
node.1.node_group: 0                         /* Always 0 */
node.1.connect_count: 1                      /* Working from 7.0 */
node.1.address:              /* Set when STARTED */
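The proposed API and MGM layouts share the same fields. They differ in that connect_count is only meaningful for MGM nodes (from 7.0). The Python sketch below shows a hypothetical formatter, format_node, that emits either layout. It rests on two assumptions. First, version, mysql_version and address are filled in only when the status is neither UNKNOWN nor NO_CONTACT, which is one reading of "Set when STARTED". Second, a status line is emitted for API nodes too, although the API layout above omits it.

```python
ALWAYS_ZERO = ("startphase", "dynamic_id", "node_group")


def format_node(node_id: int, node_type: str, status: str, version: int = 0,
                mysql_version: int = 0, address: str = "",
                connect_count: int = 0) -> list[str]:
    """Render one node in the proposed 'get status' layout (hypothetical helper)."""
    if status in ("UNKNOWN", "NO_CONTACT"):
        # Assumption: nothing trustworthy is known about an unreachable node.
        version, mysql_version, address = 0, 0, ""
    p = f"node.{node_id}."
    lines = [f"{p}type: {node_type}", f"{p}status: {status}",
             f"{p}version: {version}", f"{p}mysql_version: {mysql_version}"]
    lines += [f"{p}{key}: 0" for key in ALWAYS_ZERO]
    # API nodes always report 0; MGM nodes carry a real count from 7.0 on.
    lines.append(f"{p}connect_count: {connect_count if node_type == 'MGM' else 0}")
    lines.append(f"{p}address: {address}")
    return lines
```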
[2 Nov 2009 12:53] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

[21 Jun 2011 16:30] Jon Stephens
Documented as follows in the NDB 7.0.26 and 7.1.15 changelogs:

        Multiple management servers were unable to see one another until
        all nodes had fully started. As part of the fix for this
        issue, two new status values RESUME and CONNECTED are reported
        for management nodes in the output of the ndb_mgm client SHOW
        command. Two corresponding values NDB_MGM_NODE_STATUS_RESUME and
        NDB_MGM_NODE_STATUS_CONNECTED are also added to the list of
        possible values for an ndb_mgm_node_status data structure in the
        MGM API.

Also updated relevant info in descriptions of client commands and MGM API.