Bug #48301 ndb_mgmd 'get status' shows confusing status for API and MGM nodes
Submitted: 26 Oct 11:40 Modified: 4 Nov 8:31
Reporter: Magnus Blaudd
Status: In progress
Category:Server: Cluster Severity:S3 (Non-critical)
Version:mysql-5.1-telco-6.3 OS:Any
Assigned to: Magnus Blaudd Target Version:6.3.30
Tags: 6.3.28
Triage: Triaged: D3 (Medium) / R6 (Needs Assessment) / E6 (Needs Assessment)

[26 Oct 11:40] Magnus Blaudd
Description:
The output from ndb_mgmd's 'get status' is confusing for API and MGM nodes.

How to repeat:
node status
nodes: 5
node.3.type: NDB
node.3.status: STARTED
node.3.version: 458759
node.3.mysql_version: 327971
node.3.startphase: 0
node.3.dynamic_id: -1
node.3.node_group: 0
node.3.connect_count: 0
node.3.address: 129.159.118.181
node.4.type: NDB
node.4.status: STARTED
node.4.version: 458759
node.4.mysql_version: 327971
node.4.startphase: 0
node.4.dynamic_id: 2
node.4.node_group: 0
node.4.connect_count: 0
node.4.address: 129.159.118.182
node.1.type: MGM
node.1.status: NO_CONTACT
node.1.version: 458759
node.1.mysql_version: 327971
node.1.startphase: 0
node.1.dynamic_id: 0
node.1.node_group: 0
node.1.connect_count: 0
node.1.address: 129.159.118.181
node.2.type: MGM
node.2.status: UNKNOWN
node.2.version: 458759
node.2.mysql_version: 327971
node.2.startphase: 0
node.2.dynamic_id: -1
node.2.node_group: -1
node.2.connect_count: 0
node.2.address: 129.159.118.182
node.5.type: API
node.5.status: NO_CONTACT
node.5.version: 458759
node.5.mysql_version: 327971
node.5.startphase: 0
node.5.dynamic_id: 0
node.5.node_group: 0
node.5.connect_count: 0
node.5.address: 129.159.118.182

node.1.status: STARTED
   I know for sure I'm here...

node.2.status: STARTED
node.5.status: STARTED
   I have these two members' addresses and version info, so they should be here?
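
To compare the reported and expected values more easily, the dump above can be grouped per node. The following is only a rough sketch: the helper names parse_status and decode_version are made up for illustration, and the version packing (major << 16 | minor << 8 | build) is an assumption about how the version fields are encoded, not something stated in this report.

```python
def parse_status(text):
    """Group 'node.<id>.<field>: <value>' lines by node id."""
    nodes = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        parts = key.strip().split(".")
        # Skip anything that is not a per-node field, such as
        # 'node status', 'nodes: 5' and blank lines.
        if not sep or len(parts) != 3 or parts[0] != "node":
            continue
        if not parts[1].isdigit():
            continue
        nodes.setdefault(int(parts[1]), {})[parts[2]] = value.strip()
    return nodes


def decode_version(packed):
    """Split a packed version number into (major, minor, build).

    Assumed packing: (major << 16) | (minor << 8) | build.
    """
    return (packed >> 16) & 0xFF, (packed >> 8) & 0xFF, packed & 0xFF
```

Under that packing assumption, decode_version(458759) gives (7, 0, 7) and decode_version(327971) gives (5, 1, 35).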

Suggested fix:
* NDB nodes are connected directly with a transporter
  -> use the status as seen by the local ClusterMgr; get it by locking TTFM
     (the transporter facade mutex) and copying nodeinfo to MgmtSrvr.
  -> use the "connect_address" from the transporter. It will be cached
     in MgmtSrvr when 'handleStatus' is called to notify that
     the node has connected (with TTFM locked) and cleared when
     the node disconnects. Thus we should have an address whenever
     ClusterMgr says connected.
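
The address caching idea for NDB nodes could look roughly like the sketch below. It is a Python illustration only, not the MgmtSrvr code: the class AddressCache and its methods are hypothetical, and a plain threading.Lock stands in for TTFM. Note that in the proposal 'handleStatus' is called with TTFM already held, while this sketch takes the lock itself.

```python
import threading


class AddressCache:
    """Hypothetical stand-in for the connect_address cache in MgmtSrvr."""

    def __init__(self):
        self._lock = threading.Lock()  # plays the role of TTFM here
        self._addresses = {}           # node id -> connect address

    def handle_status(self, node_id, connected, connect_address=None):
        """Cache the address on connect, clear it on disconnect."""
        with self._lock:
            if connected:
                self._addresses[node_id] = connect_address
            else:
                self._addresses.pop(node_id, None)

    def address_of(self, node_id):
        """Return the cached address, or None if the node is not connected."""
        with self._lock:
            return self._addresses.get(node_id)
```

Because the entry is written on the connect notification and removed on disconnect, a lookup for a node that ClusterMgr reports as connected should always find an address.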

* API nodes are not connected to MGM
  -> ask any connected NDB node which version and address is being used
     (preferably ask the NDB node that reports it has a connection to
     the node - NodeState::m_connected_nodes (if API nodes are listed)).
  -> return status UNKNOWN if there is no connection with any NDB node.
  -> return status NO_CONTACT or CONNECTED if a reply from an NDB node
     is received.

node.5.type: API
node.5.status: UNKNOWN/NO_CONTACT/CONNECTED
node.5.version: 458759                       /* Set when STARTED */
node.5.mysql_version: 327971                 /* Set when STARTED */
node.5.startphase: 0                         /* Always 0 */
node.5.dynamic_id: 0                         /* Always 0 */
node.5.node_group: 0                         /* Always 0 */
node.5.connect_count: 0                      /* Always 0 */
node.5.address: 129.159.118.182              /* Set when STARTED */
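
The three-way status for API nodes can be sketched as follows. This is only an illustration of the rule above: NdbReply and api_node_status are hypothetical names, the zero/empty placeholders for unset fields are an assumption, and "Set when STARTED" is read here as "set when the NDB node reports the API node as connected".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NdbReply:
    """Hypothetical answer from an NDB node asked about an API node."""
    connected: bool
    version: int = 0
    mysql_version: int = 0
    address: str = ""


def api_node_status(reply: Optional[NdbReply]) -> dict:
    """Build the status fields for an API node.

    reply is None when no NDB node could be asked at all.
    """
    fields = {
        "status": "UNKNOWN",
        "version": 0, "mysql_version": 0, "address": "",
        # These are always 0 for API nodes according to the proposal.
        "startphase": 0, "dynamic_id": 0, "node_group": 0, "connect_count": 0,
    }
    if reply is None:
        return fields                      # no NDB node to ask: UNKNOWN
    if not reply.connected:
        fields["status"] = "NO_CONTACT"    # NDB node replied: not connected
        return fields
    fields.update(status="CONNECTED", version=reply.version,
                  mysql_version=reply.mysql_version, address=reply.address)
    return fields
```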

* MGM nodes
  -> use a special case when asking about own node. Ask any connected NDB
     node which address the MGM has connected from. Use HostName from
     config if no NDB node is available. Hardcode the version (maybe check
     it's the same as expected).

  -> for other MGM nodes
    - 6.3, not connected directly to MGM, use same method as API node
      (see above).
    - 7.0, connected directly with transporter, use same method as
      NDB node (see above). We will thus also get connect_count.

node.1.type: MGM
node.1.status: UNKNOWN/NO_CONTACT/CONNECTED  /* No UNKNOWN in 7.0 */
node.1.version: 458759                       /* Set when STARTED */
node.1.mysql_version: 327971                 /* Set when STARTED */
node.1.startphase: 0                         /* Always 0 */
node.1.dynamic_id: 0                         /* Always 0 */
node.1.node_group: 0                         /* Always 0 */
node.1.connect_count: 1                      /* Working from 7.0 */
node.1.address: 129.159.118.181              /* Set when STARTED */
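
The choice of information source for MGM nodes can be summarised in a small decision function. This is a sketch only: mgm_info_source and the returned labels are invented for illustration and do not correspond to existing functions.

```python
def mgm_info_source(is_own_node, series, ndb_available):
    """Pick where status and address for an MGM node would come from.

    series is the release series as a tuple, e.g. (6, 3) or (7, 0).
    The returned labels are illustrative names only.
    """
    if is_own_node:
        # Own node: the version is hardcoded; the address comes from an
        # NDB node if one is connected, else from HostName in the config.
        return "ask_ndb_node" if ndb_available else "config_hostname"
    if series >= (7, 0):
        # 7.0: other MGM nodes are connected directly with a transporter,
        # so the NDB node method applies and connect_count is available.
        return "transporter"
    # 6.3: other MGM nodes are not connected directly, so the API node
    # method applies (status is UNKNOWN when no NDB node is available).
    return "ask_ndb_node" if ndb_available else "unknown"
```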

[2 Nov 13:53] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/88928