Bug #24516 MGMAPI reports bad data for down datanodes
Submitted: 22 Nov 2006 18:44 Modified: 11 Apr 2007 12:05
Reporter: Anders Karlsson
Status: Verified
Category: MySQL Cluster: NDB API    Severity: S4 (Feature request)
Version: mysql-5.1    OS: Linux (Linux)
Assigned to:    CPU Architecture: Any
Tags: 5.1.11

[22 Nov 2006 18:44] Anders Karlsson
Description:
When a datanode is down, the MGM API consistently reports bad data for that node. Both the connect_address and node_group fields are wrong. In particular, node_group reports every down node as being in group 0. If the group really is unknown for a down node (although I cannot see why that would be the case), one would at least expect a value that cannot be a valid group id, such as -1, rather than 0. But really, the proper group id should be reported.
As for a down datanode, its connect_address is reported as "0.0.0.0" unless the node has connected at least once and then been brought down, in which case the address used the last time is shown.

How to repeat:
Bring up a minimal cluster with a configuration similar to this (hostnames and paths must be changed, of course):
<CONFIG>
[NDBD DEFAULT]
NoOfReplicas=1

[NDB_MGMD]
Id=1
HostName=ned

[NDBD]
Id=2
HostName=ned
DataMemory=10M
DataDir=/usr/local/mysql-5.1.11-beta-linux-i686-glibc23/cluster1/node1

[NDBD]
Id=3
HostName=moe
DataMemory=10M
DataDir=/home/karlsson/mysql-5.1.11-beta-linux-i686-glibc23/cluster1/node1

[MYSQLD]
Id=4
</CONFIG>

Then start the mgm server and run this simple program that uses the MGM API:
<CODE>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <mysql.h>
#include <mgmapi.h>

int main(int argc, char *argv[])
   {
   NdbMgmHandle hMgm;
   ndb_mgm_cluster_state *pState;
   int i;

   if(argc != 2)
      {
      fprintf(stderr, "Usage: %s <connect string>\n", argv[0]);
      return 1;
      }
   hMgm = ndb_mgm_create_handle();
   ndb_mgm_set_connectstring(hMgm, argv[1]);

   ndb_mgm_connect(hMgm, 1, 30, 1);

   for(;;)
      {
      /* ndb_mgm_get_status() returns a malloc'ed structure that the
         caller must free. */
      pState = ndb_mgm_get_status(hMgm);
      if(pState == NULL)
         {
         fprintf(stderr, "ndb_mgm_get_status failed\n");
         return 1;
         }

      printf("=====================\n");
      for(i = 0; i < pState->no_of_nodes; i++)
         {
         /* Only look at data (NDB) nodes. */
         if(pState->node_states[i].node_type != NDB_MGM_NODE_TYPE_NDB)
            continue;
         printf("Node: id %d group: %d\n", pState->node_states[i].node_id,
           pState->node_states[i].node_group);
         printf("Node: %d address: /%s/\n", pState->node_states[i].node_id,
           pState->node_states[i].connect_address);
         }
      free(pState);
      sleep(2);
      }
   return 0;
   } // End of main().
</CODE>
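
The program takes the management server connect string as its only argument; with the configuration above that would be something like "ned" (or "ned:1186" if the default management port is used). It needs to be built against the NDB client headers and library.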

Running this program with just the mgmd running produces this output:
Node: id 2 group: 0
Node: 2 address: /0.0.0.0/
Node: id 3 group: 0
Node: 3 address: /0.0.0.0/
=====================
Node: id 2 group: 0
Node: 2 address: /0.0.0.0/
Node: id 3 group: 0
Node: 3 address: /0.0.0.0/
=====================

All of these are wrong, of course. 0.0.0.0 is not a valid address, and nothing in the documentation that I can see says that connect_address should be invalid just because a node is down. There is also little reason for it, as the HostName is specified in the config file. What is worse, though, is that only one (valid) nodegroup is showing, whereas two are defined, and neither is active (that none of the nodes above is active is clear from the node_status field, which does seem to be valid).
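
For reference, the inner loop can be extended to also print node_status, to confirm that the status field itself is reported correctly while node_group and connect_address are not. This is only a sketch of that extension, not part of the program above:
<CODE>
/* Sketch only: same inner loop as above, extended to also print
   node_status so it can be compared with node_group/connect_address. */
for(i = 0; i < pState->no_of_nodes; i++)
   {
   if(pState->node_states[i].node_type != NDB_MGM_NODE_TYPE_NDB)
      continue;
   printf("Node: id %d status: %d group: %d address: /%s/\n",
     pState->node_states[i].node_id,
     (int) pState->node_states[i].node_status,
     pState->node_states[i].node_group,
     pState->node_states[i].connect_address);
   }
</CODE>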

Starting the first datanode produces this output from the program above:
Node: id 2 group: 0
Node: 2 address: /192.168.0.22/
Node: id 3 group: 0
Node: 3 address: /0.0.0.0/
=====================

The connect_address of the first node is now correct, but the address of the second node is still invalid, and so is its group id. Bringing the second datanode up shows this:
Node: id 2 group: 0
Node: 2 address: /192.168.0.22/
Node: id 3 group: 0
Node: 3 address: /192.168.0.11/
=====================
Node: id 2 group: 0
Node: 2 address: /192.168.0.22/
Node: id 3 group: 0
Node: 3 address: /192.168.0.11/
=====================
Node: id 2 group: 0
Node: 2 address: /192.168.0.22/
Node: id 3 group: 1
Node: 3 address: /192.168.0.11/
=====================

Note that for a period while the datanode is starting, the node_group field is still not valid! Now, bringing the second datanode down again shows this:
Node: id 2 group: 0
Node: 2 address: /127.0.0.1/
Node: id 3 group: 1
Node: 3 address: /192.168.0.11/
=====================
Node: id 2 group: 0
Node: 2 address: /127.0.0.1/
Node: id 3 group: 0
Node: 3 address: /192.168.0.11/
=====================

So when the node really is down, the IP address of the last connection is still there, and the group id is again invalid.

Suggested fix:
This has to be fixed, really. Being able to monitor a running Cluster is crucial, and the problem gets worse when more nodes are involved. If there are 4 datanodes and one node in nodegroup 2 goes down, it will show up as there being 3 nodes in nodegroup 1, one of which is down, and 1 node in nodegroup 2, which is NOT down. This is incorrect and confusing. In some cases, when connecting to localhost, I have sometimes seen 127.0.0.1 reported as the node address and sometimes the real node address, but I have not been able to reproduce this consistently yet.

I would like to point out that this looks different from bug #24011: in that case the issues appear under high load, whereas the behaviour documented here happens even with no load at all, and node_group is consistently wrong.
[11 Apr 2007 12:05] Hartmut Holzgraefe
Whether the node_group information makes sense or not depends on the value of node_status; you should check that first before trying to read node_group.
If e.g. "node_status == NDB_MGM_NODE_STATUS_NO_CONTACT", then the contents of node_group are meaningless.

connect_address keeps the IP of the last connect while in NDB_MGM_NODE_STATUS_NO_CONTACT; if the node has never connected before, it is 0.0.0.0 and connect_count is zero.

While I agree that a node_group of -1 would make more sense in this case, I do not consider this a bug but a feature request only.
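
Following that advice, the loop in the test program above could guard its reads roughly like this (a sketch only, using the fields already shown in the program and the status constant mentioned above):
<CODE>
/* Sketch: only trust node_group and connect_address when the node is
   actually in contact with the management server. */
if(pState->node_states[i].node_status == NDB_MGM_NODE_STATUS_NO_CONTACT)
   {
   /* node_group is meaningless here; connect_address is the address of
      the last connect, or 0.0.0.0 (with connect_count == 0) if the node
      has never connected. */
   printf("Node: id %d not in contact, group/address not reported\n",
     pState->node_states[i].node_id);
   }
else
   {
   printf("Node: id %d group: %d address: /%s/\n",
     pState->node_states[i].node_id,
     pState->node_states[i].node_group,
     pState->node_states[i].connect_address);
   }
</CODE>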