Bug #44248 Identifying nodes waited for from cluster log requires mental gymnastics.
Submitted: 13 Apr 2009 22:12
Modified: 12 Jun 2009 13:08
Reporter: Matthew Montgomery
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: mysql-5.1-telco-7.0
OS: Any
Assigned to: jack andrews
CPU Architecture: Any

[13 Apr 2009 22:12] Matthew Montgomery
Description:
Example log.

 [MgmSrvr] INFO     -- Node 10: Initial start, waiting for 0000000000003000 to connect,  nodes [ all: 0000000000003c00 connected: 0000000000000c00 no-wait: 0000000000000000 ]

To determine which nodes you are waiting for, you must know how many nodes are in the cluster and what their ids are.

all: H'0000000000003c00 = 15360 = 2^10 + 2^11 + 2^12 + 2^13 
connected: H'0000000000000c00 = 3072 = 2^10 + 2^11
waiting for: H'0000000000003000 = 12288 = 2^12 + 2^13

So the log reveals that nodes 10 and 11 are up but 12 and 13 are not.
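
The decoding above can be done mechanically. As a minimal sketch (the helper name nodesFromHexMask is hypothetical and not part of MySQL Cluster, and it assumes a mask of at most 64 bits, as in the example log), this C++ function turns a logged hex mask into the list of node ids:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of NDB): parse a hex mask as printed in the
// cluster log and return the positions of the set bits, i.e. the node ids.
std::vector<int> nodesFromHexMask(const std::string& hex) {
  assert(!hex.empty() && hex.size() <= 16);  // this sketch handles up to 64 bits
  const std::uint64_t mask = std::stoull(hex, nullptr, 16);
  std::vector<int> nodes;
  for (int bit = 0; bit < 64; ++bit) {
    if ((mask >> bit) & 1u) nodes.push_back(bit);  // bit n set means node id n
  }
  return nodes;
}
```

Applied to the example log, it gives 10, 11, 12, 13 for "all", 10, 11 for "connected" and 12, 13 for the waited-for mask.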

People generally aren't so good at hex math, especially when stressed and trying to restart a crashed cluster.  Having to do these brain teasers isn't so good for an admin trying to quickly restore service.

How to repeat:
look at the log of a starting cluster.

Suggested fix:
Make the computer do the math and report easily readable server ids; that's what computers are good at.
[16 Apr 2009 12:24] jack andrews
<magnus>	http://bugs.mysql.com/bug.php?id=44248
<magnus>	it sounds hard, but it's actually about adding a "pretty printer to NdbBitmask"
<magnus>	I did something similar in mgmd, but it would be good with a generic print function.
<jack>	yup - that looks good.
<magnus>	in src/mgmsrv/ConfigManager.cpp there is a function 'nodes2str' 
<magnus>	should probably be a function in NodeBitmask class (or a helper function) so that you can convert a NodeBitmask to a readable string
[16 Apr 2009 12:25] Magnus Blåudd
It sounds hard, but it's actually about adding a "pretty printer to NodeBitmask". In src/mgmsrv/ConfigManager.cpp there is a similar function 'nodes2str'. This should probably be a function in NodeBitmask class (or a helper function) so that you can convert a NodeBitmask to a readable string and then print it out.
[16 Apr 2009 12:29] Jonas Oreland
an extra complication is that stuff that is sent as infoEvents
has a max len of 96 bytes. Which means that the current printing is compact
enough to fit...whereas a 1,2,3...,255 will not fit in 96 characters...

note: this is only a problem for events generated by infoEvent()/warningEvent()
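
To put a number on that limit: a rough sketch (decimalListLength is a made-up helper that only counts characters and does not call infoEvent()) shows that the comma-separated list 1,2,...,255 needs 911 characters, far more than 96.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: length of the comma-separated decimal list
// "first,first+1,...,last", computed without building any log event.
std::size_t decimalListLength(int first, int last) {
  assert(first >= 0 && first <= last);
  std::size_t len = 0;
  for (int id = first; id <= last; ++id) {
    if (id != first) ++len;             // separating comma
    len += std::to_string(id).size();   // decimal digits of this id
  }
  return len;
}
```

For comparison, the four-node list 10,11,12,13 from the original report is 11 characters long.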
[23 Apr 2009 12:28] jack andrews
the patch below prints this for a 2 node cluster:

2009-04-23 22:24:56 [MgmSrvr] INFO     -- Node 2: Waiting 7 sec for nodes {3} to connect, nodes [ all: {2,3} connected: {2} no-wait: {} ]

i tried with 33 ndbd nodes to test for the case where there's more than one int containing the bitmap, but ndb_mgmd crashed and i can't make it work...

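  // size = number of 32-bit words in the bitmap, data = the words,
  // buf = output buffer (no length check is done here)
  int eos = 0;                          // index of the next free char in buf
  buf[eos++] = '{';
  // words are visited from the most significant down, so with more than
  // one word the ids are not printed in ascending order
  for (int i = (size-1); i >= 0; i--) {
    Uint32 x = data[i];
    for (int j = 0; j < 32; j++) {
      if (x & 1)                        // bit j of word i is set: id i*32+j
        eos += sprintf(buf + eos,"%s%d", eos>1?",":"",i*32+j);
      x >>= 1;
    }
  }
  buf[eos++] = '}';
  buf[eos] = 0;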
[28 Apr 2009 8:23] Magnus Blåudd
Hi,

1. The function should use the member functions of Bitmask to find out which bits are set. See 'nodes2str' in ConfigManager.cpp.

2. There is already a 'Bitmask::getText' function, I suggest naming this 'getPrettyText'

3. The function should take a pointer to the buffer where to print and also the length of that buffer. Should the pretty format exceed the length of that buffer, the function should revert to the default representation.

4. Suggest that we add a unit test for Bitmask. Look at how BaseString's unit test is compiled in src/common/util/BaseString-t and run automatically by Pushbuild or by running "make test-unit" at the top level directory.
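
Points 1 to 3 might look roughly like the sketch below. It is not the committed patch: prettyOrHex is a hypothetical free function, it reads the raw words rather than using Bitmask member functions as point 1 asks, and the exact hex fallback format is an assumption. It only illustrates the buffer-plus-length interface and the fallback when the pretty form is too long.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical free function, not the NDB API. Writes the set bits of a
// multi-word bitmap into buf as "{2,3}"; if that does not fit in bufLen bytes
// (including the terminating NUL), writes the hex form instead, most
// significant word first. Returns false if neither form fits.
bool prettyOrHex(unsigned size, const std::uint32_t data[], char* buf,
                 std::size_t bufLen) {
  assert(data != nullptr && buf != nullptr);

  // Pretty form: ids in ascending order, least significant word first.
  std::string pretty = "{";
  for (unsigned word = 0; word < size; ++word) {
    for (unsigned bit = 0; bit < 32; ++bit) {
      if ((data[word] >> bit) & 1u) {
        if (pretty.size() > 1) pretty += ',';
        pretty += std::to_string(word * 32 + bit);
      }
    }
  }
  pretty += '}';

  // Fallback form: 8 hex digits per word.
  std::string hex;
  for (unsigned word = size; word-- > 0;) {
    char chunk[9];
    std::snprintf(chunk, sizeof chunk, "%08x",
                  static_cast<unsigned>(data[word]));
    hex += chunk;
  }

  const std::string& chosen = pretty.size() < bufLen ? pretty : hex;
  if (chosen.size() >= bufLen) return false;  // no room even for the hex form
  chosen.copy(buf, chosen.size());
  buf[chosen.size()] = '\0';
  return true;
}
```

A unit test in the spirit of point 4 would cover a bitmap spanning more than one word, a buffer that forces the hex fallback, and a buffer too small for either form.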
[29 Apr 2009 9:10] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/72987

2871 jack andrews	2009-04-29
      bug#44248  Identifying nodes waited for from cluster log requires mental gymnastics.
[25 May 2009 13:39] jack andrews
if i follow the example of nodes2str, BitmaskImpl::getPrettyText() would have a different interface to BitmaskImpl::getText().  that is, unless i construct a bitmask from the parameters - they are:
  (unsigned size, const Uint32 data[], char* buf)
[2 Jun 2009 14:11] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/75469

2935 jack andrews	2009-06-02
      Bug #44248	Identifying nodes waited for from cluster log requires mental gymnastics.
      this commit adds _non_static_ methods to Bitmask for printing:
         * getText: 000000000000000000000000000000000000000000000000000000000015d753
         * getPrettyText:      0, 1, 4, 6, 8, 9, 10, 12, 14, 15, 16, 18 and 20
         * getPrettyTextShort: 0,1,4,6,8,9,10,12,14,15,16,18,20
      all methods take no parameters and return a BaseString
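
For illustration, the three formats can be reproduced with the stand-alone sketch below. The function names text, prettyText and prettyTextShort are hypothetical stand-ins: they take the bitmap words as a parameter, return std::string rather than BaseString, and their behaviour for an empty or single-bit mask is a guess. Only the output shapes come from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Positions of the set bits; words[0] is the least significant word.
std::vector<unsigned> setBits(const std::vector<std::uint32_t>& words) {
  std::vector<unsigned> bits;
  for (std::size_t w = 0; w < words.size(); ++w)
    for (unsigned b = 0; b < 32; ++b)
      if ((words[w] >> b) & 1u)
        bits.push_back(static_cast<unsigned>(w) * 32 + b);
  return bits;
}

// Hex form, most significant word first, 8 digits per word.
std::string text(const std::vector<std::uint32_t>& words) {
  assert(!words.empty());
  std::string out;
  for (std::size_t w = words.size(); w-- > 0;) {
    char chunk[9];
    std::snprintf(chunk, sizeof chunk, "%08x",
                  static_cast<unsigned>(words[w]));
    out += chunk;
  }
  return out;
}

// "0,1,4": compact list for length-limited events.
std::string prettyTextShort(const std::vector<std::uint32_t>& words) {
  std::string out;
  for (unsigned bit : setBits(words)) {
    if (!out.empty()) out += ',';
    out += std::to_string(bit);
  }
  return out;
}

// "0, 1 and 4": list for human readers; the last separator is " and ".
std::string prettyText(const std::vector<std::uint32_t>& words) {
  const std::vector<unsigned> bits = setBits(words);
  std::string out;
  for (std::size_t i = 0; i < bits.size(); ++i) {
    if (i > 0) out += (i + 1 == bits.size()) ? " and " : ", ";
    out += std::to_string(bits[i]);
  }
  return out;
}
```

With eight 32-bit words and the lowest word set to 0x15d753, the set bits are 0, 1, 4, 6, 8, 9, 10, 12, 14, 15, 16, 18 and 20, which matches the lists in the commit message.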
[3 Jun 2009 13:29] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/75527

2936 jack andrews	2009-06-03
      Bug #44248	Identifying nodes waited for from cluster log requires mental gymnastics.
      this commit adds _non_static_ methods to Bitmask for printing:
           * getText: 000000000000000000000000000000000000000000000000000000000015d753
           * getPrettyText:      0, 1, 4, 6, 8, 9, 10, 12, 14, 15, 16, 18 and 20
           * getPrettyTextShort: 0,1,4,6,8,9,10,12,14,15,16,18,20
      all methods take no parameters and return a BaseString
[3 Jun 2009 13:41] Bugs System
Pushed into 5.1.34-ndb-7.0.7 (revid:jack@sun.com-20090603133750-mclmuz8pakjz7ry2) (version source revid:jack@sun.com-20090603132259-64y6rivk7y1izl7a) (merge vers: 5.1.34-ndb-7.0.7) (pib:6)
[3 Jun 2009 15:42] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/75538

2938 jack andrews	2009-06-03
      Bug #44248  Identifying nodes waited for from cluster log requires mental gymnastics.
      integrating last commit with EventLogger.cpp to pretty print nodes
[4 Jun 2009 10:26] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/75597
[4 Jun 2009 10:26] Bugs System
Pushed into 5.1.34-ndb-7.0.7 (revid:magnus.blaudd@sun.com-20090604102602-ko3pg4plm3e5mpzd) (version source revid:magnus.blaudd@sun.com-20090604102602-ko3pg4plm3e5mpzd) (merge vers: 5.1.34-ndb-7.0.7) (pib:6)
[10 Jun 2009 10:19] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/75998

2940 jack andrews	2009-06-10
      Bug #44248  	Identifying nodes waited for from cluster log requires mental gymnastics.
      
      changed EventLogger to use new Bitmask getPrettyText() (which returns a BaseString)
      removed nodes2str() from ConfigManager.cpp and now use getPrettyText()
[10 Jun 2009 12:11] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/76005

2941 jack andrews	2009-06-10
      Bug #44248  	Identifying nodes waited for from cluster log requires mental gymnastics.
      
        in EventLogger::   removed arrays and replaced with 4 vars.
[10 Jun 2009 12:58] Magnus Blåudd
Ok to push
[10 Jun 2009 13:26] Bugs System
Pushed into 5.1.34-ndb-7.0.7 (revid:jack@sun.com-20090610120932-czawv5uismbbb20w) (version source revid:jack@sun.com-20090610120932-czawv5uismbbb20w) (merge vers: 5.1.34-ndb-7.0.7) (pib:6)
[10 Jun 2009 13:33] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/76021

2942 jack andrews	2009-06-10
      Bug #44248  	Identifying nodes waited for from cluster log requires mental gymnastics.
      
        fix for Bitmask-t where there are two 
          template struct BitmaskPOD<8>
        defined
[10 Jun 2009 13:33] Bugs System
Pushed into 5.1.34-ndb-7.0.7 (revid:jack@sun.com-20090610133156-sbqone27z82d0kgx) (version source revid:jack@sun.com-20090610133156-sbqone27z82d0kgx) (merge vers: 5.1.34-ndb-7.0.7) (pib:6)
[12 Jun 2009 13:08] Jon Stephens
Documented in the NDB-7.0.7 changelog as follows:

        Formerly, node IDs were represented in the cluster log using a
        complex hexadecimal/binary encoding. Now, node IDs are reported
        in the cluster log using numbers in standard decimal notation.