Bug #58240 Ndb : Multi-node failure handling in QMGR can send invalid signal
Submitted: 16 Nov 2010 17:49 Modified: 6 Jan 2011 6:09
Reporter: Frazer Clement Email Updates:
Status: Closed Impact on me:
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:mysql-5.1-telco-6.3 OS:Any
Assigned to: Frazer Clement CPU Architecture:Any

[16 Nov 2010 17:49] Frazer Clement
When testing multi-node failure scenarios, I noted a problem with the failed nodeid received in the PREP_FAILREQ signal.  Specifically, a junk nodeid appears.

Adding more assertions reveals that the root of the problem is in the CLOSE_COMCONF handling code in QMGR, which stores the same nodeid in the cfailedNodes array twice.  This array is then used to build a bitmap passed to receivers of PREP_FAILREQ, and they read junk because the bitmap has fewer set bits than NumberOfFailedNodes indicates.
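A minimal sketch of the failure mode described above (hypothetical names; buildBitmap and readFailedNodes are illustrations, not the real QMGR code): storing a duplicate nodeid sets the same bit twice, so the bitmap carries fewer set bits than the advertised count, and a receiver expecting that count reads junk for the missing entries.

```cpp
#include <bitset>
#include <cstdint>
#include <vector>

const int MAX_NDB_NODES = 49; // value assumed for illustration only

// Build the node bitmask sent with PREP_FAILREQ from a list of failed
// node ids (simplified sketch).  A duplicated id sets the same bit twice.
std::bitset<MAX_NDB_NODES> buildBitmap(const std::vector<uint32_t>& failed) {
  std::bitset<MAX_NDB_NODES> mask;
  for (uint32_t n : failed)
    mask.set(n);
  return mask;
}

// Receiver side: recover the failed node ids by scanning set bits.  If
// the sender's count was derived from a list with duplicates, the mask
// yields fewer ids than the count claims.
std::vector<uint32_t> readFailedNodes(const std::bitset<MAX_NDB_NODES>& mask) {
  std::vector<uint32_t> out;
  for (uint32_t i = 0; i < MAX_NDB_NODES; i++)
    if (mask.test(i))
      out.push_back(i);
  return out;
}
```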

The code is not commented, but my guess at the protocol is:
  Node receives PREP_FAILREQ from the president for node(s) A, B
    Node sends CLOSE_COMREQ to CMVMI
    Node receives CLOSE_COMCONF from CMVMI
  Node checks whether CLOSE_COMCONF included all failed nodes
    If so, it sends PREP_FAILCONF to the president
    If not, it sends PREP_FAILREF to the president - there are more failures for it to commit.

  'All failed nodes' is normally maintained by FAIL_REP handling.
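The guessed handshake above can be sketched as follows.  This is a hedged illustration under the assumed semantics; onCloseComConf, knownFailed and prepFailed are made-up names, not the actual QMGR implementation:

```cpp
#include <cstdint>
#include <set>

enum PrepFailReply { PREP_FAILCONF, PREP_FAILREF };

// On CLOSE_COMCONF, confirm to the president only if every node we know
// to have failed (via FAIL_REP) was covered by the president's
// PREP_FAILREQ; otherwise refuse, since more failures remain to commit.
PrepFailReply onCloseComConf(const std::set<uint32_t>& knownFailed,
                             const std::set<uint32_t>& prepFailed) {
  for (uint32_t n : knownFailed)
    if (prepFailed.count(n) == 0)
      return PREP_FAILREF;  // president has more failures to commit
  return PREP_FAILCONF;
}
```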

It's not clear that this bug is serious, as it is generally seen in a situation where the cluster is likely to fail anyway (too many nodes have failed).

How to repeat:
A multi-node simultaneous failure situation (suggest 4 or more nodes).

Surviving nodes process the asynchronous failure notifications.

Sometimes QMGR asserts.

Suggested fix:
There's a dubious-looking loop in Qmgr::execCLOSE_COMCONF() which starts from 1 for no known reason:

  if (tprepFailConf == ZFALSE) {
    for (Tindex = 1; Tindex < MAX_NDB_NODES; Tindex++) {
      cfailedNodes[Tindex] = cprepFailedNodes[Tindex];

This will copy cprepFailedNodes to cfailedNodes, except for the first entry!  As far as I can see, this should start at 0.
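A standalone demonstration of the suspect copy (copyFromOne is a hypothetical name, and MAX_NDB_NODES is assumed here; only the loop bound mirrors the quoted code): starting at index 1 leaves the destination's entry 0 untouched, losing the first prepared-failure entry.

```cpp
#include <array>
#include <cstdint>

const int MAX_NDB_NODES = 49; // value assumed for illustration only

// Mirrors the loop in Qmgr::execCLOSE_COMCONF(): the copy starts at
// index 1, so dst[0] is never written and src[0] is silently dropped.
void copyFromOne(const std::array<uint32_t, MAX_NDB_NODES>& src,
                 std::array<uint32_t, MAX_NDB_NODES>& dst) {
  for (int i = 1; i < MAX_NDB_NODES; i++)  // should start at 0
    dst[i] = src[i];
}
```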

Other dubious code here:
  The cprepFailedNodes array is overwritten with the contents of the CLOSE_COMREQ bitmask, and the new size is stored in the local variable arrayIndex.  Later, cprepFailedNodes is updated using cnoPrepFailedNodes as an index, which appears unrelated to arrayIndex.

Suggest :
 1) Get agreement on the semantics of this code
 2) Re-implement with comments.
[16 Nov 2010 17:53] Frazer Clement
Patch highlighting interesting areas

Attachment: bug58240.patch (text/x-patch), 1.32 KiB.

[19 Nov 2010 12:46] Jonas Oreland
Is there a fix coming?
[13 Dec 2010 10:02] Frazer Clement
Proposed patch against 6.3

Attachment: bug58240.patch (text/x-patch), 2.30 KiB.

[13 Dec 2010 10:03] Frazer Clement
The bug is more serious for clusters with larger node counts, where multi-node failures are more likely, and are more likely to leave enough nodes for a viable surviving cluster.  In this scenario, the bug could kill enough of the surviving nodes to make cluster survival impossible.
[13 Dec 2010 14:49] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

3362 Frazer Clement	2010-12-13
      Bug#58240 Ndb : Multi-node failure handling in QMGR can send invalid signal
      Fix multi-node failure handling code to avoid invalid signal construction.
[13 Dec 2010 15:50] Bugs System
Pushed into mysql-5.1-telco-7.0 5.1.51-ndb-7.0.21 (revid:frazer@mysql.com-20101213153450-5czlh089vpqfo7v1) (version source revid:frazer@mysql.com-20101213153450-5czlh089vpqfo7v1) (merge vers: 5.1.51-ndb-7.0.21) (pib:23)
[13 Dec 2010 15:55] Bugs System
Pushed into mysql-5.1-telco-6.3 5.1.51-ndb-6.3.40 (revid:frazer@mysql.com-20101213152410-ozb61hffqx17fkex) (version source revid:frazer@mysql.com-20101213144826-vg323kt6tlb7656h) (merge vers: 5.1.51-ndb-6.3.40) (pib:23)
[20 Dec 2010 9:55] Jonas Oreland
pushed to 6.3.40, 7.0.21 and 7.1.10
[6 Jan 2011 6:09] Jon Stephens
Documented fix in the NDB-6.3.40, 7.0.21, and 7.1.10 changelogs as follows:

        When handling failures of multiple data nodes, an error in the
        construction of internal signals could cause the cluster's
        remaining nodes to crash. This issue was most likely to affect
        clusters with large numbers of data nodes.