Bug #47039 Ndb : Signals from failed API node received after API_FAILREQ
Submitted: 1 Sep 2009 10:44
Modified: 9 Sep 2009 14:33
Reporter: Frazer Clement
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: mysql-5.1-telco-6.2
OS: Any
Assigned to: Frazer Clement
CPU Architecture: Any

[1 Sep 2009 10:44] Frazer Clement
Description:
While testing fixes for bug#44607 (Fragmented long signals need node failure handling), it became apparent that signals from a failed API node are sometimes received *after* an API_FAILREQ signal has been received for that node.

This complicates API node failure handling: cleanup work triggered by the API_FAILREQ can leave resources in states that are invalid for processing the signals that follow.

Changes should be made to ensure that all pending signals from a failing API node are processed before an API_FAILREQ signal is received.

How to repeat:
1) Start an API node sending a significant volume of requests (e.g. batched inserts or scans with large AttrInfo).

2) Disconnect the API node.

3) Monitor received signal sequence at TC or DICT to determine whether signals are received from the API after an API_FAILREQ signal.
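
The check in step 3 can be automated once a signal trace has been captured. The following is a minimal sketch in Python, assuming the trace has already been reduced to a list of records in arrival order. The record layout (`name`, `from_node`, `failed_node`), the node ids, and the helper name `late_signals` are hypothetical and are not an NDB trace format.

```python
def late_signals(trace, failed_node):
    """Return names of signals sent by failed_node that arrive after
    the API_FAILREQ reporting that node's failure.

    trace: list of dicts in arrival order (hypothetical layout):
      {"name": str, "from_node": int, "failed_node": int (API_FAILREQ only)}
    """
    fail_seen = False
    late = []
    for sig in trace:
        if sig["name"] == "API_FAILREQ" and sig.get("failed_node") == failed_node:
            fail_seen = True
        elif fail_seen and sig["from_node"] == failed_node:
            late.append(sig["name"])
    return late


# Example trace: node 5 is the disconnected API node, node 1 a data node.
trace = [
    {"name": "TCKEYREQ", "from_node": 5},
    {"name": "API_FAILREQ", "from_node": 1, "failed_node": 5},
    {"name": "ATTRINFO", "from_node": 5},
    {"name": "TCKEYREQ", "from_node": 6},
]
print(late_signals(trace, failed_node=5))  # ['ATTRINFO']
```

A non-empty result means the problem described above was observed for that node.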

Suggested fix:
Change QMGR/CMVMI handling of disconnect to ensure API_FAILREQ sequencing.
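
The intended sequencing can be illustrated with a toy model. This is only a sketch of the ordering requirement, not the QMGR/CMVMI implementation: `ToyBlock`, `on_disconnect`, and the `drain_first` flag are hypothetical names introduced for illustration.

```python
from collections import deque


class ToyBlock:
    """Stand-in for a kernel block that cleans up on API_FAILREQ."""

    def __init__(self):
        self.failed = set()  # nodes for which cleanup has already run
        self.stale = []      # signals that arrived after that cleanup

    def receive(self, name, node):
        if name == "API_FAILREQ":
            self.failed.add(node)
        elif node in self.failed:
            # Signal from a node whose state was already cleaned up.
            self.stale.append(name)


def on_disconnect(block, pending, node, drain_first):
    """Deliver the node's pending signals and its API_FAILREQ to block."""
    if not drain_first:
        block.receive("API_FAILREQ", node)  # problem ordering
    while pending:
        block.receive(pending.popleft(), node)
    if drain_first:
        block.receive("API_FAILREQ", node)  # suggested ordering
    return block.stale


print(on_disconnect(ToyBlock(), deque(["TCKEYREQ", "ATTRINFO"]), 5, False))
# ['TCKEYREQ', 'ATTRINFO']
print(on_disconnect(ToyBlock(), deque(["TCKEYREQ", "ATTRINFO"]), 5, True))
# []
```

With `drain_first=True` no signal from the failed node reaches the block after its cleanup, which is the property the suggested fix is after.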
[1 Sep 2009 10:50] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/82097

2972 Frazer Clement	2009-09-01
      Bug#47039 : Signals from failed API node received after API_FAILREQ
      modified:
        storage/ndb/include/kernel/signaldata/CloseComReqConf.hpp
        storage/ndb/src/kernel/blocks/ERROR_codes.txt
        storage/ndb/src/kernel/blocks/cmvmi/Cmvmi.cpp
        storage/ndb/src/kernel/blocks/dbtc/Dbtc.hpp
        storage/ndb/src/kernel/blocks/dbtc/DbtcMain.cpp
        storage/ndb/src/kernel/blocks/qmgr/Qmgr.hpp
        storage/ndb/src/kernel/blocks/qmgr/QmgrMain.cpp
        storage/ndb/test/ndbapi/testNdbApi.cpp
        storage/ndb/test/run-test/daily-devel-tests.txt
[1 Sep 2009 11:04] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/82098

2973 Frazer Clement	2009-09-01
      Bug#47039 : Signals from failed API node received after API_FAILREQ.  Move test to daily-basic
      modified:
        storage/ndb/test/run-test/daily-basic-tests.txt
        storage/ndb/test/run-test/daily-devel-tests.txt
[1 Sep 2009 11:47] Bugs System
Pushed into 5.1.35-ndb-7.1.0 (revid:frazer@mysql.com-20090901114544-8bzgs4hwrxxz4nld) (version source revid:frazer@mysql.com-20090901114544-8bzgs4hwrxxz4nld) (merge vers: 5.1.35-ndb-7.1.0) (pib:11)
[1 Sep 2009 11:48] Bugs System
Pushed into 5.1.37-ndb-7.0.8 (revid:frazer@mysql.com-20090901114204-lkvivzpab4zikww9) (version source revid:frazer@mysql.com-20090901114204-lkvivzpab4zikww9) (merge vers: 5.1.37-ndb-7.0.8) (pib:11)
[1 Sep 2009 11:49] Bugs System
Pushed into 5.1.37-ndb-6.3.27 (revid:frazer@mysql.com-20090901110711-831d20o4nbl4kvdm) (version source revid:frazer@mysql.com-20090901110711-831d20o4nbl4kvdm) (merge vers: 5.1.37-ndb-6.3.27) (pib:11)
[1 Sep 2009 11:50] Bugs System
Pushed into 5.1.37-ndb-6.2.19 (revid:frazer@mysql.com-20090901110411-wgux1fql6dcvtz87) (version source revid:frazer@mysql.com-20090901110411-wgux1fql6dcvtz87) (merge vers: 5.1.37-ndb-6.2.19) (pib:11)
[1 Sep 2009 17:53] Frazer Clement
Proposed patch to mysql-5.1-telco-7.0 + for ndbmtd specifics

Attachment: bug47039-ndbmtd.patch (text/x-patch), 10.27 KiB.

[2 Sep 2009 12:05] Jonas Oreland
The extra ref was just too ugly...
How does it look if several ROUTE_ORD signals are sent instead?

(sorry for being picky)
[3 Sep 2009 11:09] Frazer Clement
Proposed patch without ROUTE_ORD modifications

Attachment: bug47039-ndbmtd2.patch (text/x-patch), 7.21 KiB.

[3 Sep 2009 14:12] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/82334

2983 Frazer Clement	2009-09-03
      Bug#47039 : Ndbmtd specific fix.  Route API_FAILREQ via CMVMI to ensure correct ordering
      modified:
        storage/ndb/src/kernel/blocks/cmvmi/Cmvmi.cpp
        storage/ndb/src/kernel/blocks/cmvmi/Cmvmi.hpp
        storage/ndb/src/kernel/blocks/qmgr/Qmgr.hpp
        storage/ndb/src/kernel/blocks/qmgr/QmgrMain.cpp
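
One way to picture why routing helps in the multithreaded case is sketched below. It assumes, without confirmation from this report, that signal order is only guaranteed between a given pair of sender and receiver threads, so an API_FAILREQ sent from a different thread than the one forwarding the API node's signals may overtake them. The thread names `main` and `receiver`, the class `ToyThreadQueues`, and the function `run` are hypothetical.

```python
from collections import deque


class ToyThreadQueues:
    """Per-sender FIFO queues feeding one destination thread."""

    def __init__(self):
        self.queues = {}  # sender thread name -> deque of signal names

    def send(self, sender, signal):
        self.queues.setdefault(sender, deque()).append(signal)

    def deliver(self, poll_order):
        """Drain queues in the given sender order; FIFO within each sender."""
        out = []
        for sender in poll_order:
            q = self.queues.get(sender, deque())
            while q:
                out.append(q.popleft())
        return out


def run(route_via_receiver):
    dest = ToyThreadQueues()
    # Signals from the API node arrive through the receiving thread.
    dest.send("receiver", "TCKEYREQ")
    dest.send("receiver", "ATTRINFO")
    # API_FAILREQ is sent either directly or through the same thread.
    sender = "receiver" if route_via_receiver else "main"
    dest.send(sender, "API_FAILREQ")
    # Worst case for the direct route: "main" is polled first.
    return dest.deliver(poll_order=["main", "receiver"])


print(run(route_via_receiver=False))  # ['API_FAILREQ', 'TCKEYREQ', 'ATTRINFO']
print(run(route_via_receiver=True))   # ['TCKEYREQ', 'ATTRINFO', 'API_FAILREQ']
```

In this model, sending API_FAILREQ along the same path as the API node's own signals places it behind them regardless of polling order.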
[8 Sep 2009 13:03] Frazer Clement
Fix pushed to 6.2.19, 6.3.27, 7.0.8
[9 Sep 2009 14:33] Jon Stephens
Documented bugfix in the NDB-6.2.19, 6.3.27, and 7.0.8 changelogs, as follows:

        Signals from a failed API node could be received after an
        API_FAILREQ signal (see "The NDB Protocol: Operations and Signals")
        has been received from that node, which could result in invalid
        states for processing subsequent signals. Now, all pending
        signals from a failing API node are processed before any
        API_FAILREQ signal is received.

Closed.