Bug #77716 InvalidAttrInfo returned from execTUX_BOUND_INFO while scanning
Submitted: 14 Jul 2015 9:06
Modified: 22 Mar 2016 19:16
Reporter: Magnus Blåudd
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: 7.5.0
OS: Any
Assigned to:
CPU Architecture: Any

[14 Jul 2015 9:06] Magnus Blåudd
Description:
A data node crashes in Suma during what seems to be some form of scan.

2015-07-13 15:06:11 [ndbd] INFO     -- g:\ade\build\sb_0-15888083-1436783526.18\mysql-cluster-gpl-7.5.0\storage\ndb\src\kernel\blocks\suma\suma.cpp
2015-07-13 15:06:11 [ndbd] INFO     -- SUMA (Line: 3156) 0x00000002
2015-07-13 15:06:11 [ndbd] INFO     -- Error handler shutting down system
2015-07-13 15:06:11 [ndbd] INFO     -- Error handler shutdown completed - exiting

Suma receives a SCAN_FRAGREF from Dblqh, with the last word being the error code, set to 4110 (H'0000100e), which is TuxBoundInfo::InvalidAttrInfo.

--------------- Signal ----------------
r.bn: 257 "SUMA", r.proc: 2, r.sigId: 17622 gsn: 352 "SCAN_FRAGREF" prio: 1
s.bn: 247/1 "DBLQH", s.proc: 2, s.sigId: 24326 length: 4 trace: 0 #sec: 0 fragInf: 0
 H'00000000 H'00000000 H'10100200 H'0000100e
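
To make the reading of the signal words above concrete, here is a minimal sketch that decodes the four words of the SCAN_FRAGREF. The struct and function names are hypothetical and not taken from the NDB sources. The report only confirms that the signal has length 4 and that the last word is the error code; the names given to the first three words are an assumption, based on the third word matching the transId2 value shown in the SCAN_FRAGREQ dump.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical view of the four signal words. Only errorCode (last word) is
// confirmed by the report; the other field names are assumptions.
struct ScanFragRefWords {
  std::uint32_t senderData;
  std::uint32_t transId1;
  std::uint32_t transId2;
  std::uint32_t errorCode;
};

// 4110 is the value the report gives for TuxBoundInfo::InvalidAttrInfo.
constexpr std::uint32_t kInvalidAttrInfo = 4110;

// The four words exactly as printed in the signal dump.
constexpr std::uint32_t kTraceWords[4] = {0x00000000u, 0x00000000u,
                                          0x10100200u, 0x0000100eu};

// Copies the raw words into named fields.
ScanFragRefWords decodeScanFragRef(const std::uint32_t* words,
                                   std::size_t length) {
  assert(length == 4);  // the dump shows "length: 4"
  return {words[0], words[1], words[2], words[3]};
}
```

Calling decodeScanFragRef(kTraceWords, 4) yields an errorCode equal to kInvalidAttrInfo, since H'0000100e is 4110 in decimal.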

This signal is sent from Dblqh after it has performed a "direct call" to c_tux->execTUX_BOUND_INFO(signal), which then runs the following code to set the error:

    if (unlikely(offset != boundLen)) {
      jam();
      scan.m_errorCode = TuxBoundInfo::InvalidAttrInfo;
      req->errorCode = scan.m_errorCode;
      return;
    }

which can be seen in the signal trace:

---> signal
DblqhMain.cpp        11274 22253 22253 11456 11464 12791 
DbtuxScan.cpp        00033 
DbtuxGen.cpp         00400 00400 00402 
DblqhMain.cpp        11617 
DbtuxScan.cpp        00171 00204 00216 00231 00244 00216 00231 00244 00216 
                     00265 <<<<<<<
DblqhMain.cpp        11759 11761 11774 
DbtupStoredProcDef.cpp 00038 00061 
DblqhMain.cpp        11801 13379 
DbtuxScan.cpp        00340 00356 00443 00490 
DbtuxSearch.cpp      00230 00248 00253 00256 00230 00248 00253 00302 00313 
                     00302 00313 00302 00313 00348 
DbtuxScan.cpp        00799 01086 00806 00633 
DblqhMain.cpp        10376 12088 
DbtuxScan.cpp        00340 00384 00387 
DbtuxNode.cpp        00598 00602 
DblqhMain.cpp        10431 
DbtupStoredProcDef.cpp 00038 00084 
DblqhMain.cpp        12606 13019 13052 07875 09026 
DbtuxScan.cpp        00437 
DblqhMain.cpp        03706 
DbtupBuffer.cpp      00035 
DblqhMain.cpp        03706 
DbtupBuffer.cpp      00035 
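
The offset != boundLen check quoted earlier rejects bound data whose entries do not add up to the declared total length, which is how a corrupted word can surface as InvalidAttrInfo. The following is a simplified sketch of that kind of check, not the actual Dbtux code. The entry layout (one type word, one header word with the byte size in the low 16 bits, then padded data words) and the function name validateBoundInfo are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 4110 is the value the report gives for TuxBoundInfo::InvalidAttrInfo.
constexpr std::uint32_t kInvalidAttrInfo = 4110;

// Assumed entry layout (hypothetical, for illustration only):
//   word 0   : bound type
//   word 1   : header, data size in bytes in the low 16 bits
//   word 2.. : attribute data, padded to whole 32-bit words
// Returns 0 when the entries fill exactly boundLen words, else 4110.
std::uint32_t validateBoundInfo(const std::vector<std::uint32_t>& data,
                                std::size_t boundLen) {
  if (boundLen > data.size()) {
    return kInvalidAttrInfo;
  }
  std::size_t offset = 0;
  while (offset + 2 <= boundLen) {
    assert(offset + 1 < data.size());  // guaranteed by the checks above
    const std::size_t byteSize = data[offset + 1] & 0xFFFFu;
    const std::size_t dataWords = (byteSize + 3) / 4;  // round up to words
    offset += 2 + dataWords;
  }
  // Same shape as the quoted check: consumed length must match exactly.
  return offset != boundLen ? kInvalidAttrInfo : 0;
}
```

Under these assumptions, a single entry {type, header with size 4, one data word} validates with boundLen 3, while the same buffer with the size field altered to 8 walks past the end and returns 4110. That is the kind of mismatch a corrupted signal section would produce.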

--------------- Signal ----------------
r.bn: 261/1 "PGMAN", r.proc: 2, r.sigId: 303993 gsn: 761 "STOP_FOR_CRASH" prio: 0
s.bn: 0 "SYS", s.proc: 0, s.sigId: 0 length: 1 trace: 0 #sec: 0 fragInf: 0
 H'00000000
--------------- Signal ----------------
r.bn: 247/1 "DBLQH", r.proc: 2, r.sigId: 303992 gsn: 353 "SCAN_FRAGREQ" prio: 1
s.bn: 257 "SUMA", s.proc: 2, s.sigId: 6044 length: 12 trace: 0 #sec: 2 fragInf: 0
 senderData: 0x0
 resultRef: 0x1010002
 savePointId: 0
 flags: hdr attrLen: 0 reorg: 0 corr: 0 stat: 0 ni: 0
 tableId: 6
 fragmentNo: 1
 keyLen: 0
 schemaVersion: 0x1
 transId1: 0x0
 transId2: 0x10100200
 clientOpPtr: 0x0
 batch_size_rows: 16
 batch_size_bytes: 0
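
As a cross-check that this request belongs to the failing scan, the resultRef can be split into a block number and a node id. The sketch below assumes the block number sits in the upper 16 bits of the reference and the node id in the lower 16 bits; the helper names are hypothetical. With that assumption, 0x1010002 gives block 257 on node 2, which matches the r.bn: 257 "SUMA", r.proc: 2 receiver of the SCAN_FRAGREF.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: a block reference split into its two parts.
struct BlockRef {
  std::uint32_t block;  // assumed: upper 16 bits of the reference
  std::uint32_t node;   // assumed: lower 16 bits of the reference
};

BlockRef splitRef(std::uint32_t ref) {
  return {ref >> 16, ref & 0xFFFFu};
}

// Checks the resultRef from the SCAN_FRAGREQ dump against the receiver
// shown in the SCAN_FRAGREF dump (block 257 "SUMA" on node 2).
void checkResultRefFromTrace() {
  const BlockRef ref = splitRef(0x1010002u);
  assert(ref.block == 257u);
  assert(ref.node == 2u);
}
```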

How to repeat:
Only reproducible on Windows.
Various test cases fail with a data node crash in the same place.
Only seen for 7.5.
There seem to be a lot of "Ndb kernel thread 4 is stuck in: Job Handling elapsed=101" and "Watchdog: Warning overslept 251 ms, expected 100 ms." messages, which are perhaps causing signal reordering?

Suggested fix:
.
[22 Mar 2016 19:16] Jon Stephens
Documented fix in the NDB 7.4.11 and 7.5.2 changelogs, as follows:

    Performing ANALYZE TABLE on a table having one or more indexes
    caused ndbmtd to fail with an InvalidAttrInfo error due to signal 
    corruption. This issue occurred consistently on Windows, but could 
    also be encountered on other platforms.

Closed.