Bug #5234 Wisconsin benchmark crash, sendSignalErrorRefuseLab
Submitted: 26 Aug 2004 16:26
Modified: 28 Sep 2004 7:04
Reporter: Magnus Blåudd
Status: Can't repeat
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: mysql 4.1.4
OS:
Assigned to: Jonas Oreland
CPU Architecture: Any

[26 Aug 2004 16:26] Magnus Blåudd
Description:
ndbd crashes at the ndbrequire in the function Dbtc::sendSignalErrorRefuseLab, probably because part of the transaction has timed out.

void Dbtc::sendSignalErrorRefuseLab(Signal* signal) 
{
  ndbassert(false);
  ptrGuard(apiConnectptr);
  if (apiConnectptr.p->apiConnectstate != CS_DISCONNECTED) {
    jam();
    ndbrequire(false);  // <<== ndbd crashes here
    signal->theData[0] = apiConnectptr.p->ndbapiConnect;
    signal->theData[1] = signal->theData[ttransid_ptr];
    signal->theData[2] = signal->theData[ttransid_ptr + 1];
    signal->theData[3] = ZSIGNAL_ERROR;
    sendSignal(apiConnectptr.p->ndbapiBlockref, GSN_TCROLLBACKREP, 
               signal, 4, JBB);
  }
}//Dbtc::sendSignalErrorRefuseLab()

How to repeat:
> ./test-wisconsin --create-options=type=ndb
> --socket=/usr/local/mysql/var/mysql.sock
> 
> Testing server 'MySQL 4.1.4 gamma/' at 2004-08-24 22:02:53
>                                                                                                
> Wisconsin benchmark test
>                                                                                                
> Time for create_table (3):  9 wallclock secs ( 0.00 usr  0.00 sys + 
> 0.00 cusr  0.00 csys =  0.00 CPU)
>                                                                                                
> Inserting data
> Time to insert (31000): 172 wallclock secs ( 3.07 usr  1.45 sys +  0.00
> cusr  0.00 csys =  4.52 CPU)
> Time to delete_big (1):  7 wallclock secs ( 0.00 usr  0.00 sys +  0.00
> cusr  0.00 csys =  0.00 CPU)
>                                                                                                
> Running the actual benchmark
> Error occured with execute(select t.*,B.unique1 AS Bunique1,B.unique2 AS
> Bunique2,B.two AS Btwo,B.four AS Bfour,B.ten AS Bten,B.twenty AS
> Btwenty,B.hundred AS Bhundred,B.thousand AS Bthousand,B.twothousand AS
> Btwothousand,B.fivethous AS Bfivethous,B.tenthous AS Btenthous,B.odd AS
> Bodd,B.even AS Beven,B.stringu1 AS Bstringu1,B.stringu2 AS
> Bstringu2,B.string4 AS Bstring4  from tenk1 t, Bprime B where t.unique2
> = B.unique2)
>  -> Can't lock file (errno: 4009)

ndb_3_error.log:
Date/Time: Tuesday 24 August 2004 - 22:18:28
Type of error: error
Message: Internal program error (failed ndbrequire)
Fault ID: 2341
Problem data: DbtcMain.cpp
Object of reference: DBTC (Line: 1320) 0x0000000a
ProgramName: NDB Kernel
ProcessID: 4097
TraceFile: ndb_3_trace.log.1
***EOM***

The same error appears in ndb_2_error.log:
Date/Time: Tuesday 24 August 2004 - 22:18:26
Type of error: error
Message: Internal program error (failed ndbrequire)
Fault ID: 2341
Problem data: DbtcMain.cpp
Object of reference: DBTC (Line: 1320) 0x0000000a
ProgramName: NDB Kernel
ProcessID: 24307
TraceFile: ndb_2_trace.log.1
***EOM***

The trace file ndb_3_trace.log.1:

DBTC    003892 
DBTUP   002032 
DBTC    010991 011937 
DBTC    011178 011264 011201 
DBTC    002383 002455 001391 001319 001320 

--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 3, r.sigId: 1985614 gsn: 523 "INDXATTRINFO" prio: 1
s.bn: 32769 "API", s.proc: 11, s.sigId: 0 length: 14 trace: 1 #sec: 0 fragInf: 0
 H'00000020 H'0000797e H'00100b00 H'00050000 H'00060000 H'00070000
 H'00080000 H'00090000 H'000a0000 H'000b0000 H'000c0000 H'000d0000
 H'000e0000 H'000f0000
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 3, r.sigId: 1985613 gsn: 519 "TCINDXREQ" prio: 1
s.bn: 32769 "API", s.proc: 11, s.sigId: 0 length: 14 trace: 1 #sec: 0 fragInf: 0
 apiConnectPtr: H'00000020, senderData: H'00008cbc
 Operation: Read, Flags: Start Execute
 indexLen: 1, attrLen: 16, AI in this: 5, indexId: 46, indexSchemaVer: 1, API Ver: 0
 transId(1, 2): (H'0000797e, H'00100b00)
 -- Variable Data --
 H'00000100 H'00000000 H'00010000 H'00020000 H'00030000 H'00040000
--------------- Signal ----------------
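
In the dump above, the first three words of the INDXATTRINFO signal (H'00000020, H'0000797e, H'00100b00) equal the apiConnectPtr and transId printed for the preceding TCINDXREQ, which suggests both signals belong to the same transaction. The sketch below is only a helper for checking that by machine; parseSignalWords and matchesTransaction are hypothetical names, not part of NDB, and the word layout is inferred from this one trace.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Parse whitespace-separated tokens of the form H'0000797e into integers.
// Tokens that do not start with H' are skipped. A token with no hex digits
// after H' makes std::stoul throw std::invalid_argument.
std::vector<std::uint32_t> parseSignalWords(const std::string& dump)
{
  std::vector<std::uint32_t> words;
  std::istringstream in(dump);
  std::string tok;
  while (in >> tok) {
    if (tok.size() > 2 && tok[0] == 'H' && tok[1] == '\'') {
      words.push_back(
          static_cast<std::uint32_t>(std::stoul(tok.substr(2), nullptr, 16)));
    }
  }
  return words;
}

// True if the first three words are the given connect pointer and
// transaction id halves (layout assumed from the trace in this report).
bool matchesTransaction(const std::vector<std::uint32_t>& w,
                        std::uint32_t apiConnectPtr,
                        std::uint32_t transId1,
                        std::uint32_t transId2)
{
  return w.size() >= 3 && w[0] == apiConnectPtr &&
         w[1] == transId1 && w[2] == transId2;
}
```

Feeding the INDXATTRINFO data words to parseSignalWords and then calling matchesTransaction(words, 0x20, 0x797e, 0x100b00) with the values printed for TCINDXREQ would confirm the match.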

Suggested fix:
Handle this condition without crashing ndbd.
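
One way to read this suggestion is to let the existing rollback-report path run instead of stopping at ndbrequire(false). The following is a minimal standalone sketch of that path, not the actual fix: ApiConnectRecord, RollbackRep and buildSignalErrorRefuse are hypothetical stand-ins for the kernel types, the value of ZSIGNAL_ERROR is a placeholder, and whether dropping the check is safe inside the real DBTC block is not established here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the NDB kernel definitions (illustration only).
enum ConnectState { CS_DISCONNECTED, CS_STARTED };
constexpr std::uint32_t ZSIGNAL_ERROR = 1;  // placeholder, not the real value

struct ApiConnectRecord {
  ConnectState apiConnectstate;
  std::uint32_t ndbapiConnect;
};

// Models the four data words the original function sends as GSN_TCROLLBACKREP.
struct RollbackRep {
  std::uint32_t words[4];
};

// Fills `out` and returns true if a rollback report should go to the API.
// Returns false, leaving `out` untouched, if the API connection is already
// disconnected. Nothing in here aborts the process.
bool buildSignalErrorRefuse(const ApiConnectRecord& conn,
                            std::uint32_t transId1,
                            std::uint32_t transId2,
                            RollbackRep& out)
{
  if (conn.apiConnectstate == CS_DISCONNECTED) {
    return false;  // no API connection to report to
  }
  out.words[0] = conn.ndbapiConnect;
  out.words[1] = transId1;
  out.words[2] = transId2;
  out.words[3] = ZSIGNAL_ERROR;
  return true;
}
```

In this model the caller would send the four words to the API block whenever the function returns true, so the client sees a rolled-back transaction rather than a node failure.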