Bug #10058 | ndb_select_count crashes cluster (in dbtup) after system restart | ||
---|---|---|---|
Submitted: | 21 Apr 2005 12:13 | Modified: | 13 Jun 2005 15:02 |
Reporter: | Johan Andersson | Status: | Closed |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S1 (Critical) |
Version: | 4.1,5.0 | OS: | Linux (RHEL 4 (64-bit opteron)) |
Assigned to: | Jonas Oreland | CPU Architecture: | Any |
[21 Apr 2005 12:13]
Johan Andersson
[21 Apr 2005 12:21]
Johan Andersson
Mailing test program separately due to silly 200K limit.
[21 Apr 2005 12:25]
Johan Andersson
trace
Attachment: ndb_5_dbtup-bug-1-2.zip (application/x-zip-compressed, text), 91.19 KiB.
[21 Apr 2005 12:35]
Johan Andersson
Has the system restart corrupted the data?
[21 Apr 2005 13:56]
Martin Skold
Same crash seen in bug#10001, was this also after a system restart?

Date/Time: Thursday 21 April 2005 - 05:37:41
Type of error: error
Message: Pointer too large
Fault ID: 2306
Problem data: DbtupExecQuery.cpp
Object of reference: DBTUP (Line: 604) 0x0000000a
ProgramName: /opt/atse/cluster/mysql/bin/ndbd
ProcessID: 10876
TraceFile: /opt/atse/cluster/data_ndb/ndb_2_trace.log.1
***EOM***
[21 Apr 2005 14:19]
Martin Skold
Also, is this related to the large number of records? Have you tried with a smaller database?
[21 Apr 2005 14:22]
Martin Skold
    } else if ((loopOpPtr.p->optype == ZDELETE) &&
               (loopOpPtr.p->prevActiveOp == RNIL)) {
      jam();
      //----------------------------------------------------------------------
      // There was only a delete. The original tuple still is ok.
      //----------------------------------------------------------------------
    } else {
      jam();
      //----------------------------------------------------------------------
      // There was another operation after the delete, this must be an insert
      // and we have found our copy tuple there.
      //----------------------------------------------------------------------
      loopOpPtr.i = loopOpPtr.p->prevActiveOp;
      ptrCheckGuard(loopOpPtr, cnoOfOprec, operationrec);  // <== crashes here

Could it be that it is a DELETE, but prevActiveOp is not set to RNIL correctly during system restart?
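To make the hypothesis above concrete, here is a minimal standalone sketch of the two branches shown in the snippet. It is not the NDB kernel code: `OpRecord`, `Lookup`, `classify`, and `kNil` are hypothetical names, the sentinel value is an assumption, and the range test only stands in for what `ptrCheckGuard` appears to do (the real code aborts the node with "Pointer too large" rather than returning a value).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sentinel playing the role of RNIL ("no previous operation").
// The exact value is an assumption made for this sketch.
constexpr std::uint32_t kNil = 0xffffff00u;

enum class OpType { Insert, Update, Delete };

// Hypothetical, simplified operation record: only the two fields that the
// quoted snippet inspects.
struct OpRecord {
  OpType type;
  std::uint32_t prevActiveOp;  // index into the pool, or kNil
};

enum class Lookup {
  OriginalTuple,   // only a delete happened, the original tuple is still valid
  CopyFromPrevOp,  // the previous operation holds the copy tuple
  CorruptLink      // prevActiveOp points outside the pool
};

// Models only the two branches shown in the snippet; any earlier branches of
// the real if/else chain are not represented here.
Lookup classify(const std::vector<OpRecord>& pool, std::uint32_t opIndex) {
  assert(opIndex < pool.size());  // precondition: the starting index is valid
  const OpRecord& op = pool[opIndex];

  if (op.type == OpType::Delete && op.prevActiveOp == kNil) {
    return Lookup::OriginalTuple;
  }

  // Follow the link, but range-check it first. This is the point where the
  // real code calls ptrCheckGuard and crashes if the index is too large.
  const std::uint32_t prev = op.prevActiveOp;
  if (prev >= pool.size()) {
    return Lookup::CorruptLink;
  }
  return Lookup::CopyFromPrevOp;
}
```

In this sketch, a delete record whose `prevActiveOp` holds a stale index instead of `kNil` falls through to the second branch and fails the range check, which is the scenario the question above is asking about.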
[22 Apr 2005 5:51]
Martin Skold
Need table definition
[22 Apr 2005 8:09]
Johan Andersson
We have reproduced this with 1M rows in db (small) and 50M rows in db (large).
[22 Apr 2005 8:35]
Johan Andersson
Yes, it was after a system restart. Whether the data is corrupt is one of the open questions...
[9 Jun 2005 5:26]
Jonas Oreland
Pushed to 4.1.13 and 5.0.8
[13 Jun 2005 15:02]
Jon Stephens
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant product(s). Additional info: Documented in Change History for versions 4.1.13, 5.0.8.