Bug #39575 | Falcon crashes during loading of large dataset | ||
---|---|---|---|
Submitted: | 22 Sep 2008 7:27 | Modified: | 9 Jan 2009 14:05 |
Reporter: | Olav Sandstå | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S2 (Serious) |
Version: | 6.0.7 | OS: | Solaris (Solaris 10) |
Assigned to: | Olav Sandstå | CPU Architecture: | Any |
Tags: | F_PLATFORM |
[22 Sep 2008 7:27]
Olav Sandstå
[22 Sep 2008 7:38]
Olav Sandstå
By looking at the call stack and the source code, there seem to be several different problems/bugs here:

1. The crash occurs because Falcon throws an exception that is not caught/handled anywhere. The exception is thrown by the following code in falcon/storage/SerialLog.cpp:

	void SerialLog::startRecord()
	{
		ASSERT(!recovering);

		if (writeError)
			throw SQLError(IO_ERROR, "Previous I/O error on serial log prevents further processing");

		if (writePtr == writeBlock->data)
			putVersion();

		recordStart = writePtr;
		recordIncomplete = true;
	}

2. The exception is thrown because the writeError variable has been set to true, probably as the result of a write operation returning an error. That, in turn (given that the serial log file had reached 4 GB), is probably due to some "4 GB file limitation". I think this is a bug too (unless we have a hard limit here - I have not checked yet).

3. The difference in serial log file size between the Sun Studio and gcc builds sounds strange and should be investigated.
[26 Sep 2008 9:37]
Olav Sandstå
While trying to reproduce this crash I ran into an ASSERT that also occurred just as the serial log file reached 4 GB. That crash is reported as Bug#39661 and may be caused by the same underlying problem as this one.
[30 Sep 2008 7:16]
Olav Sandstå
I am able to reproduce this crash by configuring Falcon with a 14 GB record cache and an 8 GB page cache (it can probably also be reproduced with lower values) and then having a client insert 1004-byte records in a single transaction until the crash occurs. The crash happens when the serial log file reaches 4 GB. Verified on Solaris 10 with Falcon compiled with Sun Studio 12. I will attach my test client and the call stack from the crash.
[30 Sep 2008 7:23]
Olav Sandstå
Test client that inserts 100 million records of 1004 bytes in a single transaction
Attachment: LoadData.java (text/x-java), 4.03 KiB.
[30 Sep 2008 7:25]
Olav Sandstå
Call stack from the crash:

current thread: t@24
  [1] __lwp_kill(0x0, 0x6, 0xffffffff7dceef40, 0x1015bbb1c, 0xffffffff7d205400, 0x4), at 0xffffffff7dbce4b0
=>[2] my_write_core(sig = 6), line 307 in "stacktrace.c"
  [3] handle_segfault(sig = 6), line 2660 in "mysqld.cc"
  [4] __sighndlr(0x6, 0x0, 0xffffffff7aa3ddf0, 0x1003e5748, 0x0, 0x0), at 0xffffffff7dbcd3b4
  ---- called from signal handler with signal 6 (SIGABRT) ------
  [5] __lwp_kill(0x0, 0x6, 0xffffffff7f60512c, 0xffffffff7db48a04, 0x5, 0xffffffff7dcf2e78), at 0xffffffff7dbce4b0
  [6] raise(0x6, 0x6, 0x5, 0xffffffff7dcef1c0, 0xffffffff7d205400, 0x5), at 0xffffffff7db6b25c
  [7] abort(0xffffffff7df11e98, 0xffffffff7df0cc38, 0xffffffff7de06a10, 0x19b6c0, 0x2, 0x0), at 0xffffffff7db48a04
  [8] __Cimpl::default_terminate(0xffffffff7df11e98, 0xffffffff7df0cc38, 0xffffffff7df11e78, 0x10604c, 0x104e40, 0xffffffff7de064d0), at 0xffffffff7de064d4
  [9] __Cimpl::ex_terminate(0xffffffff7df0de40, 0x0, 0x0, 0xffffffff7df0de40, 0xffffffff7df0c978, 0x1), at 0xffffffff7de062b4
  [10] _ex_throw_body(0xffffffff7df0de40, 0x0, 0x0, 0x105af4, 0x3c, 0x100f741a8), at 0xffffffff7de06f04
  [11] __Crun::ex_throw(0xffffffff7df0dec8, 0x1010450d0, 0x100776318, 0xffffffff7df0de40, 0xb8, 0xffffffff7df0de40), at 0xffffffff7de06e58
  [12] SerialLog::startRecord(this = 0x101723298), line 709 in "SerialLog.cpp"
  [13] SerialLogRecord::startRecord(this = 0x1017280a8), line 224 in "SerialLogRecord.cpp"
  [14] SRLSavepointRollback::append(this = 0x1017280a8, transactionId = 23U, savepointId = 4353646), line 31 in "SRLSavepointRollback.cpp"
  [15] Transaction::rollbackSavepoint(this = 0x3365db570, savePointId = 4353646), line 1233 in "Transaction.cpp"
  [16] StorageDatabase::savepointRollback(this = 0x1017eb318, connection = 0x101815dd8, savePoint = 4353646), line 519 in "StorageDatabase.cpp"
  [17] StorageConnection::savepointRollback(this = 0x101815d40, savePoint = 4353646), line 213 in "StorageConnection.cpp"
  [18] StorageConnection::rollbackVerb(this = 0x101815d40), line 346 in "StorageConnection.cpp"
  [19] StorageInterface::rollback(hton = 0x1022186b0, thd = 0x338e29040, all = false), line 1254 in "ha_falcon.cpp"
  [20] ha_rollback_trans(thd = 0x338e29040, all = false), line 1241 in "handler.cc"
  [21] trans_rollback_stmt(thd = 0x338e29040), line 214 in "transaction.cc"
  [22] dispatch_command(command = COM_QUERY, thd = 0x338e29040, packet = 0x3392f76e1 "INSERT INTO TESTDATA VALUES(4353646, 'Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was hereOlav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was hereOlav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was hereOlav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was hereOlav was here Olav was here Olav was here Olav was here Olav was here Olav was here Ol" ..., packet_length = 1010U), line 1367 in "sql_parse.cc"
  [23] do_command(thd = 0x338e29040), line 688 in "sql_parse.cc"
  [24] handle_one_connection(arg = 0x338e29040), line 1153 in "sql_connect.cc"
[30 Sep 2008 9:02]
Olav Sandstå
Initial analysis/guessing about what happens:

1. During the insert, the serial log is written. When the size of the serial log file hits 4 GB, an "error condition" (will investigate this further) occurs when calling (in SerialLog.cpp):

	flushWindow->write(flushBlock);

and we set the status of the SerialLog object to:

	writeError = true;

2. This error is likely returned to the server, which decides to roll back to the latest savepoint (see call stack).

3. When Falcon performs the rollback to the savepoint, it tries to append an SRLSavepointRollback record to the SerialLog. This fails because the SerialLog object still has "writeError == true"; see the following code in SerialLog::startRecord():

	if (writeError)
		throw SQLError(IO_ERROR, "Previous I/O error on serial log prevents further processing");

4. The exception is never caught, so the server crashes.

I will update with information about what causes the SerialLog object to enter the "writeError == true" state when the file hits 4 GB. I need to add tracing to the code that writes the serial log file.
[1 Oct 2008 13:47]
Olav Sandstå
The initial cause of the problem writing to the serial log (and of the serial log object getting status "writeError == true") is the following code at the start of SerialLogWindow::write:

	uint32 length = ROUNDUP(block->length, sectorSize);
	uint32 offset = (int) (origin + ((UCHAR*) block - buffer));

The offset variable (the offset we should write to in the serial log file) wraps around when reaching 4 GB. Changing the type to a 64-bit variable should fix this. Based on this we can also rule out that this was a "Sun Studio and Solaris"-only problem; we would have hit this bug on any system where the serial log file grows past 4 GB.
[2 Oct 2008 9:02]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

http://lists.mysql.com/commits/55023

2845 Olav Sandstaa 2008-10-02
Fix for Bug#39575 Falcon crashes during loading of large dataset
Changed datatype for the offset within the serial log file from 32 bit to 64 bit in order to handle that the serial log file can grow beyond 4 GB.
[2 Oct 2008 9:11]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

http://lists.mysql.com/commits/55027

2845 Olav Sandstaa 2008-10-02
Fix for Bug#39575 Falcon crashes during loading of large dataset
Changed datatype for the offset within the serial log file from 32 bit to 64 bit in order to handle that the serial log file can grow beyond 4 GB.
[2 Oct 2008 9:23]
Olav Sandstå
With the patch, my test client (attached to this bug) runs much further than without it. When I finally killed it, the serial log file had reached 12 GB.
[9 Jan 2009 14:05]
MC Brown
A note has been added to the 6.0.8 changelog: When loading very large datasets into a Falcon table, MySQL may crash because the size of the Falcon serial log exceeds 4GB. The maximum supported size of the serial log file has been increased from a 32-bit to a 64-bit integer to handle larger log file sizes.