Bug #39912 Falcon can crash after hitting problems with the serial log
Submitted: 7 Oct 2008 14:08 Modified: 8 Jan 2009 17:21
Reporter: Olav Sandstå Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S3 (Non-critical)
Version:6.0.7 OS:Any
Assigned to: Olav Sandstå CPU Architecture:Any
Tags: F_ERROR HANDLING
Triage: Triaged: D1 (Critical)

[7 Oct 2008 14:08] Olav Sandstå
Description:
In bug report #39575 Falcon crashed after the serial log had hit a state where it could not continue to write to the serial log file. This problem was fixed by fixing the code that caused the serial log state to become invalid (see the fix to bug 39575). 

Still, the next time Falcon ends up in a situation where the serial log ends up in a state where it can not write to the serial log file the same crash as shown in bug 39575 may happen. To avoid this we should improve the code with the following changes:

  1. Return a more severe error message to the server for the operation that caused the serial log to become "unwritable". The proposed error code is HA_ERR_LOGGING_IMPOSSIBLE.

  2. Catch the exception that caused Falcon to crash (see call stack in bug 39575) and if possible return an error to the server.

  3. Improve logging/tracing so that the error log file contains information about what caused the serial log to become "invalid".

How to repeat:
Not repatable after fixing bug 39575 but it can be reproduced by reverting the change for bug 39575. See details in bug report 39575,

Suggested fix:
Avoid that Falcon crashes the server when this situation occurs. See description.
[7 Oct 2008 16:14] Olav Sandstå
Setting this to verified although the only way I know if to provoke the failure is to do changes to the source code in order to get the Serial Log object to set the state to writeError (but I have no doubt that sooner or later someone will run into this problem if we do not fix it :-) ).
[14 Oct 2008 13:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/56178

2863 Olav Sandstaa	2008-10-14
      Partial fix for Bug#39912 Falcon can crash after hitting problems with the serial log
      
      Implements the following improvements when we have problems that prevent Falcon from writing to the
      serial log:
      -introduces new exception: IO_ERROR_SERIALLOG to make it possible to distinguish IO errors on the 
      serial log file from IO errors on the datafile
      -writes an error message to the error log file when the serial log becomes impossible to write to
      -returns HA_ERR_LOGGING_IMPOSSIBLE to the server when insert/update operations fails due to the 
      serial log is in state writeError
[14 Oct 2008 14:13] Olav Sandstå
The attached patch implements item (1) and (3) of the proposed solution to this bug (see initial bug report). The handling of uncaught exceptions will be committed in a follow-up patch.
[16 Oct 2008 8:36] Olav Sandstå
The first patch has been committed into the mysql-6.0-team-branch. Solution to the issues stated as problem 2 in the initial bug report will be submitted in the next patch to follow.
[16 Oct 2008 10:19] Olav Sandstå
Call stack for the crash that occurs when the server issues a "rollback to savepoint" after Falcon has set the serial log to state "writeError":

  [1] __lwp_kill(0x15, 0x6, 0xfffffe8b8fef3360, 0x0, 0x0, 0x0), at 0xfffffd7fff11318a
  [2] _thr_kill(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff10e1b3
=>[3] my_write_core(sig = 6), line 307 in "stacktrace.c"
  [4] handle_segfault(sig = 6), line 2671 in "mysqld.cc"
  [5] __sighndlr(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff1100b6
  [6] call_user_handler(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff1053a2
  [7] sigacthandler(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff105588
  ---- called from signal handler with signal 6 (SIGABRT) ------
  [8] __lwp_kill(0x15, 0x6, 0xfffffe8b8fef3360, 0x5, 0xfffffd7fff172220, 0x0), at 0xfffffd7fff11318a
  [9] _thr_kill(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff10e1b3
  [10] raise(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff0bcd79
  [11] abort(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff09c91e
  [12] __Cimpl::default_terminate(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff158599
  [13] std::terminate(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff158042
  [14] __Cimpl::ex_terminate(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff15843e
  ---- hidden frames, use 'where -h' to see them all ----
  [16] SerialLog::startRecord(this = 0x1c72478), line 709 in "SerialLog.cpp"
  [17] SerialLogRecord::startRecord(this = 0x1c77288), line 224 in "SerialLogRecord.cpp"
  [18] SRLSavepointRollback::append(this = 0x1c77288, transactionId = 66U, savepointId = 15697), line 31 in "SRLSavepointRollback.cpp"
  [19] Transaction::rollbackSavepoint(this = 0x1c96ad8, savePointId = 15697), line 1223 in "Transaction.cpp"
  [20] StorageDatabase::savepointRollback(this = 0x1ce9d08, connection = 0x1d26348, savePoint = 15697), line 519 in "StorageDatabase.cpp"
  [21] StorageConnection::savepointRollback(this = 0x1d262b0, savePoint = 15697), line 213 in "StorageConnection.cpp"
  [22] StorageConnection::rollbackVerb(this = 0x1d262b0), line 346 in "StorageConnection.cpp"
  [23] StorageInterface::rollback(hton = 0x2018580, thd = 0x508cc30, all = false), line 1254 in "ha_falcon.cpp"
  [24] ha_rollback_trans(thd = 0x508cc30, all = false), line 1241 in "handler.cc"
  [25] trans_rollback_stmt(thd = 0x508cc30), line 233 in "transaction.cc"
  [26] dispatch_command(command = COM_QUERY, thd = 0x508cc30, packet = 0x508f5f1 "INSERT INTO TESTDATA VALUES(15697, 'Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here " ..., packet_length = 1032U), line 1368 in "sql_parse.cc"
  [27] do_command(thd = 0x508cc30), line 689 in "sql_parse.cc"
  [28] handle_one_connection(arg = 0x508cc30), line 1153 in "sql_connect.cc"
  [29] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff10fd7b
  [30] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff10ffb0

This crash occurs due to an uncaught exception being thrown.
[16 Oct 2008 10:59] Olav Sandstå
Call stack for the crash that occurs when the server issues a "rollback of entire transaction" after Falcon has set the serial log to state "writeError":

  [14] __Cimpl::ex_terminate(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff15843e
  ---- hidden frames, use 'where -h' to see them all ----
  [16] SerialLog::startRecord(this = 0x1c72928), line 709 in "SerialLog.cpp"
  [17] SerialLogRecord::startRecord(this = 0x1c771e8), line 224 in "SerialLogRecord.cpp"
  [18] SRLDelete::append(this = 0x1c771e8, dbb = 0x1cef740, transaction = 0x1c96f88, sectionId = 0, recordId = 15695), line 44 in "SRLDelete.cpp"
  [19] Dbb::logRecord(this = 0x1cef740, sectionId = 0, recordId = 15695, stream = (nil), transaction = 0x1c96f88), line 327 in "Dbb.cpp"
  [20] Table::deleteRecord(this = 0x1cb3f88, record = 0x56924b8, transaction = 0x1c96f88), line 2886 in "Table.cpp"
  [21] Table::rollbackRecord(this = 0x1cb3f88, recordToRollback = 0x56924b8, transaction = 0x1c96f88), line 1015 in "Table.cpp"
  [22] RecordVersion::rollback(this = 0x56924b8, transaction = 0x1c96f88), line 162 in "RecordVersion.cpp"
  [23] Transaction::rollback(this = 0x1c96f88), line 423 in "Transaction.cpp"
  [24] Connection::rollback(this = 0x1d27918), line 282 in "Connection.cpp"
  [25] StorageConnection::rollback(this = 0x1d27880), line 177 in "StorageConnection.cpp"
  [26] StorageInterface::rollback(hton = 0x2018a30, thd = 0x508d0e0, all = true), line 1248 in "ha_falcon.cpp"
  [27] ha_rollback_trans(thd = 0x508d0e0, all = true), line 1241 in "handler.cc"  [28] trans_rollback(thd = 0x508d0e0), line 174 in "transaction.cc"
  [29] THD::cleanup(this = 0x508d0e0), line 853 in "sql_class.cc"
  [30] unlink_thd(thd = 0x508d0e0), line 1957 in "mysqld.cc"
  [31] one_thread_per_connection_end(thd = 0x508d0e0, put_in_cache = true), line 2044 in "mysqld.cc"
  [32] handle_one_connection(arg = 0x508d0e0), line 1160 in "sql_connect.cc"
  [33] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff10fd7b
  [34] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff10ffb0

This crash too is caused by an uncaught exception.

Fix: catch the exception and return an error code to the server.
[16 Oct 2008 12:00] Olav Sandstå
After the serial log has changed status to "writeError" the Scavanger thread "dies" due to the following exception beeing thrown and not caught until in Thread::thread():

  [14] __Cimpl::ex_terminate(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff15843e
  ---- hidden frames, use 'where -h' to see them all ----
  [16] SerialLog::startRecord(this = 0x1c72678), line 709 in "SerialLog.cpp"
  [17] SerialLogRecord::startRecord(this = 0x1c77220), line 224 in "SerialLogRecord.cpp"
  [18] SRLUpdateRecords::append(this = 0x1c77220, transaction = 0x1c96cd8, records = 0x50d77e8, chillRecords = false), line 133 in "SRLUpdateRecords.cpp"
  [19] Dbb::logUpdatedRecords(this = 0x1cea790, transaction = 0x1c96cd8, records = 0x50d77e8, chill = false), line 1244 in "Dbb.cpp"
  [20] Transaction::commit(this = 0x1c96cd8), line 266 in "Transaction.cpp"
  [21] Connection::commit(this = 0x1ceba20), line 272 in "Connection.cpp"
  [22] Database::commitSystemTransaction(this = 0x1aeb778), line 1256 in "Database.cpp"
  [23] Database::updateCardinalities(this = 0x1aeb778), line 2411 in "Database.cpp"
  [24] Database::scavenge(this = 0x1aeb778), line 1714 in "Database.cpp"
  [25] Scavenger::scavenge(this = 0x1cf4508), line 58 in "Scavenger.cpp"
  [26] Scavenger::execute(this = 0x1cf4508, scheduler = 0x1cf4430), line 68 in "Scavenger.cpp"
  [27] Scheduler::schedule(this = 0x1cf4430), line 137 in "Scheduler.cpp"
  [28] Scheduler::schedule(lpParameter = 0x1cf4430), line 159 in "Scheduler.cpp"  [29] Thread::thread(this = 0x1d206b0), line 167 in "Thread.cpp"
  [30] Thread::thread(parameter = 0x1d206b0), line 146 in "Thread.cpp"
  [31] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff10fd7b
  [32] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff10ffb0

Possible fixes: either (a) except that it is OK to let the scavenger dies since Falcon anyway is in a state of "doomed" and need to be restarted or (b) handle the exception and let the scavanger keep on working.
[16 Oct 2008 12:46] Olav Sandstå
Call stack for an uncaught exception in the ioThread if the status of the serial log is set to "writeError":

  [14] __Cimpl::ex_terminate(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff15843e
  ---- hidden frames, use 'where -h' to see them all ----
  [16] SerialLog::startRecord(this = 0x1c72758), line 709 in "SerialLog.cpp"
  [17] SerialLogRecord::startRecord(this = 0x1c76fb8), line 224 in "SerialLogRecord.cpp"
  [18] SRLCheckpoint::append(this = 0x1c76fb8, blockNumber = 74LL), line 43 in "SRLCheckpoint.cpp"
  [19] SerialLog::pageCacheFlushed(this = 0x1c72758, blockNumber = 74LL), line 969 in "SerialLog.cpp"
  [20] Database::pageCacheFlushed(this = 0x1aeb858, flushArg = 74LL), line 2517 in "Database.cpp"
  [21] Cache::ioThread(this = 0x1ceb0b0), line 937 in "Cache.cpp"
  [22] Cache::ioThread(arg = 0x1ceb0b0), line 766 in "Cache.cpp"
  [23] Thread::thread(this = 0x1ceb4e0), line 167 in "Thread.cpp"
  [24] Thread::thread(parameter = 0x1ceb4e0), line 146 in "Thread.cpp"
  [25] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff10fd7b
  [26] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff10ffb0

Possible "fixes": (a) let the ioThread die since Falcon anyway needs a restart or (b) handle the uncaught exception somewhere. Aiming at alternative b.
[16 Oct 2008 13:26] Olav Sandstå
Call stack due to uncaught exceptions after running the query "SELECT * from olav2" against Falcon when the Serial log is in state "writeError":

  [14] __Cimpl::ex_terminate(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff15843e
  ---- hidden frames, use 'where -h' to see them all ----
  [16] SerialLog::startRecord(this = 0x1c72808), line 709 in "SerialLog.cpp"
  [17] SerialLogRecord::startRecord(this = 0x1c76f00), line 224 in "SerialLogRecord.cpp"
  [18] SRLCommit::append(this = 0x1c76f00, transaction = 0x1c94e48), line 47 in "SRLCommit.cpp"
  [19] Dbb::commit(this = 0x1cea920, transaction = 0x1c94e48), line 1175 in "Dbb.cpp"
  [20] Database::commit(this = 0x1aeb908, transaction = 0x1c94e48), line 2230 in "Database.cpp"
  [21] Transaction::commit(this = 0x1c94e48), line 319 in "Transaction.cpp"
  [22] Connection::commit(this = 0x1d26228), line 272 in "Connection.cpp"
  [23] StorageConnection::commit(this = 0x1d26190), line 146 in "StorageConnection.cpp"
  [24] StorageConnection::endImplicitTransaction(this = 0x1d26190), line 338 in "StorageConnection.cpp"
  [25] StorageInterface::external_lock(this = 0x5097438, thd = 0x509dfa0, lock_type = 3), line 1936 in "ha_falcon.cpp"
  [26] handler::ha_external_lock(this = 0x5097438, thd = 0x509dfa0, lock_type = 3), line 5185 in "handler.cc"
  [27] unlock_external(thd = 0x509dfa0, table = 0x1eea898, count = 1U), line 800 in "lock.cc"
  [28] mysql_unlock_read_tables(thd = 0x509dfa0, sql_lock = 0x1eea870), line 481 in "lock.cc"
  [29] JOIN::join_free(this = 0x56a8b28), line 8685 in "sql_select.cc"
  [30] do_select(join = 0x56a8b28, fields = 0x509ff80, table = (nil), procedure = (nil)), line 13485 in "sql_select.cc"
  [31] JOIN::exec(this = 0x56a8b28), line 2827 in "sql_select.cc"
  [32] mysql_select(thd = 0x509dfa0, rref_pointer_array = 0x50a0060, tables = 0x5095430, wild_num = 1U, fields = CLASS, conds = (nil), og_num = 0, order = (nil), group = (nil), having = (nil), proc_param = (nil), select_options = 2147764736ULL, result = 0x5095a80, unit = 0x509fa18, select_lex = 0x509fe78), line 3017 in "sql_select.cc"
  [33] handle_select(thd = 0x509dfa0, lex = 0x509f978, result = 0x5095a80, setup_tables_done_option = 0), line 288 in "sql_select.cc"
  [34] execute_sqlcom_select(thd = 0x509dfa0, all_tables = 0x5095430), line 4609 in "sql_parse.cc"
  [35] mysql_execute_command(thd = 0x509dfa0), line 2035 in "sql_parse.cc"
  [36] mysql_parse(thd = 0x509dfa0, inBuf = 0x5095018 "select * from olav2", length = 19U, found_semicolon = 0xfffffd7ffc92fc18), line 5579 in "sql_parse.cc"
  [37] dispatch_command(command = COM_QUERY, thd = 0x509dfa0, packet = 0x508cfc1 "", packet_length = 19U), line 1002 in "sql_parse.cc"
  [38] do_command(thd = 0x509dfa0), line 689 in "sql_parse.cc"
  [39] handle_one_connection(arg = 0x509dfa0), line 1153 in "sql_connect.cc"
  [40] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff10fd7b
  [41] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff10ffb0

Fix: catch the exception somewhere.

Question/to find out: Why do a simple select query end up with writing to the serial log?
[21 Oct 2008 10:22] Olav Sandstå
Call stack due to uncaught exceptions after running a "Commit" against Falcon when the Serial log is in state "writeError":

  [13] std::terminate(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff158042
  [14] __Cimpl::ex_terminate(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff15843e
  ---- hidden frames, use 'where -h' to see them all ----
  [16] SerialLog::startRecord(this = 0x1c72858), line 711 in "SerialLog.cpp"
  [17] SerialLogRecord::startRecord(this = 0x1c77400), line 224 in "SerialLogRecord.cpp"
  [18] SRLUpdateRecords::append(this = 0x1c77400, transaction = 0x1c98380, records = 0x3a3cf28, chillRecords = false), line 133 in "SRLUpdateRecords.cpp"
  [19] Dbb::logUpdatedRecords(this = 0x1cea968, transaction = 0x1c98380, records = 0x3a3cf28, chill = false), line 1244 in "Dbb.cpp"
  [20] Transaction::commit(this = 0x1c98380), line 266 in "Transaction.cpp"
  [21] Connection::commit(this = 0x1d278a8), line 272 in "Connection.cpp"
  [22] StorageConnection::commit(this = 0x1d27810), line 146 in "StorageConnection.cpp"
  [23] StorageInterface::commit(hton = 0x2018960, thd = 0x508d6c0, all = true), line 1205 in "ha_falcon.cpp"
  [24] ha_commit_one_phase(thd = 0x508d6c0, all = true), line 1174 in "handler.cc"
  [25] ha_commit_trans(thd = 0x508d6c0, all = true), line 1143 in "handler.cc"
  [26] trans_commit(thd = 0x508d6c0), line 100 in "transaction.cc"
  [27] mysql_execute_command(thd = 0x508d6c0), line 3882 in "sql_parse.cc"
  [28] mysql_parse(thd = 0x508d6c0, inBuf = 0x509e6b8 "commit", length = 6U, found_semicolon = 0xfffffd7ffc92fc18), line 5579 in "sql_parse.cc"
  [29] dispatch_command(command = COM_QUERY, thd = 0x508d6c0, packet = 0x5090081 "commit", packet_length = 6U), line 1002 in "sql_parse.cc"
  [30] do_command(thd = 0x508d6c0), line 689 in "sql_parse.cc"
  [31] handle_one_connection(arg = 0x508d6c0), line 1153 in "sql_connect.cc"
  [32] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff10fd7b
  [33] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff10ffb0

Proposed fix: handle the uncaught exception in StorageConnection::commit and return a HA_ERR_LOGGING_IMPOSSIBLE error code to the server.
[21 Oct 2008 19:02] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/56745

2875 Olav Sandstaa	2008-10-21
      Bug#39912 Falcon can crash after hitting problems with the serial log
      
      Implements handling of previously uncaught exceptions thrown by the
      serial log after it has gotten into a "writeError" state. The
      following code has been extended to handle exceptions:
      
       -rollback to savepoint
       -rollback of transactions
       -commit
       -commit of implicite transactions
       -Scavenger updating cardinalities
       -IO-threads writing the checkpoint record
[23 Oct 2008 7:59] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/56864

2877 Olav Sandstaa	2008-10-23
      Bug#39912 Falcon can crash after hitting problems with the serial log
            
      Implements handling of previously uncaught exceptions thrown by the
      serial log after it has gotten into a "writeError" state. The
      following code has been extended to handle exceptions:
            
      -rollback to savepoint
      -rollback of transactions
      -commit
      -commit of implicite transactions
      -Scavenger updating cardinalities
      -IO-threads writing the checkpoint record
[28 Oct 2008 8:10] Bugs System
Pushed into 6.0.8-alpha  (revid:olav@sun.com-20081023075838-8ue2tkfdezmegzp7) (version source revid:cpowers@mysql.com-20081024001220-6k4wp6uzqppbtiys) (pib:5)
[8 Jan 2009 17:21] MC Brown
A note has been added to the 6.0.8 changelog: 

When the Falcon serial log reaches a state where the serial log can no longer be written to, for example when the disk is full, or when permissions have been changed on an open log, then MySQL could crash.