Bug #39912 Falcon can crash after hitting problems with the serial log
Submitted: 7 Oct 2008 16:08 Modified: 8 Jan 18:21
Reporter: Olav Sandstaa
Status: Closed
Category:Server: Falcon Severity:S3 (Non-critical)
Version:6.0.7 OS:Any
Assigned to: Olav Sandstaa Target Version:6.0.8
Tags: F_ERROR HANDLING
Triage: Triaged: D1 (Critical)

[7 Oct 2008 16:08] Olav Sandstaa
Description:
In bug report #39575 Falcon crashed after the serial log had hit a state where it could
not continue to write to the serial log file. This problem was fixed by fixing the code
that caused the serial log state to become invalid (see the fix to bug 39575). 

Still, the next time Falcon ends up in a situation where the serial log ends up in a
state where it can not write to the serial log file the same crash as shown in bug 39575
may happen. To avoid this we should improve the code with the following changes:

  1. Return a more severe error message to the server for the operation that caused the
serial log to become "unwritable". The proposed error code is HA_ERR_LOGGING_IMPOSSIBLE.

  2. Catch the exception that caused Falcon to crash (see call stack in bug 39575) and if
possible return an error to the server.

  3. Improve logging/tracing so that the error log file contains information about what
caused the serial log to become "invalid".

How to repeat:
Not repatable after fixing bug 39575 but it can be reproduced by reverting the change for
bug 39575. See details in bug report 39575,

Suggested fix:
Avoid that Falcon crashes the server when this situation occurs. See description.
[7 Oct 2008 18:14] Olav Sandstaa
Setting this to verified although the only way I know if to provoke the failure is to do
changes to the source code in order to get the Serial Log object to set the state to
writeError (but I have no doubt that sooner or later someone will run into this problem
if we do not fix it :-) ).
[14 Oct 2008 15:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/56178

2863 Olav Sandstaa	2008-10-14
      Partial fix for Bug#39912 Falcon can crash after hitting problems with the serial
log
      
      Implements the following improvements when we have problems that prevent Falcon
from writing to the
      serial log:
      -introduces new exception: IO_ERROR_SERIALLOG to make it possible to distinguish IO
errors on the 
      serial log file from IO errors on the datafile
      -writes an error message to the error log file when the serial log becomes
impossible to write to
      -returns HA_ERR_LOGGING_IMPOSSIBLE to the server when insert/update operations
fails due to the 
      serial log is in state writeError
[14 Oct 2008 16:13] Olav Sandstaa
The attached patch implements item (1) and (3) of the proposed solution to this bug (see
initial bug report). The handling of uncaught exceptions will be committed in a follow-up
patch.
[16 Oct 2008 10:36] Olav Sandstaa
The first patch has been committed into the mysql-6.0-team-branch. Solution to the issues
stated as problem 2 in the initial bug report will be submitted in the next patch to
follow.
[16 Oct 2008 12:19] Olav Sandstaa
Call stack for the crash that occurs when the server issues a "rollback to savepoint"
after Falcon has set the serial log to state "writeError":

  [1] __lwp_kill(0x15, 0x6, 0xfffffe8b8fef3360, 0x0, 0x0, 0x0), at 0xfffffd7fff11318a
  [2] _thr_kill(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff10e1b3
=>[3] my_write_core(sig = 6), line 307 in "stacktrace.c"
  [4] handle_segfault(sig = 6), line 2671 in "mysqld.cc"
  [5] __sighndlr(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff1100b6
  [6] call_user_handler(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff1053a2
  [7] sigacthandler(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff105588
  ---- called from signal handler with signal 6 (SIGABRT) ------
  [8] __lwp_kill(0x15, 0x6, 0xfffffe8b8fef3360, 0x5, 0xfffffd7fff172220, 0x0), at
0xfffffd7fff11318a
  [9] _thr_kill(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff10e1b3
  [10] raise(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff0bcd79
  [11] abort(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff09c91e
  [12] __Cimpl::default_terminate(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff158599
  [13] std::terminate(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff158042
  [14] __Cimpl::ex_terminate(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff15843e
  ---- hidden frames, use 'where -h' to see them all ----
  [16] SerialLog::startRecord(this = 0x1c72478), line 709 in "SerialLog.cpp"
  [17] SerialLogRecord::startRecord(this = 0x1c77288), line 224 in "SerialLogRecord.cpp"
  [18] SRLSavepointRollback::append(this = 0x1c77288, transactionId = 66U, savepointId =
15697), line 31 in "SRLSavepointRollback.cpp"
  [19] Transaction::rollbackSavepoint(this = 0x1c96ad8, savePointId = 15697), line 1223
in "Transaction.cpp"
  [20] StorageDatabase::savepointRollback(this = 0x1ce9d08, connection = 0x1d26348,
savePoint = 15697), line 519 in "StorageDatabase.cpp"
  [21] StorageConnection::savepointRollback(this = 0x1d262b0, savePoint = 15697), line
213 in "StorageConnection.cpp"
  [22] StorageConnection::rollbackVerb(this = 0x1d262b0), line 346 in
"StorageConnection.cpp"
  [23] StorageInterface::rollback(hton = 0x2018580, thd = 0x508cc30, all = false), line
1254 in "ha_falcon.cpp"
  [24] ha_rollback_trans(thd = 0x508cc30, all = false), line 1241 in "handler.cc"
  [25] trans_rollback_stmt(thd = 0x508cc30), line 233 in "transaction.cc"
  [26] dispatch_command(command = COM_QUERY, thd = 0x508cc30, packet = 0x508f5f1 "INSERT
INTO TESTDATA VALUES(15697, 'Olav was here Olav was here Olav was here Olav was here Olav
was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was
here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here
Olav was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav
was here Olav was here Olav was here Olav was here Olav was here Olav was here Olav was
here Olav was here Olav was here Olav was here Olav was here " ..., packet_length =
1032U), line 1368 in "sql_parse.cc"
  [27] do_command(thd = 0x508cc30), line 689 in "sql_parse.cc"
  [28] handle_one_connection(arg = 0x508cc30), line 1153 in "sql_connect.cc"
  [29] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff10fd7b
  [30] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff10ffb0

This crash occurs due to an uncaught exception being thrown.
[16 Oct 2008 12:59] Olav Sandstaa
Call stack for the crash that occurs when the server issues a "rollback of entire
transaction" after Falcon has set the serial log to state "writeError":

  [14] __Cimpl::ex_terminate(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff15843e
  ---- hidden frames, use 'where -h' to see them all ----
  [16] SerialLog::startRecord(this = 0x1c72928), line 709 in "SerialLog.cpp"
  [17] SerialLogRecord::startRecord(this = 0x1c771e8), line 224 in "SerialLogRecord.cpp"
  [18] SRLDelete::append(this = 0x1c771e8, dbb = 0x1cef740, transaction = 0x1c96f88,
sectionId = 0, recordId = 15695), line 44 in "SRLDelete.cpp"
  [19] Dbb::logRecord(this = 0x1cef740, sectionId = 0, recordId = 15695, stream = (nil),
transaction = 0x1c96f88), line 327 in "Dbb.cpp"
  [20] Table::deleteRecord(this = 0x1cb3f88, record = 0x56924b8, transaction =
0x1c96f88), line 2886 in "Table.cpp"
  [21] Table::rollbackRecord(this = 0x1cb3f88, recordToRollback = 0x56924b8, transaction
= 0x1c96f88), line 1015 in "Table.cpp"
  [22] RecordVersion::rollback(this = 0x56924b8, transaction = 0x1c96f88), line 162 in
"RecordVersion.cpp"
  [23] Transaction::rollback(this = 0x1c96f88), line 423 in "Transaction.cpp"
  [24] Connection::rollback(this = 0x1d27918), line 282 in "Connection.cpp"
  [25] StorageConnection::rollback(this = 0x1d27880), line 177 in
"StorageConnection.cpp"
  [26] StorageInterface::rollback(hton = 0x2018a30, thd = 0x508d0e0, all = true), line
1248 in "ha_falcon.cpp"
  [27] ha_rollback_trans(thd = 0x508d0e0, all = true), line 1241 in "handler.cc"  [28]
trans_rollback(thd = 0x508d0e0), line 174 in "transaction.cc"
  [29] THD::cleanup(this = 0x508d0e0), line 853 in "sql_class.cc"
  [30] unlink_thd(thd = 0x508d0e0), line 1957 in "mysqld.cc"
  [31] one_thread_per_connection_end(thd = 0x508d0e0, put_in_cache = true), line 2044 in
"mysqld.cc"
  [32] handle_one_connection(arg = 0x508d0e0), line 1160 in "sql_connect.cc"
  [33] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff10fd7b
  [34] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff10ffb0

This crash too is caused by an uncaught exception.

Fix: catch the exception and return an error code to the server.
[16 Oct 2008 14:00] Olav Sandstaa
After the serial log has changed status to "writeError" the Scavanger thread "dies" due to
the following exception beeing thrown and not caught until in Thread::thread():

  [14] __Cimpl::ex_terminate(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff15843e
  ---- hidden frames, use 'where -h' to see them all ----
  [16] SerialLog::startRecord(this = 0x1c72678), line 709 in "SerialLog.cpp"
  [17] SerialLogRecord::startRecord(this = 0x1c77220), line 224 in "SerialLogRecord.cpp"
  [18] SRLUpdateRecords::append(this = 0x1c77220, transaction = 0x1c96cd8, records =
0x50d77e8, chillRecords = false), line 133 in "SRLUpdateRecords.cpp"
  [19] Dbb::logUpdatedRecords(this = 0x1cea790, transaction = 0x1c96cd8, records =
0x50d77e8, chill = false), line 1244 in "Dbb.cpp"
  [20] Transaction::commit(this = 0x1c96cd8), line 266 in "Transaction.cpp"
  [21] Connection::commit(this = 0x1ceba20), line 272 in "Connection.cpp"
  [22] Database::commitSystemTransaction(this = 0x1aeb778), line 1256 in "Database.cpp"
  [23] Database::updateCardinalities(this = 0x1aeb778), line 2411 in "Database.cpp"
  [24] Database::scavenge(this = 0x1aeb778), line 1714 in "Database.cpp"
  [25] Scavenger::scavenge(this = 0x1cf4508), line 58 in "Scavenger.cpp"
  [26] Scavenger::execute(this = 0x1cf4508, scheduler = 0x1cf4430), line 68 in
"Scavenger.cpp"
  [27] Scheduler::schedule(this = 0x1cf4430), line 137 in "Scheduler.cpp"
  [28] Scheduler::schedule(lpParameter = 0x1cf4430), line 159 in "Scheduler.cpp"  [29]
Thread::thread(this = 0x1d206b0), line 167 in "Thread.cpp"
  [30] Thread::thread(parameter = 0x1d206b0), line 146 in "Thread.cpp"
  [31] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff10fd7b
  [32] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff10ffb0

Possible fixes: either (a) except that it is OK to let the scavenger dies since Falcon
anyway is in a state of "doomed" and need to be restarted or (b) handle the exception and
let the scavanger keep on working.
[16 Oct 2008 14:46] Olav Sandstaa
Call stack for an uncaught exception in the ioThread if the status of the serial log is
set to "writeError":

  [14] __Cimpl::ex_terminate(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff15843e
  ---- hidden frames, use 'where -h' to see them all ----
  [16] SerialLog::startRecord(this = 0x1c72758), line 709 in "SerialLog.cpp"
  [17] SerialLogRecord::startRecord(this = 0x1c76fb8), line 224 in "SerialLogRecord.cpp"
  [18] SRLCheckpoint::append(this = 0x1c76fb8, blockNumber = 74LL), line 43 in
"SRLCheckpoint.cpp"
  [19] SerialLog::pageCacheFlushed(this = 0x1c72758, blockNumber = 74LL), line 969 in
"SerialLog.cpp"
  [20] Database::pageCacheFlushed(this = 0x1aeb858, flushArg = 74LL), line 2517 in
"Database.cpp"
  [21] Cache::ioThread(this = 0x1ceb0b0), line 937 in "Cache.cpp"
  [22] Cache::ioThread(arg = 0x1ceb0b0), line 766 in "Cache.cpp"
  [23] Thread::thread(this = 0x1ceb4e0), line 167 in "Thread.cpp"
  [24] Thread::thread(parameter = 0x1ceb4e0), line 146 in "Thread.cpp"
  [25] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff10fd7b
  [26] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff10ffb0

Possible "fixes": (a) let the ioThread die since Falcon anyway needs a restart or (b)
handle the uncaught exception somewhere. Aiming at alternative b.
[16 Oct 2008 15:26] Olav Sandstaa
Call stack due to uncaught exceptions after running the query "SELECT * from olav2"
against Falcon when the Serial log is in state "writeError":

  [14] __Cimpl::ex_terminate(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff15843e
  ---- hidden frames, use 'where -h' to see them all ----
  [16] SerialLog::startRecord(this = 0x1c72808), line 709 in "SerialLog.cpp"
  [17] SerialLogRecord::startRecord(this = 0x1c76f00), line 224 in "SerialLogRecord.cpp"
  [18] SRLCommit::append(this = 0x1c76f00, transaction = 0x1c94e48), line 47 in
"SRLCommit.cpp"
  [19] Dbb::commit(this = 0x1cea920, transaction = 0x1c94e48), line 1175 in "Dbb.cpp"
  [20] Database::commit(this = 0x1aeb908, transaction = 0x1c94e48), line 2230 in
"Database.cpp"
  [21] Transaction::commit(this = 0x1c94e48), line 319 in "Transaction.cpp"
  [22] Connection::commit(this = 0x1d26228), line 272 in "Connection.cpp"
  [23] StorageConnection::commit(this = 0x1d26190), line 146 in "StorageConnection.cpp"
  [24] StorageConnection::endImplicitTransaction(this = 0x1d26190), line 338 in
"StorageConnection.cpp"
  [25] StorageInterface::external_lock(this = 0x5097438, thd = 0x509dfa0, lock_type = 3),
line 1936 in "ha_falcon.cpp"
  [26] handler::ha_external_lock(this = 0x5097438, thd = 0x509dfa0, lock_type = 3), line
5185 in "handler.cc"
  [27] unlock_external(thd = 0x509dfa0, table = 0x1eea898, count = 1U), line 800 in
"lock.cc"
  [28] mysql_unlock_read_tables(thd = 0x509dfa0, sql_lock = 0x1eea870), line 481 in
"lock.cc"
  [29] JOIN::join_free(this = 0x56a8b28), line 8685 in "sql_select.cc"
  [30] do_select(join = 0x56a8b28, fields = 0x509ff80, table = (nil), procedure = (nil)),
line 13485 in "sql_select.cc"
  [31] JOIN::exec(this = 0x56a8b28), line 2827 in "sql_select.cc"
  [32] mysql_select(thd = 0x509dfa0, rref_pointer_array = 0x50a0060, tables = 0x5095430,
wild_num = 1U, fields = CLASS, conds = (nil), og_num = 0, order = (nil), group = (nil),
having = (nil), proc_param = (nil), select_options = 2147764736ULL, result = 0x5095a80,
unit = 0x509fa18, select_lex = 0x509fe78), line 3017 in "sql_select.cc"
  [33] handle_select(thd = 0x509dfa0, lex = 0x509f978, result = 0x5095a80,
setup_tables_done_option = 0), line 288 in "sql_select.cc"
  [34] execute_sqlcom_select(thd = 0x509dfa0, all_tables = 0x5095430), line 4609 in
"sql_parse.cc"
  [35] mysql_execute_command(thd = 0x509dfa0), line 2035 in "sql_parse.cc"
  [36] mysql_parse(thd = 0x509dfa0, inBuf = 0x5095018 "select * from olav2", length =
19U, found_semicolon = 0xfffffd7ffc92fc18), line 5579 in "sql_parse.cc"
  [37] dispatch_command(command = COM_QUERY, thd = 0x509dfa0, packet = 0x508cfc1 "",
packet_length = 19U), line 1002 in "sql_parse.cc"
  [38] do_command(thd = 0x509dfa0), line 689 in "sql_parse.cc"
  [39] handle_one_connection(arg = 0x509dfa0), line 1153 in "sql_connect.cc"
  [40] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff10fd7b
  [41] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff10ffb0

Fix: catch the exception somewhere.

Question/to find out: Why do a simple select query end up with writing to the serial
log?
[21 Oct 2008 12:22] Olav Sandstaa
Call stack due to uncaught exceptions after running a "Commit" against Falcon when the
Serial log is in state "writeError":

  [13] std::terminate(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff158042
  [14] __Cimpl::ex_terminate(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff15843e
  ---- hidden frames, use 'where -h' to see them all ----
  [16] SerialLog::startRecord(this = 0x1c72858), line 711 in "SerialLog.cpp"
  [17] SerialLogRecord::startRecord(this = 0x1c77400), line 224 in "SerialLogRecord.cpp"
  [18] SRLUpdateRecords::append(this = 0x1c77400, transaction = 0x1c98380, records =
0x3a3cf28, chillRecords = false), line 133 in "SRLUpdateRecords.cpp"
  [19] Dbb::logUpdatedRecords(this = 0x1cea968, transaction = 0x1c98380, records =
0x3a3cf28, chill = false), line 1244 in "Dbb.cpp"
  [20] Transaction::commit(this = 0x1c98380), line 266 in "Transaction.cpp"
  [21] Connection::commit(this = 0x1d278a8), line 272 in "Connection.cpp"
  [22] StorageConnection::commit(this = 0x1d27810), line 146 in "StorageConnection.cpp"
  [23] StorageInterface::commit(hton = 0x2018960, thd = 0x508d6c0, all = true), line 1205
in "ha_falcon.cpp"
  [24] ha_commit_one_phase(thd = 0x508d6c0, all = true), line 1174 in "handler.cc"
  [25] ha_commit_trans(thd = 0x508d6c0, all = true), line 1143 in "handler.cc"
  [26] trans_commit(thd = 0x508d6c0), line 100 in "transaction.cc"
  [27] mysql_execute_command(thd = 0x508d6c0), line 3882 in "sql_parse.cc"
  [28] mysql_parse(thd = 0x508d6c0, inBuf = 0x509e6b8 "commit", length = 6U,
found_semicolon = 0xfffffd7ffc92fc18), line 5579 in "sql_parse.cc"
  [29] dispatch_command(command = COM_QUERY, thd = 0x508d6c0, packet = 0x5090081
"commit", packet_length = 6U), line 1002 in "sql_parse.cc"
  [30] do_command(thd = 0x508d6c0), line 689 in "sql_parse.cc"
  [31] handle_one_connection(arg = 0x508d6c0), line 1153 in "sql_connect.cc"
  [32] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff10fd7b
  [33] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff10ffb0

Proposed fix: handle the uncaught exception in StorageConnection::commit and return a
HA_ERR_LOGGING_IMPOSSIBLE error code to the server.
[21 Oct 2008 21:02] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/56745

2875 Olav Sandstaa	2008-10-21
      Bug#39912 Falcon can crash after hitting problems with the serial log
      
      Implements handling of previously uncaught exceptions thrown by the
      serial log after it has gotten into a "writeError" state. The
      following code has been extended to handle exceptions:
      
       -rollback to savepoint
       -rollback of transactions
       -commit
       -commit of implicite transactions
       -Scavenger updating cardinalities
       -IO-threads writing the checkpoint record
[23 Oct 2008 9:59] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/56864

2877 Olav Sandstaa	2008-10-23
      Bug#39912 Falcon can crash after hitting problems with the serial log
            
      Implements handling of previously uncaught exceptions thrown by the
      serial log after it has gotten into a "writeError" state. The
      following code has been extended to handle exceptions:
            
      -rollback to savepoint
      -rollback of transactions
      -commit
      -commit of implicite transactions
      -Scavenger updating cardinalities
      -IO-threads writing the checkpoint record
[28 Oct 2008 9:10] Bugs System
Pushed into 6.0.8-alpha  (revid:olav@sun.com-20081023075838-8ue2tkfdezmegzp7) (version
source revid:cpowers@mysql.com-20081024001220-6k4wp6uzqppbtiys) (pib:5)
[8 Jan 18:21] MC Brown
A note has been added to the 6.0.8 changelog: 

When the Falcon serial log reaches a state where the serial log can no longer be written
to, for example when the disk is full, or when permissions have been changed on an open
log, then MySQL could crash.