Bug #33724 Falcon crash on recovery with huge log and tablespace
Submitted: 7 Jan 2008 16:10 Modified: 31 Jan 2008 16:32
Reporter: Philip Stoev Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S1 (Critical)
Version:6.0.4-p2 OS:Any
Assigned to: Jim Starkey CPU Architecture:Any

[7 Jan 2008 16:10] Philip Stoev
Description:
When running the iuds6 test, Falcon crashed with the following stack trace:

#0 0xffffe410 in __kernel_vsyscall () 
#1 0x0060ef57 in pthread_kill () from /lib/libpthread.so.0 
#2 0x08355acf in write_core (sig=10738) at stacktrace.c:240 
#3 0x0824a265 in handle_segfault () 
#4 
#5 0xffffe410 in __kernel_vsyscall () 
#6 0x003e4c00 in raise () from /lib/libc.so.6 
#7 0x003e6451 in abort () from /lib/libc.so.6 
#8 0xf7fc5277 in __cxa_call_unexpected () from /usr/lib/libstdc++.so.5 
#9 0xf7fc52c4 in std::terminate () from /usr/lib/libstdc++.so.5 
#10 0xf7fc549c in __cxa_rethrow () from /usr/lib/libstdc++.so.5 
#11 0x083cfab4 in Thread::thread () 
#12 0x083cf7e0 in Thread::thread () 
#13 0x0060a2db in start_thread () from /lib/libpthread.so.0 
#14 0x0048914e in clone () from /lib/libc.so.6 

This crash left a 2gig core file and the following sizes of tablespace files, even though the test is not supposed to create millions of records:

-rw-rw---- 1 qauser qauser   39840768 Jan  7 14:12 falcon_master.fl1
-rw-rw---- 1 qauser qauser 2169584128 Jan  7 16:17 falcon_master.fl2
-rw-rw---- 1 qauser qauser    1372160 Jan  7 17:00 falcon_master.fts
-rw-rw---- 1 qauser qauser    1093632 Jan  7 16:59 falcon_temporary.fts
-rw-rw---- 1 qauser qauser  568881152 Jan  7 17:00 falcon_user.fts

Running mysqld on that tablespace with --falcon-debug-mask=65535 resulted in following lines in STDERR:

followed by a crash with the following backtrace:

0x83fce71 Error::error(char const*, ...) + 73
0x8472ea5 Cache::fetchPage(Dbb*, int, PageType, LockType) + 441
0x83f5d22 Dbb::handoffPage(Bdb*, int, PageType, LockType) + 34
0x844a9f2 Section::getSectionPage(Dbb*, int, int, LockType, unsigned int) + 554
0x844ce28 Section::getSectionPage(int, LockType, unsigned int) + 184
0x844b6c1 Section::fetchLocatorPage(int, int, LockType, unsigned int) + 65
0x844b199 Section::reInsertStub(int, unsigned int) + 37
0x83f5f1d Dbb::reInsertStub(int, int, unsigned int) + 141
0x8446abf SRLUpdateRecords::redo() + 403
0x844fed2 SerialLog::recover() + 1230
0x83ec68b Database::openDatabase(char const*) + 367
0x83e8fc1 Connection::getDatabase(char const*, char const*, Threads*) + 193
0x83e68d6 Connection::openDatabase(char const*, char const*, char const*, char const*, char const*, Threads*) + 74
0x83bb3cb StorageDatabase::getOpenConnection() + 63
0x83bf170 StorageHandler::initialize() + 108
0x83bebda StorageHandler::getStorageConnection(StorageTableShare*, THD*, int, OpenOption) + 426
0x83b158a StorageInterface::open(char const*, int, unsigned int) + 98
0x830abfa handler::ha_open(st_table*, char const*, int, int) + 34
0x8312955 ha_partition::open(char const*, int, unsigned int) + 405
0x830abfa handler::ha_open(st_table*, char const*, int, int) + 34
0x828b39b open_table_from_share(THD*, st_table_share*, char const*, unsigned int, unsigned int, unsigned int, st_table*, open_table_mode) + 783
0x8288600 open_unireg_entry(THD*, st_table*, TABLE_LIST*, char const*, char*, unsigned int, st_mem_root*, unsigned int) + 124
0x8283162 open_table(THD*, TABLE_LIST*, st_mem_root*, bool*, unsigned int) + 1954
0x82843cf open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int) + 1103
0x8284719 open_normal_and_derived_tables(THD*, TABLE_LIST*, unsigned int) + 25
0x8330a5f get_all_tables(THD*, TABLE_LIST*, Item*) + 1367
0x83386fa get_schema_tables_result(JOIN*, enum_schema_table_state) + 370
0x8296b00 JOIN::exec() + 6492
0x8296d93 _Z12mysql_selectP3THDPPP4ItemP10TABLE_LISTjR4ListIS1_ES2_jP8st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select + 339
0x82920ab handle_select(THD*, st_lex*, select_result*, unsigned long) + 307
0x825df7d execute_sqlcom_select(THD*, TABLE_LIST*) + 149
0x825543b mysql_execute_command(THD*) + 747
0x825b5ad mysql_parse(THD*, char const*, unsigned int, char const**) + 269
0x82540c5 dispatch_command(enum_server_command, THD*, char*, unsigned int) + 909
0x8253d08 do_command(THD*) + 176
0x8252557 handle_one_connection + 287

How to repeat:
Try running iuds6 scenario of the systems test using an all-falcon iuds6.tst configuration file. More instructions are available at:

https://inside.mysql.com/wiki/SystemQATestPlan
[7 Jan 2008 16:15] Philip Stoev
STDERR is:

Recovering database /data1/6.0.4/systest_vardir-p3/master-data/falcon_master.fts ...
first recovery block is 282998
last recovery block is 1858544
recovery read block is 283917

followed by a numerous

RecordLocatorPage: transient corruption corrected

and finally:

Bugcheck: page 16814 wrong page type, expected 2 got 7

080107 17:00:26 - mysqld got signal 4;
[7 Jan 2008 18:51] Philip Stoev
Second recovery attempt crashed with no stack trace.

Third recovery attempt crashed with:

0x84832e9 SRLRecordLocator::pass2() + 101
0x844fe4a SerialLog::recover() + 1094
0x83ec68b Database::openDatabase(char const*) + 367
0x83e8fc1 Connection::getDatabase(char const*, char const*, Threads*) + 193
0x83e68d6 Connection::openDatabase(char const*, char const*, char const*, char const*, char const*, Threads*) + 74
0x83bb3cb StorageDatabase::getOpenConnection() + 63
0x83bf170 StorageHandler::initialize() + 108
0x83bebda StorageHandler::getStorageConnection(StorageTableShare*, THD*, int, OpenOption) + 426
0x83b158a StorageInterface::open(char const*, int, unsigned int) + 98
0x830abfa handler::ha_open(st_table*, char const*, int, int) + 34
0x8312955 ha_partition::open(char const*, int, unsigned int) + 405
0x830abfa handler::ha_open(st_table*, char const*, int, int) + 34
0x828b39b open_table_from_share(THD*, st_table_share*, char const*, unsigned int, unsigned int, unsigned int, st_table*, open_table_mode) + 783
0x8288600 open_unireg_entry(THD*, st_table*, TABLE_LIST*, char const*, char*, unsigned int, st_mem_root*, unsigned int) + 124
0x8283162 open_table(THD*, TABLE_LIST*, st_mem_root*, bool*, unsigned int) + 1954
0x82843cf open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int) + 1103
0x8284719 open_normal_and_derived_tables(THD*, TABLE_LIST*, unsigned int) + 25
0x8330a5f get_all_tables(THD*, TABLE_LIST*, Item*) + 1367
0x83386fa get_schema_tables_result(JOIN*, enum_schema_table_state) + 370
0x8296b00 JOIN::exec() + 6492
0x8296d93 _Z12mysql_selectP3THDPPP4ItemP10TABLE_LISTjR4ListIS1_ES2_jP8st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select + 339
0x82920ab handle_select(THD*, st_lex*, select_result*, unsigned long) + 307
0x825df7d execute_sqlcom_select(THD*, TABLE_LIST*) + 149
0x825543b mysql_execute_command(THD*) + 747
0x825b5ad mysql_parse(THD*, char const*, unsigned int, char const**) + 269
0x82540c5 dispatch_command(enum_server_command, THD*, char*, unsigned int) + 909
0x8253d08 do_command(THD*) + 176
0x8252557 handle_one_connection + 287
[17 Jan 2008 16:59] Philip Stoev
Setting this to verified, since the corrupted tablespace is available for this bug. Can you please triage accordingly?
[31 Jan 2008 16:32] Jim Starkey
This has the signature of the double recovery bug (better named "premature
shutdown" bug), but it's impossible to know for sure.

If the bug reappears, please re-open it.