MySQL Bugs: #26296: falcon tablespace becomes corrupted after "mysqladmin shutdown"

Bug #26296	falcon tablespace becomes corrupted after "mysqladmin shutdown"
Submitted:	12 Feb 2007 21:50	Modified:	2 Jul 2007 4:53
Reporter:	Shane Bester (Platinum Quality Contributor)	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Falcon storage engine	Severity:	S2 (Serious)
Version:	5.2.3-falcon-alpha-debug	OS:	Linux (suse9.3 x86)
Assigned to:		CPU Architecture:	Any
Tags:	crash, tablespace corruption

Description:
During a simple test, mysql got killed by OOM killer.

After mysqld_safe restarts mysqld, InnoDB recovers ok, but access to falcon tables causes a crash with the below written in the error log:

Version: '5.2.3-falcon-alpha-debug'  socket: '/tmp/mysql.sock'  port: 3306  yes
database open failed: can't open file "/home/sbester/server/5.2/mysql-5.2.3-falcon-alpha-linux-i686/data/test.fts": No such file or directory (2)
/home/sbester/server/5.2/mysql-5.2.3-falcon-alpha-linux-i686/bin/mysqld: Out of memory at line 421, 'sql_list.h'
/home/sbester/server/5.2/mysql-5.2.3-falcon-alpha-linux-i686/bin/mysqld: needed 5732 byte (6k), memory in use: 28192975 bytes (27533k)

Number of processes running now: 0
070212 23:28:34  mysqld restarted
070212 23:28:34  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
070212 23:28:35  InnoDB: Starting log scan based on checkpoint at
InnoDB: log sequence number 0 46409.
InnoDB: Doing recovery: scanned up to log sequence number 0 46409
070212 23:28:35  InnoDB: Started; log sequence number 0 46409
070212 23:28:35 [Note] /home/sbester/server/5.2/mysql-5.2.3-falcon-alpha-linux-i686/bin/mysqld: ready for connections.
Version: '5.2.3-falcon-alpha-debug'  socket: '/tmp/mysql.sock'  port: 3306  yes
mysqld got signal 4;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=8388572
read_buffer_size=131072
max_used_connections=3
max_connections=151
threads_connected=2
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 336762 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x8e19dc8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0x42b17e0c, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x81f5637 handle_segfault + 417
0xffffe410 _end + -142426112
0x83a9308 _ZN5Error10debugBreakEv + 18
0x83a927f _ZN5Error5errorEPKcz + 87
0x83a92ee _ZN5Error15assertionFailedEPKci + 32
0x84040f2 _ZN3Bdb4markEj + 70
0x83b9b12 _ZN13IndexRootPage10indexMergeEP3DbbiP14SRLUpdateIndexj + 1346
0x83f4d0c _ZN14SRLUpdateIndex7executeEv + 154
0x83f4be3 _ZN14SRLUpdateIndex4redoEv + 17
0x83e1db3 _ZN9SerialLog7recoverEv + 1635
0x839c9f8 _ZN8Database12openDatabaseEPKc + 710
0x8398a84 _ZN10Connection11getDatabaseEPKcS1_P7Threads + 256
0x8396ff5 _ZN10Connection12openDatabaseEPKcS1_S1_S1_S1_P7Threads + 237
0x8376add _ZN15StorageDatabase17getOpenConnectionEv + 109
0x8375ed1 _ZN17StorageConnection7connectEv + 21
0x83785a1 _ZN14StorageHandler20getStorageConnectionEPKcP3THD10OpenOption + 375
0x83707bf _ZN15NfsStorageTable4openEPKcij + 207
0x82d6793 _ZN7handler7ha_openEP8st_tablePKcii + 261
0x8246d6f _Z21open_table_from_shareP3THDP14st_table_sharePKcjjjP8st_tableb + 1567
0x823cf60 _Z17open_unireg_entryP3THDP8st_tableP13st_table_listPKcPcjP11st_mem_rootj + 1502
0x823b5e3 _Z10open_tableP3THDP13st_table_listP11st_mem_rootPbj + 2397
0x823d33b _Z11open_tablesP3THDPP13st_table_listPjj + 475
0x823db61 _Z30open_normal_and_derived_tablesP3THDP13st_table_listj + 177
0x82fb924 _Z14get_all_tablesP3THDP13st_table_listP4Item + 2156
0x8303c84 _Z24get_schema_tables_resultP4JOIN + 394
0x824f8a0 _ZN4JOIN4execEv + 1598
0x825127f _Z12mysql_selectP3THDPPP4ItemP13st_table_listjR4ListIS1_ES2_jP8st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_sel + 637
0x824c4f9 _Z13handle_selectP3THDP6st_lexP13select_resultm + 329
0x8217115 _Z21execute_sqlcom_selectP3THDP13st_table_list + 759
0x820f9fe _Z21mysql_execute_commandP3THD + 1480
0x8218a91 _Z11mysql_parseP3THDPcj + 353
0x820df51 _Z16dispatch_command19enum_server_commandP3THDPcj + 2077
0x820d729 _Z10do_commandP3THD + 603
0x820c816 handle_one_connection + 900
0x40250aa7 _end + 933750423
0x401e6c2e _end + 933316638
New value of fp=(nil) failed sanity check, terminating stack trace!
Please read http://dev.mysql.com/doc/mysql/en/using-stack-trace.html and follow instructions on how to resolve the stack trace. Resolved
stack trace is much more helpful in diagnosing the problem, so please do 
resolve it
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x8e1b460 = show table status
thd->thread_id=2
The manual page at http://www.mysql.com/doc/en/Crashing.html contains
information that should help you find out what is causing the crash.

How to repeat:
3000 tables created randomly (3000.txt)

my tester - 15 threads running with parameters:
--host=192.168.250.2:3306 --queryfile=falcon.query --threads=15 --duration=1800 --shuffle-queries=1 --user=root --database=test --generate-queryfile=0

after a crash (due to some memory leak - another bug), run SHOW TABLE STATUS, or SHOW CREATE TABLE on a falcon table.

Suggested fix:
recovery cleanly

simply have many inserts/update to a table, and then issue "mysqladmin shutdown" so that the server has a clean shutdown will cause a problem.

when you start the server again, SHOW TABLE STATUS will crash it.

run the testcase in but #26298 for a few seconds (to cause db traffic), then control-c it, and issue "mysqladmin shutdown"

then, run mysqld_safe again, and quickly run a SHOW TABLE STATUS. will crash immediately. 

a clean shutdown of the database causes entire falcon tablespace to be corrupted to the extent you have to rm -rf the datadir..

I could reproduce this as described. I used the test program with following parameters 

--queryfile=falcon.query --threads=15 --duration=1800 --shuffle-queries=1 -generate-queryfile=0

I did a mysqladmin shutdown after > 30 seconds then restarted mysqld and issued a SHOW TABLE STATUS.

We hit following assertion:
(gdb) bt
#0  0xffffe410 in __kernel_vsyscall ()
#1  0x40182541 in raise () from /lib/tls/libc.so.6
#2  0x08457822 in Error::debugBreak () at Error.cpp:93
#3  0x0845787f in Error::error (
    string=0x8764598 "assertion failed at line %d in file %s\n")
    at Error.cpp:70
#4  0x0845790f in Error::assertionFailed (fileName=0x876ed59 "BDB.cpp",
    line=72) at Error.cpp:77
#5  0x084d238e in Bdb::mark (this=0x417901d0, transId=0) at BDB.cpp:72
#6  0x0846b21d in IndexRootPage::indexMerge (dbb=0x404fab3c, indexId=45,
    logRecord=0x4175a8f4, transId=0) at IndexRootPage.cpp:921
#7  0x084bd0b6 in SRLUpdateIndex::execute (this=0x4175a8f4)
    at SRLUpdateIndex.cpp:155
#8  0x084bd199 in SRLUpdateIndex::redo (this=0x4175a8f4)
    at SRLUpdateIndex.cpp:127
#9  0x084a4338 in SerialLog::recover (this=0x41795614) at SerialLog.cpp:354
#10 0x0844a3de in Database::openDatabase (this=0x4175e028,
    filename=0x4175ab80 "/data/mysql-sap/test.fts") at Database.cpp:603
#11 0x0843ef55 in Connection::getDatabase (this=0x404fa9dc,
    dbName=0x404fa7dc "test",
    dbFileName=0x4175ab80 "/data/mysql-sap/test.fts", threads=0x404fa820)
    at Connection.cpp:1643

Regards, Hakan

With latest Falcon we are hitting different assertion:

[Switching to Thread 1082665312 (LWP 16916)]
0x00002b07c45fd2ac in raise () from /lib/libpthread.so.0
(gdb) bt
#0  0x00002b07c45fd2ac in raise () from /lib/libpthread.so.0
#1  0x00000000007c31f2 in Error::debugBreak () at Error.cpp:93
#2  0x00000000007c32ed in Error::error (string=<value optimized out>) at Error.cpp:70
#3  0x00000000007c334c in Error::assertionFailed (fileName=0x4 <Address 0x4 out of bounds>, line=16916)
    at Error.cpp:77
#4  0x00000000007d1183 in IndexRootPage::findInsertionLeaf (dbb=0x2aaaab0337b8, indexId=40,
    indexKey=0x4087e0c0, recordNumber=2663, transId=<value optimized out>) at IndexRootPage.cpp:305
#5  0x00000000007d24f6 in IndexRootPage::indexMerge (dbb=0x2aaaab0337b8, indexId=40, logRecord=0x4087f1a8,
    transId=0) at IndexRootPage.cpp:872
#6  0x000000000080de19 in SRLUpdateIndex::execute (this=0x4087f1a8) at SRLUpdateIndex.cpp:162
#7  0x000000000080de37 in SRLUpdateIndex::redo (this=0x420e) at SRLUpdateIndex.cpp:133
#8  0x00000000007fd289 in SerialLog::recover (this=0x2aaaaafa3928) at SerialLog.cpp:321
#9  0x00000000007b718b in Database::openDatabase (this=0x2aaaaaf372e0,
    filename=0x4087f3f0 "/data/mysql-sap/test.fts") at Database.cpp:607
#10 0x00000000007afd7d in Connection::getDatabase (this=0x2aaaab007e50, dbName=0x2aaaab03288c "TEST",
    dbFileName=0x4087f3f0 "/data/mysql-sap/test.fts", threads=0x2aaaab00a1b8) at Connection.cpp:1645
#11 0x00000000007b0a41 in Connection::openDatabase (this=0x2aaaab007e50, dbName=0x2aaaab03288c "TEST",
    filename=<value optimized out>, account=0x9d6cfc "mysql", password=0x9d6cfc "mysql", address=0x0,
    parent=0x2aaaab00a1b8) at Connection.cpp:925
#12 0x0000000000791233 in StorageDatabase::getOpenConnection (this=0x2aaaab033030)
    at StorageDatabase.cpp:158
#13 0x000000000078fa47 in StorageConnection::connect (this=0x2aaaab02fd18) at StorageConnection.cpp:138
#14 0x0000000000793772 in StorageHandler::getStorageConnection (this=0x2aaaaae00048,
    tableShare=0x2aaaab0011c8, mySqlThread=0xe4e088, mySqlThdId=1, createFlag=OpenDatabase)

I don't have any idea how to reproduce this.  Are the sources of the
test program available somewhere?

Ran test case

./testcase_26298 --queryfile=falcon.query --threads=15 --duration=1800 --shuffle-queries=1 --generate-queryfile=0 

issued shutdown after 30 seconds and restarted mysqld. However mysqladmin -uroot  shutdown did not shutdown properly due to Bug#29133. Therefore I had to kill mysqld.

--> No crash anymore.

Best regards,

Hakan

A note has been added to the 6.0.1 changelog: 

Under certain situations, shutting down MySQL using mysqladmin could cause Falcon to corrupt the database tables and fail to restart properly. (Bug#26296)