Bug #33498 | Falcon causes server crash, hang, duplicate primary keys when disk full | ||
---|---|---|---|
Submitted: | 24 Dec 2007 8:52 | Modified: | 27 Apr 2009 16:23 |
Reporter: | Philip Stoev | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S1 (Critical) |
Version: | 6.0.4 | OS: | Any |
Assigned to: | Philip Stoev | CPU Architecture: | Any |
Tags: | F_ISOLATION, simpler_testcase_needed |
[24 Dec 2007 8:52]
Philip Stoev
[24 Dec 2007 11:43]
Philip Stoev
Here is a sample workbench for exposing bugs of this class: Step 1. Start a mysql server: $ perl mysql-test-run.pl --mysqld=--max-connections=30 --mysqld=--falcon-lock-timeout=10000 --skip-ndb --vardir=/build/sysbench_vardir --start-and-exit $ mysqladmin create sbtest Step 2. Run sysbench preparation $ ./sysbench --mysql-socket=/build/sysbench_vardir/tmp/master.sock --test=oltp --mysql-user=root --oltp-table-size=100000 --mysql-table-engine=Falcon --mysql-engine-trx=yes prepare Step 3. Reduce disk space on partition so that about 50 mb are left. $ dd if=/dev/zero of=/build/bigfileX bs=1024 count=1048576 Step 4. Run sysbench proper $ ./sysbench --mysql-socket=/build/sysbench_vardir/tmp/master.sock --test=oltp --mysql-engine-trx=yes --mysql-user=root --max-requests=0 --max-time=28800 --num-threads=10 --oltp-table-size=100000 --mysql-table-engine=Falcon run Then observe as the query.log eats all available space on the partition. Once the partition is full, there will be a Falcon crash, transactions hanging, etc. Sysbench may report corrupted replies, connection losses, etc. Step 5a. If server is still up, free disk space and restart sysbench if necessary, for another round of tests. Step 5b. If server has crashed, restart manually in the same vardir, and attempt to access the sbtest database.
[24 Dec 2007 11:44]
Philip Stoev
Another core produced with the method above: 0x824a171 handle_segfault + 441 0x8422914 _ZN13RecordVersion12fetchVersionEP11Transaction + 8 0x8422950 _ZN13RecordVersion12fetchVersionEP11Transaction + 68 0x8422950 _ZN13RecordVersion12fetchVersionEP11Transaction + 68 0x83bb994 _ZN15StorageDatabase11nextIndexedEP12StorageTablePvib + 244 0x83bfe78 _ZN12StorageTable11nextIndexedEib + 36 0x83b320f _ZN16StorageInterface10index_nextEPh + 127 0x830df4e _ZN7handler15read_range_nextEv + 74 0x830ce85 _ZN7handler21multi_range_read_nextEPPc + 189 0x82fa235 _ZN18QUICK_RANGE_SELECT8get_nextEv + 53 0x83063f2 _Z8rr_quickP14st_read_record + 26 0x829d137 _Z10sub_selectP4JOINP13st_join_tableb + 179 0x82a8434 _Z9do_selectP4JOINP4ListI4ItemEP8st_tableP9Procedure + 280 0x829636a _ZN4JOIN4execEv + 4566 0x8296d83 _Z12mysql_selectP3THDPPP4ItemP10TABLE_LISTjR4ListIS1_ES2_jP8st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select + 339 0x829209b _Z13handle_selectP3THDP6st_lexP13select_resultm + 307 0x825df6d _Z21execute_sqlcom_selectP3THDP10TABLE_LIST + 149 0x825542b _Z21mysql_execute_commandP3THD + 747 0x82bf01d _ZN18Prepared_statement7executeEP6Stringb + 673 0x82bdac9 _Z18mysql_stmt_executeP3THDPcj + 285 0x8254cbe _Z16dispatch_command19enum_server_commandP3THDPcj + 3990 0x8253cf8 _Z10do_commandP3THD + 176 0x8252547 handle_one_connection + 287
[24 Dec 2007 22:58]
Philip Stoev
After another server restart, the huge tablespace from the first comment is now readable, so the issues that remain are the crashes and the fact that the 100000 sysbench records ended up taking 1.6 gigabytes. I kindly request that you attempt to run the smaller scenario described in my comment from [24 Dec 12:43]. I will also be running it repeatedly to get a more clear understanding of what is going on.
[28 Dec 2007 20:53]
Philip Stoev
After the application of several cycles of the procedure descibed above, I was able to arrive at this tablespace shell.mysql.com:/supportftp/pub/mysql/upload/bug33498.zip It exibits the following characteristics: * running select id , count(*) as C from sbtest.sbtest group by id having C > 1 returns 10 records with duplicate values even though id is declared primary key auto_increment. * startup (meaning first access to table) is slow or hangs when starting with this tablespace; * shutdown hangs (meaning mysqladmin shutdown).
[11 Jan 2008 6:10]
Christopher Powers
I ran the short scenario many, many times, and exhausted the disk space in each test, but did not encounter the crash. The Falcon pagewriter thread recognized the out-of-space condition, and simply kept trying to write until space was available. I recommend fixing this post-beta.
[10 Feb 2008 4:28]
Kevin Lewis
There was another bug, 33824m that also resulted in duplicate primary keys. But it only happened with a binary produced by an old compiler, GCC version: 3.2.3. Is it possible that this bug also only occurred with a binary fom a GCC 3.2.3 compiler?
[2 May 2008 4:09]
Kevin Lewis
This problem of duplicate primary keys was fixed by Bug#35322 - Falcon duplicate primary keys. This fix made Index::insert and Table::checkUniqueIndex atomic.
[22 Aug 2008 9:18]
Philip Stoev
A framework that repeatedly exhausts disk space will need to be created to verify that Falcon operates correctly.
[27 Apr 2009 16:23]
Philip Stoev
Those issues have not been seen since, even though falcon runs out of disk space on a regular basis as part of other tests. So, all sympthoms described here appear to have been resolved some time ago.