Bug #37276 | Maria: various index corruption, block format assertion failure | ||
---|---|---|---|
Submitted: | 9 Jun 2008 8:25 | Modified: | 10 Jul 2008 14:57 |
Reporter: | Shane Bester (Platinum Quality Contributor) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Maria storage engine | Severity: | S2 (Serious) |
Version: | 5.1-maria | OS: | Linux |
Assigned to: | CPU Architecture: | Any |
[9 Jun 2008 8:25]
Shane Bester
[9 Jun 2008 8:41]
MySQL Verification Team
testcase, runs insert/update/delete/check table in 3 threads, crashes after seconds
Attachment: bug37276.c (text/plain), 8.55 KiB.
[9 Jun 2008 8:44]
MySQL Verification Team
testcase is attached. I'm using source code built from mysql-maria tree on mysql.bkbits.net: [sbester@box1 mysql-maria]$ bkf changes|head OK-root OK ChangeSet@1.2632.1.1, 2008-06-02 13:49:39+03:00, jani@hynda.mysql.fi +1 -0 A fix for adding tests that need slave being started. Now including a master-slave.inc in test, or subsequent include files is enough. Before the fix one had to hardcode the test name in the source file.
[25 Jun 2008 13:20]
Guilhem Bichot
using latest bzr mysql-maria tree, linux 32 bit. Just use Shane's file, ater some seconds: Program received signal SIGSEGV, Segmentation fault. [Switching to Thread -1238131824 (LWP 9593)] 0x0863d816 in trnman_end_trn (trn=0x92bec38, commit=1 '\001') at trnman.c:410 410 t->commit_trid < active_list_min.next->min_read_from; Current language: auto; currently c (gdb) p t $1 = (TRN *) 0x0 (gdb) bt #0 0x0863d816 in trnman_end_trn (trn=0x92bec38, commit=1 '\001') at trnman.c:410 #1 0x0865a358 in ma_commit (trn=0x92bec38) at ma_commit.c:60 #2 0x085d7605 in ha_maria::external_lock (this=0xb5f060c0, thd=0x9299bb8, lock_type=2) at ha_maria.cc:2315 #3 0x083a3154 in handler::ha_external_lock (this=0xb5f060c0, thd=0x9299bb8, lock_type=2) at handler.cc:4483 #4 0x08280a71 in unlock_external (thd=0x9299bb8, table=0x9290d00, count=1) at lock.cc:786 #5 0x08280d44 in mysql_unlock_tables (thd=0x9299bb8, sql_lock=0x9290ce8) at lock.cc:389 #6 0x082e08ec in close_thread_tables (thd=0x9299bb8) at sql_base.cc:1391 #7 0x082a4ac4 in dispatch_command (command=COM_QUERY, thd=0x9299bb8, packet=0x929b591 "", packet_length=163) at sql_parse.cc:1504 #8 0x082a4e23 in do_command (thd=0x9299bb8) at sql_parse.cc:789
[25 Jun 2008 15:13]
Guilhem Bichot
first problem, specific of 32-bit linux, is that in for (t= committed_list_min.next, free_me_count= 0; t->commit_trid < active_list_min.next->min_read_from; t= t->next, free_me_count++) /* no-op */; we went too far in the list because (gdb) p active_list_min.next->min_read_from $7 = 18446744073709551615 (gdb) p committed_list_min.next->next->next->next->commit_trid $8 = 4294967295 and so came to a t==NULL. This 184... is wrong and is due to ma_blockrec.c: info->trn->min_read_from= ~(TrID) 0; which is inconsistent with trnman.c: active_list_max.min_read_from= ~(ulong) 0; committed_list_max.commit_trid= ~(ulong) 0; So a first patch is to use ~(ulong)0 in ma_blockrec.c (actually we should define a constant and use it in place of ~(ulong)0 in trnman.c and ma_blockrec.c, or define a macro in trnman*.h to set to ~(ulong)0. After that, the test runs for 9 minutes and then: mysqld: ma_blockrec.c:1394: _ma_compact_block_page: Assertion `row_length >= min_row_length || row_length == 0' failed. #5 0x08600d61 in _ma_compact_block_page (buff=0xb65f3008 "\001", block_size=8192, rownr=16, extend_block=1 '\001', min_read_from=28745, min_row_length=35) at ma_blockrec.c:1394 #6 0x08601823 in get_head_or_tail_page (info=0xb6206778, block=0xb620b540, buff=0xb6206b78 "", length=229, page_type=1, lock=PAGECACHE_LOCK_WRITE, res=0xb61fdf80) at ma_blockrec.c:1694 #7 0x086054a2 in allocate_and_write_block_record (info=0xb6206778, record=0xb62017a0 "\235\030", row=0xb62067bc, undo_lsn=1) at ma_blockrec.c:3371 #8 0x08605787 in _ma_write_init_block_record (info=0xb6206778, record=0xb62017a0 "\235\030") at ma_blockrec.c:3429 #9 0x0861416c in maria_write (info=0xb6206778, record=0xb62017a0 "\235\030") at ma_write.c:154 #10 0x085d9806 in ha_maria::write_row (this=0xb6201668, buf=0xb62017a0 "\235\030") at ha_maria.cc:967 #11 0x083a4a39 in handler::ha_write_row (this=0xb6201668, buf=0xb62017a0 "\235\030") at handler.cc:4520 #12 0x08321edd in write_record (thd=0x9299b08, table=0xb6200558, info=0xb61fe2fc) at sql_insert.cc:1561 #13 0x08326a4f in mysql_insert (thd=0x9299b08, table_list=0xb5f00978, fields=@0x929b148, values_list=@0x929b178, update_fields=@0x929b168, update_values=@0x929b158, duplic=DUP_ERROR, ignore=false) at sql_insert.cc:816 #14 0x0829cf6c in mysql_execute_command (thd=0x9299b08) at sql_parse.cc:2947 #15 0x082a31d9 in mysql_parse (thd=0x9299b08, inBuf=0xb5f006f8 "insert into t2(a,b) values ('vcmraubststyahynmsclfwebwsyhkfosuysdfaqqnqiuthfqpubkipqrdwlqzrjfaqmrhyooraenmyjarxybjligwrgmvjcfoplmxhkaqbkyiozfbeedpypkrhtosswdzbwwrylhpzpkfsqwbvofahdmoizmeolyygfy','sicz"..., length=253, found_semicolon=0xb61ff27c) at sql_parse.cc:5643
[25 Jun 2008 15:17]
Guilhem Bichot
At time of assertion: (gdb) p row_length $9 = 10
[25 Jun 2008 20:54]
Guilhem Bichot
from Serg: "I vaguely remember that Monty noticed it too, we've discussed it, I agreed that it should be ~(TrID)0, and Monty said he'll fix." so looks like it should be ~(TrID)0 all around, instead of ~(ulong)0.
[29 Jun 2008 12:07]
Michael Widenius
Note: This may be fixed in Maria tree after my last push as I have already changed trnman.c to use ~(TrID)0. Haven't had time to verify this.
[3 Jul 2008 13:13]
Guilhem Bichot
Errors vary. Sometimes it is DBUG_ASSERT(pageno < ((ULL(1)) << 40)); in pagecache_read() (next_page is a huge number in w_search()), other times it is "Incorrect key file for table 't2'; try to repair it". Happens on Linux 32 bit and 64 bit, with 5.1-maria and 6.0-maria, after one minute at most. Does not depend on transactional=0|1, happens with block and dynamic format (though I once saw an assertion in block format, see one previous post), with and without --debug.
[3 Jul 2008 14:08]
Guilhem Bichot
Ok, here is a minimal single-threaded testcase, which gives ERROR 1034 (HY000) at line 4062: Incorrect key file for table 't2'; try to repair it on its last query. Go into 5.1-maria and: ./mtr --mem --start-and-exit maria ../client/mysql -u root -S var/tmp/master.sock < /tmp/bug37236_1.sql where /tmp/bug37236_1.sql is the attached file (unzipped).
[3 Jul 2008 14:09]
Guilhem Bichot
sorry, bug37276_1.sql
[3 Jul 2008 14:10]
Guilhem Bichot
zip of SQL file to trigger index corruption in table t2, bug#37276
Attachment: bug37276_1.sql.zip (application/zip, text), 331.88 KiB.
[3 Jul 2008 15:09]
Guilhem Bichot
If I insert "CHECK TABLE t2 EXTENDED" before the INSERTs at line 3961 and 3962 of bug37276_1.sql, the first CHECK says table is ok, but the second says: "test.t2 check error Invalid key block position: 15731098820608 key block size: 8192 file_length: 425984" and the INSERT right after that fails with "Incorrect key file for table 't2'; try to repair it". Compared to the script without CHECK (bug37276_1.sql), the failing INSERT is not the last but a few lines before (i.e. CHECK TABLE seems to make INSERTs fail a bit earlier). The bug isn't due to checkpoints, I ran with --mysqld=--maria-checkpoint-interval=0 .
[3 Jul 2008 15:12]
Guilhem Bichot
Fixer: don't forget to re-run the full gypsy original C-code testcase before closing bug, as I observed various errors not seen with the SQL testcase.
[3 Jul 2008 19:48]
Guilhem Bichot
I'm attaching t2.zip; it is a non-corrupted table generated by: - starting mysqld with the above mtr command line - running bug37276_1.sql up to line 3960 excluded, i.e. this INSERT: insert into t2(a,b) values ('khtlbejbtbtllow etc excluded. - shutting down mysqld - doing maria_chk -e on t2, no error. So this table can now be used to study the corruption creation: - start mysqld - copy t2 into database directory - running the INSERT above, it succeeds, and then CHECK TABLE reports corruption.
[3 Jul 2008 20:22]
Guilhem Bichot
sane table which can be corrupted with a single INSERT
Attachment: bug37276_t2.tar.bz2 (application/x-bzip2, text), 421.71 KiB.
[3 Jul 2008 20:25]
Guilhem Bichot
test: Splitting nod seems to be a DBUG mark of the operation which creates the corrupted page (the page with a crazy page number somewhere inside one of its keys). Node split.
[4 Jul 2008 8:38]
Guilhem Bichot
Simple way to reproduce: download bug37276_t2.tar.bz2, unpack in /tmp, and run this test with mtr: copy_file /tmp/t2.frm $MYSQLTEST_VARDIR/master-data/test/t2.frm; copy_file /tmp/t2.MAD $MYSQLTEST_VARDIR/master-data/test/t2.MAD; copy_file /tmp/t2.MAI $MYSQLTEST_VARDIR/master-data/test/t2.MAI; #check table t2 extended; insert into t2(a,b) values ('khtlbejbtbtllowhmyhjyuyrioqcobsdzehdydiosshlsxmcazvggwvfdbsljoslaqbdmqunazcsmlvslxzkzwgihyfqmurpvhnxfdnqkzbegwboqymsxnfcespljoutxnfsomfamunmxklsklxxugokmhfoeetuxaqvrdmtpkpivjygmnpevzxodicjlwnloxizngjkwmfs','oa') ; check table t2 extended; The CHECK TABLE above will print Invalid key block position: 15731098820608 key block size: 8192 file_length: 425984
[4 Jul 2008 18:51]
MySQL Verification Team
corrupts todays bzr maria-5.1: Page length and length of keys don't match at page: 24576
Attachment: bug37276_reduced_corruption.sql (application/unknown, text), 7.03 KiB.
[7 Jul 2008 8:57]
Guilhem Bichot
After Monty's fix " revision-id: monty@mysql.com-20080705110321-tmb1fqw0fldh5b5q parent: guilhem@mysql.com-20080630093458-o2e0jt4qphvdd76m committer: Michael Widenius <monty@mysql.com> branch nick: mysql-maria timestamp: Sat 2008-07-05 14:03:21 +0300 message: Bug#37276 maria crash on insert around the time check table is run Fixed several (but not all) issues found by the test program: - ASSERT on row_length in ma_blockrec.c::_ma_compact_block_page() - Fixed bug when splitting node pages - Fixed hang in 'closeing tables' (conflicting mutex order) by ensuring we fir st take trnman lock and then share->intern_lock " bug37276_1.sql and bug37276_2.sql don't fail anymore. The C file still fails after 6 minutes: query failed 'update t1 set a='umafolwxjtzgqdciqhpgwnkfweshenjqlankgroxyuhzlxoqcvitfgsjayqnbewntxxciwsgzedg',b='ymrndhxrulgeqnrrcyuakgghpwkuztpraeecmfrwqfhmdkgihxpbbrjdnpdkkonbcfhvqhrqihujpjokhdpsamvzcsekdcrrcltzavfrubvlttmwjhetokkqgwrzdtlikhbt' where id%10=0' : 1034 (Incorrect key file for table 't1'; try to repair it) And bug37276_reduced_corruption.sql still fails too.
[7 Jul 2008 13:27]
Guilhem Bichot
Note that with the C file, the corruption does not always happen at the same time; the last I got is: 011373583 query failed 'insert into t1(a,b) values ('tucsnjxyfpezfzimaivpmxzwtlokbghhmmpvyhoceexynbnhzjwsggqymnnryfgbhvapfrenstwqxwpqxoutgtsjtjfouhn','kwusdhezljawmcuebktqzhqiajjecwenepfoezwqjyzkhedjzghoteddluhswbdsm')' : 1034 (Incorrect key file for table 't1'; try to repair it)
[7 Jul 2008 13:54]
Guilhem Bichot
queries: 011373583 is what I meant.
[10 Jul 2008 12:04]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/49427 2675 Sergei Golubchik 2008-07-10 bug#37276: Maria: various index corruption, block format assertion failure chk_index() is called recursively and need to use a local buffer for a key
[10 Jul 2008 14:57]
Guilhem Bichot
All corruptions reported here (C file, SQL files) are now fixed in mysql-maria and mysql-6.0-maria. Thanks Monty and Serg. Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release. If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at http://dev.mysql.com/doc/en/installing-source.html
[24 Jul 2008 8:26]
Bugs System
Pushed into 6.0.7-alpha (revid:kpettersson@mysql.com-20080724082624-59z1hc8un6mxdb46) (pib:2)
[13 Sep 2008 20:42]
Bugs System
Pushed into 6.0.7-alpha (revid:monty@mysql.com-20080705110321-tmb1fqw0fldh5b5q) (version source revid:hakan@mysql.com-20080725175322-8wgujj5xuzrjz3ke) (pib:3)