Bug #20213 | DBT2 testing cause mysqld to core using Innodb | ||
---|---|---|---|
Submitted: | 1 Jun 2006 21:07 | Modified: | 18 Jun 2010 12:45 |
Reporter: | Jonathan Miller | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S1 (Critical) |
Version: | 5.1.12-bk, 5.1.11 | OS: | Linux (Linux 32 Bit OS) |
Assigned to: | Heikki Tuuri | CPU Architecture: | Any |
[1 Jun 2006 21:07]
Jonathan Miller
[1 Jun 2006 21:15]
Jonathan Miller
0x81eeacc handle_segfault + 438 0x4af420 (?) 0x8511eda hash_next + 72 0x82d3d4f _ZN11ha_innobase10index_readEPcPKcj16ha_rkey_function + 405 0x8254296 _Z20join_read_always_keyP13st_join_table + 330 0x8251b67 _Z10sub_selectP4JOINP13st_join_tableb + 285 0x8251e9c _Z9do_selectP4JOINP4ListI4ItemEP8st_tableP9Procedure + 668 0x8261ed5 _ZN4JOIN4execEv + 6235 0x8262151 _Z12mysql_selectP3THDPPP4ItemP13st_table_listjR4ListIS1_ES2_jP8st_orderSB_S2_SB_mP13select_resultP18st_select_lex_unitP13st_sel + 561 0x82623ac _Z13handle_selectP3THDP6st_lexP13select_resultm + 352 0x8207e4e _Z21mysql_execute_commandP3THD + 1880 0x820f767 _Z11mysql_parseP3THDPcj + 535 0x821004a _Z16dispatch_command19enum_server_commandP3THDPcj + 1996 0x821118c _Z10do_commandP3THD + 260 0x8211543 handle_one_connection + 725 0x640bd4 (?) 0x5984fe (?)
[2 Jun 2006 6:10]
Heikki Tuuri
InnoDB does not (yet) support engine_condition_pushdown = on. The question is why MySQL changes execution so that InnoDB crashes.
[2 Jun 2006 7:59]
Martin Skold
Engines that support condition pushdown have to overload handler method cond_push, otherwise it will have no effect: virtual const COND *cond_push(const COND *cond) { return cond; }; Must be something in the different execution path that triggers a bug that has always been there. The only difference with turning on the flag is that the condition is generated before passing it to cond_push: COND *push_cond= make_cond_for_table(tmp, current_map, current_map); We could add a capability flag for handlers to avoid doing this for handlers that don't support this feature, but still it shouldn't cause a crash. BR -- Martin
[2 Jun 2006 9:28]
Heikki Tuuri
Martin, but how can the code path change if the pushdown is a null operation? Regards, Heikki
[3 Jul 2006 21:47]
Mikael Ronström
The bug happens also when the flag engine_condition_pushdown = 0 so I don't think it has anything to do with push down conditions. It can happen on any table where you get always the same error that the Row id field is wrong length (large number == -1) in index PRIMARY of the table Since it crashes the server I would call this a showstopper as well since it does so without using any strange queries, just plain old SELECT, UPDATES, INSERTS and DELETE's.
[3 Jul 2006 22:39]
Mikael Ronström
I tried DBT2 in 5.0 and found to start an interesting performance bug. It took 4 minutes and 35 seconds to load 10 warehouses in 5.1 and 2 minutes and 28 seconds in 5.0
[23 Jul 2006 10:24]
Valeriy Kravchuk
Bug #20595 was marked as a duplicate of this one.
[28 Jul 2006 6:29]
Heikki Tuuri
Hi! /****************************************************************** Stores the row id to the prebuilt struct. */ static void row_sel_store_row_id_to_prebuilt( /*=============================*/ row_prebuilt_t* prebuilt, /* in: prebuilt */ rec_t* index_rec, /* in: record */ dict_index_t* index, /* in: index of the record */ const ulint* offsets) /* in: rec_get_offsets (index_rec, index) */ { byte* data; ulint len; ut_ad(rec_offs_validate(index_rec, index, offsets)); data = rec_get_nth_field(index_rec, offsets, dict_index_get_sys_col_pos(index, DATA_ROW_ID), &len); if (len != DATA_ROW_ID_LEN) { fprintf(stderr, "InnoDB: Error: Row id field is wrong length %lu in ", (ulong) len); dict_index_name_print(stderr, prebuilt->trx, index); fprintf(stderr, "\n" "InnoDB: Field number %lu, record:\n", (ulong) dict_index_get_sys_col_pos(index, DATA_ROW_ID)); rec_print_new(stderr, index_rec, offsets); putc('\n', stderr); ut_error; } ut_memcpy(prebuilt->row_id, data, len); } /************************************************************************ Returns the position of a system column in an index. */ UNIV_INLINE ulint dict_index_get_sys_col_pos( /*=======================*/ /* out: position, ULINT_UNDEFINED if not contained */ dict_index_t* index, /* in: index */ ulint type) /* in: DATA_ROW_ID, ... */ { dict_col_t* col; ut_ad(index); ut_ad(index->magic_n == DICT_INDEX_MAGIC_N); ut_ad(!(index->type & DICT_UNIVERSAL)); col = dict_table_get_sys_col(index->table, type); if (index->type & DICT_CLUSTERED) { return(col->clust_pos); } return(dict_index_get_nth_col_pos(index, dict_table_get_sys_col_no(index->table, type))); } The record in the hex dump looks like a sensible clustered index record. The fact that dict_index_get_nth_col_pos() returns ULINT_UNDEFINED suggests that in the function call, the struct 'index' is a secondary index! A possible explanation for this bug is that MySQL/InnoDB in a wrong way uses the SAME table handle to process two different queries at the same time. That could also explain other crashes of 5.1 under high concurrency stress tests, for example, http://bugs.mysql.com/bug.php?id=21322. Regards, Heikki
[28 Jul 2006 6:43]
Heikki Tuuri
All, can you run under gdb and check in a crash if MySQL is actually doing the index search in the primary index of the table? Also check all other threads in mysqld if they are using the SAME table handle in their respective queries. Using the same table handle at the same time for two different queries would explain this crash. ### An alternative explanation for the bug is that MySQL changes the index used in the table handle, but continues processing the query like the index would still be the same. It would be nice to know WHAT query MySQL is processing here. The stack traces did not show the query. Regards, Heikki #6 0x083ee318 in row_search_for_mysql () #7 0x082d3d4f in ha_innobase::index_read () #8 0x08254296 in join_read_always_key () #9 0x08251b67 in sub_select () #10 0x08251e9c in do_select () #11 0x08261ed5 in JOIN::exec () #12 0x08262151 in mysql_select () #13 0x082623ac in handle_select () #14 0x08207e4e in mysql_execute_command ()
[29 Jul 2006 6:03]
Heikki Tuuri
These bug reports suggest that some serious bug was introduced in 5.1.11: http://bugs.mysql.com/bug.php?id=20213 http://bugs.mysql.com/bug.php?id=20337 http://bugs.mysql.com/bug.php?id=20595 http://bugs.mysql.com/bug.php?id=21322
[29 Jul 2006 10:01]
Heikki Tuuri
Can you test if 5.1.9 exhibits the same crashes? Regards, Heikki
[31 Jul 2006 6:09]
Heikki Tuuri
Hi! Assigning this to Osku. Can you try to repeat this in your own computer or in the 4 x Opteron computer we have access to? This bug report suggests that the bug was introduced in 5.1.11: http://bugs.mysql.com/bug.php?id=20595 I will attach diff files for ha_innodb.cc and innobase between 5.1.8 and 5.1.12. There are quite a lot of changes in ha_innodb.cc (and MySQL itself). In innobase there are almost only the changes Osku made to the InnoDB internal SQL interpreter. Osku, if you are able to repeat, please try a hybrid 5.1.12 + innobase from 5.1.9. If also that crashes, then we know that innobase is innocent. This bug report is the most interesting of all the four stress testing reports because here we get a very clear assertion failure. Regards, Heikki
[31 Jul 2006 6:12]
Heikki Tuuri
Diff between a source tree of pre-5.1.8 and nightly 5.1.12
Attachment: ha_innodbdiff5.1.8-5.1.12.txt (text/plain), 18.36 KiB.
[31 Jul 2006 6:12]
Heikki Tuuri
Diff between a source tree of pre-5.1.8 and nightly 5.1.12
Attachment: innobasediff5.1.8-5.1.12.txt (text/plain), 56.19 KiB.
[31 Jul 2006 12:16]
Osku Salerma
Jonathan: Can you tell me what parameters you used with DBT2?
[31 Jul 2006 12:47]
Jonathan Miller
./run_mysql.sh -n dbt2 -h ndb09 -u root -s 0 -t 150 -z "initial testing for stability" -c 10 -w 4 -o /tmp/mysql.sock -v -e
[1 Aug 2006 15:15]
Heikki Tuuri
Jonathan, Osku was able to repeat this just once in his uniprocessor computer. What kind of test computer you have been using? Regards, Heikki
[1 Aug 2006 15:27]
Jonathan Miller
Hi Heikki Linux 2.6.16-1.2111_FC4smp SMP i686 athlon i386 GNU/Linux
[1 Aug 2006 15:33]
Heikki Tuuri
But is it a single-processor computer?
[1 Aug 2006 16:21]
Jonathan Miller
SMP Multi meaning that it has 2 CPU's
[3 Aug 2006 11:16]
Osku Salerma
Jonathan, Mikael: I've only managed to reproduce this bug twice during several days of testing. I've tried different machines (UP and SMP), different DBT2 parameters, etc, but nothing seems to make a difference. How reproducible is this problem for you and do you have any insights on how to make it happen more often?
[3 Aug 2006 12:37]
Jonathan Miller
I repeated it twice in a row before I opened the bug. I do not usually test InnoDB so I am not sure how reproducible is this problem. Others have hit it and you seem to have hit it twice as well. /jeb
[3 Aug 2006 18:09]
Heikki Tuuri
Hi! Osku was able to repeat the crash, and this time we had added code that hex dumped the prebuilt struct. We see that the field clust_index_was_generated contains value 3, which is nonsensical, since it should contain either 0 or 1. Thus, this looks like memory corruption that hits the sixth 64-bit field in a struct. At the start of June, Osku got a crash in a test of 5.1. In that case, number 3 had appeared in the place of a pointer. That pointer happens to be sixth 64-bit field in a lock struct. This looks like a typical error where a freed memory area (probably a struct) is accessed and number 3 is written to a field in it. This is probably an InnoDB bug. Next we should check the diff's that I attached to this bug report of we can find some memory corruption bug there. We could also look for an InnoDB struct where number 3 is legally written to the sixth field. Or we could change the #define's of InnoDB constants so that we could recognize which constant is written to that place! Regards, Heikki prebuilt: len 392; hex ef6fae0400000000 b83c6d952a000000 b8f070952a000000 0000000000000000 0100000000000000 0300000000000000 struct row_prebuilt_struct { ulint magic_n; /* this magic number is set to ROW_PREBUILT_ALLOCATED when created and to ROW_PREBUILT_FREED when the struct has been freed; used in debugging */ dict_table_t* table; /* Innobase table handle */ trx_t* trx; /* current transaction handle */ ibool sql_stat_start; /* TRUE when we start processing of an SQL statement: we may have to set an intention lock on the table, create a consistent read view etc. */ ibool mysql_has_locked; /* this is set TRUE when MySQL calls external_lock on this handle with a lock flag, and set FALSE when with the F_UNLOCK flag */ ibool clust_index_was_generated; /* if the user did not define a primary key in MySQL, then Innobase automatically generated a clustered index where the ordering column is the row id: in this case this flag is set to TRUE */ Osku's seg fault in June: 060602 14:10:16InnoDB: Assertion failure in thread 1578302384 in file lock0lock.c line 4583 InnoDB: Failing assertion: lock->index == index (gdb) bt #0 0x400d183b in raise () from /lib/tls/libc.so.6 #1 0x400d2fa2 in abort () from /lib/tls/libc.so.6 #2 0x08399500 in lock_rec_queue_validate (rec=0x410d610f "", index=0x4026bf68, offsets=0x5e12e5a0) at lock0lock.c:4583 #3 0x0839a768 in lock_clust_rec_read_check_and_lock (flags=0, rec=0x410d610f "", index=0x4026bf68, offsets=0x5e12e5a0, mode=5, gap_mode=512, thr=0x402751f8) at lock0lock.c:5169 #4 0x083cbd8f in sel_set_rec_lock (rec=0x410d610f "", index=0x4026bf68, offsets=0x5e12e5a0, mode=5, type=512, thr=0x402751f8) at row0sel.c:852 #5 0x083d0265 in row_search_for_mysql (buf=0x8d78470 "ÿ\237\001", mode=2, prebuilt=0x40274268, match_mode=1, direction=0) at row0sel.c:3808 #6 0x082b5e1d in ha_innobase::index_read (this=0x8d78358, buf=0x8d78470 "ÿ\237\001", key_ptr=0x8d95aa0 "\237\001", key_len=4, find_flag=HA_READ_KEY_EXACT) at ha_innodb.cc:4064 #7 0x082a7a50 in handler::read_range_first (this=0x8d78358, start_key=0x8d3a508, end_key=0x8d3a514, eq_range_arg=true, sorted=false) at handler.cc:2923 #8 0x082a769e in handler::read_multi_range_first (this=0x8d78358, found_range_p=0x5e12ed78, ranges=0x8d3a508, range_count=1, sorted=false, buffer=0x0) at handler.cc:2795 #9 0x08293725 in QUICK_RANGE_SELECT::get_next (this=0x8d58d20) at opt_range.cc:7732 #10 0x0829d57d in rr_quick (info=0x5e12f010) at records.cc:294 #11 0x0824b831 in mysql_delete (thd=0x8d869b0, table_list=0x8d99350, conds=0x8d995d8, order=0x8d86d24, limit=18446744073709551615, options=0, reset_auto_increment=false) at sql_delete.cc:229 #12 0x081d2d1a in mysql_execute_command (thd=0x8d869b0) at sql_parse.cc:3431 #13 0x081da09f in mysql_parse (thd=0x8d869b0, inBuf=0x8d992d0 "DELETE FROM isql0 WHERE v = 415", length=31) at sql_parse.cc:5902 #14 0x081ce710 in dispatch_command (command=COM_QUERY, thd=0x8d869b0, packet=0x8d91271 "DELETE FROM isql0 WHERE v = 415", packet_length=32) at sql_parse.cc:1762 struct lock_struct{ trx_t* trx; /* transaction owning the lock */ UT_LIST_NODE_T(lock_t) trx_locks; /* list of the locks of the transaction */ ulint type_mode; /* lock type, mode, LOCK_GAP or LOCK_REC_NOT_GAP, LOCK_INSERT_INTENTION, wait flag, ORed */ hash_node_t hash; /* hash chain node for a record lock */ dict_index_t* index; /* index for a record lock */ union { lock_table_t tab_lock;/* table lock */ lock_rec_t rec_lock;/* record lock */ } un_member; };
[3 Aug 2006 18:16]
Heikki Tuuri
Oops, I talked about 64-bit fields. In a 32-bit computer, I should have said a 32-bit field. A word-sized field would be the right word :).
[3 Aug 2006 20:20]
Heikki Tuuri
The diff files that I posted are wrong. They actually compare InnoDB-5.1.11 to 5.1.12! The diff between 5.1.9 and 5.1.12 is much bigger.
[3 Aug 2006 21:24]
Heikki Tuuri
The bug found. We should have a call of update_thd() below BEFORE we can trust that prebuilt->trx is the trx object belonging to thd! Now we only call it later, in ::external_lock(). Thus, trx can be wrong in the function below, and as we write the isolation level to the sixth field, we write it over something else => memory corruption with usually the magic number 3 ~ REPEATABLE READ occurs. The bug ultimately derives from the bad design in ha_innodb.c. We should ALWAYS retrieve the trx object from thd, and remove the field prebuilt->trx altogether. Current code is not 'normalized', as the pointer to trx is stored also to prebuilt, where it does not logically belong. It belongs to thd. In ha_innodb.cc functions where MySQL does not pass thd as an argument, we should use the thread-specific variable current_thd. This is the same bug as the READ COMMITTED bug that Guilhem found 2 days ago, as I suspected. THR_LOCK_DATA** ha_innobase::store_lock( /*====================*/ /* out: pointer to the next element in the 'to' array */ THD* thd, /* in: user thread handle */ THR_LOCK_DATA** to, /* in: pointer to an array of pointers to lock structs; pointer to the 'lock' field of current handle is stored next to this array */ enum thr_lock_type lock_type) /* in: lock type to store in 'lock'; this may also be TL_IGNORE */ { row_prebuilt_t* prebuilt = (row_prebuilt_t*) innobase_prebuilt; trx_t* trx = prebuilt->trx; /* NOTE: MySQL can call this function with lock 'type' TL_IGNORE! Be careful to ignore TL_IGNORE if we are going to do something with only 'real' locks! */ /* If no MySQL tables is use we need to set isolation level of the transaction. */ if (lock_type != TL_IGNORE && trx->n_mysql_tables_in_use == 0) { trx->isolation_level = innobase_map_isolation_level( (enum_tx_isolation) thd->variables.tx_isolation); }
[3 Aug 2006 21:42]
Heikki Tuuri
Suggested patch: replace the start of ::store_lock() in ha_innodb.cc with this: /********************************************************************* Converts a MySQL table lock stored in the 'lock' field of the handle to a proper type before storing pointer to the lock into an array of pointers. MySQL also calls this if it wants to reset some table locks to a not-locked state during the processing of an SQL query. An example is that during a SELECT the read lock is released early on the 'const' tables where we only fetch one row. MySQL does not call this when it releases all locks at the end of an SQL statement. */ THR_LOCK_DATA** ha_innobase::store_lock( /*====================*/ /* out: pointer to the next element in the 'to' array */ THD* thd, /* in: user thread handle */ THR_LOCK_DATA** to, /* in: pointer to an array of pointers to lock structs; pointer to the 'lock' field of current handle is stored next to this array */ enum thr_lock_type lock_type) /* in: lock type to store in 'lock'; this may also be TL_IGNORE */ { row_prebuilt_t* prebuilt = (row_prebuilt_t*) innobase_prebuilt; trx_t* trx; update_thd(thd); /* This must be called BEFORE we know prebuilt->trx is up to date! */ trx = prebuilt->trx; /* NOTE: MySQL can call this function with lock 'type' TL_IGNORE! Be careful to ignore TL_IGNORE if we are going to do something with only 'real' locks! */
[3 Aug 2006 22:00]
Heikki Tuuri
It was this patch from April 23 that broke InnoDB-5.1, written by Jan and approved by myself: http://mysql.bkbits.net:8080/mysql-5.1/patch@1.2119.464.2?nav=index.html|ChangeSet@-6M|cse... The purpose was to make READ COMMITTED to imply innodb_locks_unsafe_for_binlog. Already in 5.0 there was a slight bug that we referred to prebuilt->trx here: if (srv_locks_unsafe_for_binlog && prebuilt->trx->isolation_level != TRX_ISO_SERIALIZABLE && But that bug went unnoticed. It should now be fixed in 5.0.
[4 Aug 2006 6:50]
Heikki Tuuri
Hmm... removing the trx field from prebuilt is more problematic than I thought. A pointer to trx is also stored in the 'query graphs' that are in the prebuilt struct. And I do not like the idea of using a thread local variable current_trx! Another solution is to change the name of ::update_thd() to something more descriptive, like ::prepare_table_handle_for_trx() and adding more assertions that check that: ut_a(prebuilt->trx == (trx_t*) current_thd->ha_data[innobase_hton.slot]); I will now write a patch for this in 5.1.
[9 Aug 2006 14:10]
Heikki Tuuri
Marking this as a showstopper. We must get the fix to 5.1.12.
[22 Aug 2006 10:18]
Tatiana Azundris Nuernberg
fixed by inno patchset merged into 5.1.12-beta (5.1-new-maint)
[5 Sep 2006 17:31]
Heikki Tuuri
Mads, this is only in 5.1.11. Heikki
[13 Sep 2006 8:10]
Timothy Smith
Pushed to 5.1.12
[14 Sep 2006 11:52]
Heikki Tuuri
The precise description of the bug: Running InnoDB with many concurrent threads could cause memory corruption and a seg fault in an arbitrary place in mysqld. This bug was introduced in 5.1.11.
[14 Sep 2006 13:45]
Paul DuBois
Heikki, thanks for the note. Note in 5.1.12 changelog.
[5 May 2010 15:16]
Bugs System
Pushed into 5.1.47 (revid:joro@sun.com-20100505145753-ivlt4hclbrjy8eye) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[6 May 2010 2:14]
Paul DuBois
Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug. Re-closing.
[28 May 2010 5:48]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (pib:16)
[28 May 2010 6:17]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100524190941-nuudpx60if25wsvx) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[28 May 2010 6:45]
Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100524185725-c8k5q7v60i5nix3t) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[29 May 2010 2:48]
Paul DuBois
Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug. Re-closing.
[17 Jun 2010 11:48]
Bugs System
Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 12:25]
Bugs System
Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 13:13]
Bugs System
Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)