Bug #31494 | innodb + 5.1 + read committed crash, assertion | ||
---|---|---|---|
Submitted: | 9 Oct 2007 22:34 | Modified: | 17 Jun 2010 22:58 |
Reporter: | Shane Bester (Platinum Quality Contributor) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server | Severity: | S1 (Critical) |
Version: | 5.1.17, 5.1.21, 5.1.22, 5.1.23 | OS: | Any (32/64-bit) |
Assigned to: | Inaam Rana | CPU Architecture: | Any |
Tags: | assertion, crash, read committed, read uncommitted |
[9 Oct 2007 22:34]
Shane Bester
[9 Oct 2007 22:57]
MySQL Verification Team
The testcase does this in 20 threads:
-------------
start transaction
insert ignore into `t` set `a`=abs(?),`b`=abs(?),`c`=abs(?)
commit
start transaction
update `t` set `c`=`a`+`b` where `b`=abs(?)
commit
-------------
Typical runtime before a crash is 30 seconds.
[9 Oct 2007 23:07]
MySQL Verification Team
proper testcase!
Attachment: bug31494.c (text/plain), 5.72 KiB.
[10 Oct 2007 12:12]
Heikki Tuuri
I am checking now if this is a bug in InnoDB's semi-consistent read code...

/********************************************************************
Handles user errors and lock waits detected by the database engine. */

ibool
row_mysql_handle_errors(
/*====================*/
                                /* out: TRUE if it was a lock wait and
                                we should continue running the query thread */
        ulint*          new_err,/* out: possible new error encountered in
                                lock wait, or if no new error, the value
                                of trx->error_state at the entry of this
                                function */
        trx_t*          trx,    /* in: transaction */
        que_thr_t*      thr,    /* in: query thread */
        trx_savept_t*   savept) /* in: savepoint or NULL */
{
#ifndef UNIV_HOTBACKUP
        ulint   err;

handle_new_error:
        err = trx->error_state;

        ut_a(err != DB_SUCCESS);
[10 Oct 2007 13:19]
Heikki Tuuri
The only way it can fail is that trx->error_state is DB_SUCCESS when row_search_for_mysql() calls row_mysql_handle_errors().
[10 Oct 2007 13:24]
Heikki Tuuri
But in row0sel.c there is the code:

"
lock_wait_or_error:
        /* Reset the old and new "did semi-consistent read" flags. */
        if (UNIV_UNLIKELY(prebuilt->row_read_type
                          == ROW_READ_DID_SEMI_CONSISTENT)) {
                prebuilt->row_read_type = ROW_READ_TRY_SEMI_CONSISTENT;
        }
        did_semi_consistent_read = FALSE;

        /*-------------------------------------------------------------*/

        btr_pcur_store_position(pcur, &mtr);

        mtr_commit(&mtr);
        mtr_has_extra_clust_latch = FALSE;

        trx->error_state = err;

        /* The following is a patch for MySQL */

        que_thr_stop_for_mysql(thr);

        thr->lock_state = QUE_THR_LOCK_ROW;

        if (row_mysql_handle_errors(&err, trx, thr, NULL)) {
"

and que_thr_stop_for_mysql() does not touch trx->error_state! This suggests that the transaction object becomes corrupt and that this causes the assertion to fail. Looks bad.
[10 Oct 2007 13:35]
MySQL Verification Team
I forgot to upload these debugger outputs yesterday.
Attachment: bug31494_some_debugger_outputs.txt (text/plain), 26.77 KiB.
[10 Oct 2007 15:32]
Heikki Tuuri
In the debugger output, trx->error_state == 10 == DB_SUCCESS. A possible explanation is that InnoDB/MySQL uses the same transaction object trx for TWO queries at the same time.
[10 Oct 2007 15:54]
MySQL Verification Team
I guess it would be easier to debug if you put a few seconds sleep() after the que_thr_stop_for_mysql(thr); call ?
[10 Oct 2007 23:07]
MySQL Verification Team
This is tripping it:

        if (trx->was_chosen_as_deadlock_victim) {
                mutex_exit(&kernel_mutex);
                ut_a(err != DB_SUCCESS);
                goto lock_wait_or_error;
        }
[10 Oct 2007 23:14]
MySQL Verification Team
In the debugger outputs I uploaded earlier, this was also the case:

        was_chosen_as_deadlock_victim    0x0000000000000001    unsigned __int64

But err is still DB_SUCCESS. Hope this analysis is right! In 5.0.50 there would have been trx->error_state = DB_DEADLOCK; in that block.
[10 Oct 2007 23:42]
MySQL Verification Team
Looks like this bug's been around a while: http://lists.mysql.com/commits/1605
[11 Oct 2007 8:04]
Marko Mäkelä
Shane, I agree with your [11 Oct 1:07] comment. It seems that you found the only branch where lock_wait_or_error is reached with err==DB_SUCCESS. I checked all "goto lock_wait_or_error" in row_search_for_mysql().
[11 Oct 2007 8:21]
MySQL Verification Team
I wonder why we never saw this bug before? At least it's new to me...
[11 Oct 2007 9:24]
Marko Mäkelä
Semi-consistent reads are used in table-scanning UPDATEs when innodb_locks_unsafe_for_binlog is set. Maybe it is not a common combination.

Assigning err = DB_DEADLOCK seems to do the trick, but I will test this some more:

        mutex_enter(&kernel_mutex);
        if (trx->was_chosen_as_deadlock_victim) {
                mutex_exit(&kernel_mutex);
                err = DB_DEADLOCK;
                goto lock_wait_or_error;
        }

You might also want to test what happens in a multi-table UPDATE that scans multiple tables (if MySQL allows that). See also Bug #3300.
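For readers following the control flow, here is a minimal standalone model of the deadlock-victim branch, not the InnoDB source. It only illustrates why reaching lock_wait_or_error with err == DB_SUCCESS violates the precondition asserted in row_mysql_handle_errors(), and why assigning DB_DEADLOCK first restores it. Every model_* name is hypothetical, the struct is a heavily reduced stand-in for trx_t, mutexes and mini-transactions are left out, and the numeric value used for DB_DEADLOCK is a placeholder (only DB_SUCCESS == 10 comes from the debugger output in this report).

```c
#include <assert.h>
#include <stdbool.h>

/* DB_SUCCESS == 10 is taken from the debugger output in this report.
   The DB_DEADLOCK value is a placeholder chosen for this sketch. */
enum { DB_SUCCESS = 10, DB_DEADLOCK = 15 };

/* Hypothetical, heavily reduced stand-in for trx_t. */
typedef struct {
    int  error_state;
    bool was_chosen_as_deadlock_victim;
} model_trx_t;

/* Stand-in for the entry check of row_mysql_handle_errors(). The real
   function asserts ut_a(err != DB_SUCCESS); this model returns whether
   that assertion would hold instead of aborting. */
static bool model_handler_precondition_ok(const model_trx_t *trx)
{
    return trx->error_state != DB_SUCCESS;
}

/* Stand-in for the deadlock-victim branch of row_search_for_mysql().
   apply_fix selects the behaviour before (false) or after (true) the
   proposed change. Returns true if the handler precondition holds, or
   if the error path is not taken at all. */
static bool model_search_step(model_trx_t *trx, bool apply_fix)
{
    int err = DB_SUCCESS;           /* no error recorded so far */

    if (trx->was_chosen_as_deadlock_victim) {
        if (apply_fix) {
            err = DB_DEADLOCK;      /* the proposed assignment */
        }
        goto lock_wait_or_error;
    }
    return true;                    /* normal path: nothing to handle */

lock_wait_or_error:
    trx->error_state = err;         /* mirrors trx->error_state = err */
    return model_handler_precondition_ok(trx);
}

/* Exercises both variants of the branch on a deadlock victim. */
void model_self_check(void)
{
    model_trx_t before = { DB_SUCCESS, true };
    model_trx_t after  = { DB_SUCCESS, true };

    assert(!model_search_step(&before, false)); /* precondition violated */
    assert(model_search_step(&after, true));    /* precondition holds */
}
```

In this model the unfixed variant hands DB_SUCCESS to the handler, which corresponds to the ut_a() failure reported above; with the assignment in place the handler sees a real error code.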
[7 Nov 2007 1:00]
Timothy Smith
Queued to 5.1-build
[9 Nov 2007 17:17]
Heikki Tuuri
Test multi-table update.
[12 Nov 2007 8:22]
Marko Mäkelä
I am assigning this bug away from me, because the fix needs to be made in the MySQL layer, where the UPDATE statements are executed. This bug cannot be fixed in the storage engine (InnoDB), other than by disabling semi-consistent reads altogether.
[21 Nov 2007 18:55]
Bugs System
Pushed into 5.1.23-rc
[21 Nov 2007 18:55]
Bugs System
Pushed into 6.0.4-alpha
[26 Nov 2007 17:17]
Heikki Tuuri
Inaam please test multi-table update.
[15 Jan 2008 2:57]
Inaam Rana
I am moving this to closed state. I have tested it with multi-table update. To test, I tweaked Shane's test program so that it creates multiple tables and each thread does an update involving half of these tables. There was no crash. However, it looks like the bug is not triggered by multi-table updates anyway.
[5 May 2010 15:16]
Bugs System
Pushed into 5.1.47 (revid:joro@sun.com-20100505145753-ivlt4hclbrjy8eye) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[6 May 2010 3:05]
Paul DuBois
Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug. Re-closing.
[28 May 2010 5:56]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (pib:16)
[28 May 2010 6:25]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100524190941-nuudpx60if25wsvx) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[28 May 2010 6:53]
Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100524185725-c8k5q7v60i5nix3t) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[28 May 2010 19:20]
Paul DuBois
Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug. Re-closing.
[17 Jun 2010 11:57]
Bugs System
Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 12:37]
Bugs System
Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 13:23]
Bugs System
Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)