| Bug #31494 | innodb + 5.1 + read committed crash, assertion | ||
|---|---|---|---|
| Submitted: | 10 Oct 2007 0:34 | Modified: | 15 Jan 2008 3:57 |
| Reporter: | Shane Bester | ||
| Status: | Closed | ||
| Category: | Server | Severity: | S1 (Critical) |
| Version: | 5.1.17, 5.1.21,5.1.22,5.1.23 | OS: | Any (32/64-bit) |
| Assigned to: | Inaam Rana | Target Version: | |
| Tags: | assertion, read committed, crash, read uncommitted | ||
| Triage: | D1 (Critical) | ||
[10 Oct 2007 0:34]
Shane Bester
[10 Oct 2007 0:57]
Shane Bester
The testcase does this in 20 threads: ------------- start transaction insert ignore into `t` set `a`=abs(?),`b`=abs(?),`c`=abs(?) commit start transaction update `t` set `c`=`a`+`b` where `b`=abs(?) commit ------------- typical runtime before a crash is 30 seconds.
[10 Oct 2007 1:07]
Shane Bester
proper testcase!
Attachment: bug31494.c (text/plain), 5.72 KiB.
[10 Oct 2007 14:12]
Heikki Tuuri
I am checking now if this is a bug in InnoDB's semi-consistent read code...
/********************************************************************
Handles user errors and lock waits detected by the database engine. */
ibool
row_mysql_handle_errors(
/*====================*/
/* out: TRUE if it was a lock wait and
we should continue running the query thread */
ulint* new_err,/* out: possible new error encountered in
lock wait, or if no new error, the value
of trx->error_state at the entry of this
function */
trx_t* trx, /* in: transaction */
que_thr_t* thr, /* in: query thread */
trx_savept_t* savept) /* in: savepoint or NULL */
{
#ifndef UNIV_HOTBACKUP
ulint err;
handle_new_error:
err = trx->error_state;
ut_a(err != DB_SUCCESS);
[10 Oct 2007 15:19]
Heikki Tuuri
The only way it can fail is that trx->error_state is DB_SUCCESS when row_search_for_mysql() calls row_mysql_handle_errors().
[10 Oct 2007 15:24]
Heikki Tuuri
But in row0sel.c there is the code:
"
lock_wait_or_error:
/* Reset the old and new "did semi-consistent read" flags. */
if (UNIV_UNLIKELY(prebuilt->row_read_type
== ROW_READ_DID_SEMI_CONSISTENT)) {
prebuilt->row_read_type = ROW_READ_TRY_SEMI_CONSISTENT;
}
did_semi_consistent_read = FALSE;
/*-------------------------------------------------------------*/
btr_pcur_store_position(pcur, &mtr);
mtr_commit(&mtr);
mtr_has_extra_clust_latch = FALSE;
trx->error_state = err;
/* The following is a patch for MySQL */
que_thr_stop_for_mysql(thr);
thr->lock_state = QUE_THR_LOCK_ROW;
if (row_mysql_handle_errors(&err, trx, thr, NULL)) {
"
and que_thr_stop_for_mysql() does not touch trx->error_state!
This suggest that the transaction object becomes corrupt and that causes the assertion to
fail. Looks bad.
[10 Oct 2007 15:35]
Shane Bester
i forgot to upload this debugger outputs yesterday
Attachment: bug31494_some_debugger_outputs.txt (text/plain), 26.77 KiB.
[10 Oct 2007 17:32]
Heikki Tuuri
In the debugger output, trx->error_state == 10 == DB_SUCCESS. A possible explanation is that InnoDB/MySQL uses the same transaction object trx for TWO queries at the same time.
[10 Oct 2007 17:54]
Shane Bester
I guess it would be easier to debug if you put a few seconds sleep() after the que_thr_stop_for_mysql(thr); call ?
[11 Oct 2007 1:07]
Shane Bester
this is tripping it:
f (trx->was_chosen_as_deadlock_victim) {
mutex_exit(&kernel_mutex);
ut_a(err != DB_SUCCESS);
goto lock_wait_or_error;
}
[11 Oct 2007 1:14]
Shane Bester
in the debugger outputs I uploaded earlier, it was also the case: was_chosen_as_deadlock_victim 0x0000000000000001 unsigned __int64 But, err is DB_SUCCESS still. Hope this analysis is right! In 5.0.50 there would have been trx->error_state = DB_DEADLOCK; in that block.
[11 Oct 2007 1:42]
Shane Bester
Looks like this bug's been around a while: http://lists.mysql.com/commits/1605
[11 Oct 2007 10:04]
Marko Mäkelä
Shane, I agree with your [11 Oct 1:07] comment. It seems that you found the only branch where lock_wait_or_error is reached with err==DB_SUCCESS. I checked all "goto lock_wait_or_error" in row_search_for_mysql().
[11 Oct 2007 10:21]
Shane Bester
I wonder we never saw this bug before? At least it's new to me...
[11 Oct 2007 11:24]
Marko Mäkelä
Semi-consistent reads are used in table-scanning UPDATEs when
innodb_locks_unsafe_for_binlog is set. Maybe it is not a common combination.
Assigning err = DB_DEADLOCK seems to do the trick, but I will test this some more:
mutex_enter(&kernel_mutex);
if (trx->was_chosen_as_deadlock_victim) {
mutex_exit(&kernel_mutex);
err = DB_DEADLOCK;
goto lock_wait_or_error;
}
You might also want to test what happens in a multi-table UPDATE that scans multiple
tables (if MySQL allows that). See also Bug #3300.
[7 Nov 2007 2:00]
Timothy Smith
Queued to 5.1-build
[9 Nov 2007 18:17]
Heikki Tuuri
Test multi-table update.
[9 Nov 2007 18:17]
Heikki Tuuri
Test multi-table update.
[9 Nov 2007 18:17]
Heikki Tuuri
Test multi-table update.
[12 Nov 2007 9:22]
Marko Mäkelä
I am assigning this bug away from me, because the fix needs to be made in the MySQL layer, where the UPDATE statements are executed. This bug cannot be fixed in the storage engine (InnoDB), other than by disabling semi-consistent reads altogether.
[21 Nov 2007 19:55]
Bugs System
Pushed into 5.1.23-rc
[21 Nov 2007 19:55]
Bugs System
Pushed into 6.0.4-alpha
[26 Nov 2007 18:17]
Heikki Tuuri
Inaam please test multi-table update.
[15 Jan 2008 3:57]
Inaam Rana
I am moving this to closed state. I have tested it with multi-table update. The testing mechanism that I used is to tweak the test program of Shane so that it creates multiple tables and each thread will do an update involving half of these tables. There was no crash. However, it looks that the bug is not sensitized in case of multiple table update anyways.
