Bug #31494 innodb + 5.1 + read committed crash, assertion
Submitted: 10 Oct 2007 0:34 Modified: 15 Jan 2008 3:57
Reporter: Shane Bester
Status: Closed
Category:Server Severity:S1 (Critical)
Version:5.1.17, 5.1.21,5.1.22,5.1.23 OS:Any (32/64-bit)
Assigned to: Inaam Rana Target Version:
Tags: assertion, read committed, crash, read uncommitted
Triage: D1 (Critical)

[10 Oct 2007 0:34] Shane Bester
Description:
Could be related to Bug #31310 , but since this is a crash and likely to be seen in the
field, I'm opening a new report.

E:\mysql-5.1.22-rc-winx64\bin>mysqld --innodb_buffer_pool_size=100M 
--transaction-isolation=READ-COMMITTED --innodb_flush_log_at_trx_commit=0 --console
--skip-grant-tables --innodb-locks-unsafe-for-binlog=1 --innodb_support_xa=0
071009 23:28:19  InnoDB: Started; log sequence number 0 70357838
071009 23:28:19 [Note] mysqld: ready for connections.
Version: '5.1.22-rc-community'  socket: ''  port: 3306  MySQL Community Server (GPL)
071009 23:47:22071009 23:47:22  InnoDB: Assertion failure in thread 13032 in file
.\row\row0mysql.c line 457
  InnoDB: Assertion failure in thread 12352 in file .\row\row0mysql.c line 457
InnoDB: Failing assertion: err != DB_SUCCESS
InnoDB: Failing assertion: err != DB_SUCCESS

Shortened stack trace from 5.1.22 on Windows:

mysqld.exe!row_mysql_handle_errors
mysqld.exe!row_search_for_mysql
mysqld.exe!ha_innobase::general_fetch
mysqld.exe!rr_sequential
mysqld.exe!mysql_update
mysqld.exe!mysql_execute_command
mysqld.exe!Prepared_statement::execute
mysqld.exe!mysql_stmt_execute
mysqld.exe!dispatch_command
mysqld.exe!do_command
mysqld.exe!handle_one_connection
mysqld.exe!pthread_start
mysqld.exe!_callthreadstart
mysqld.exe!_threadstart
kernel32.dll!BaseThreadStart

How to repeat:
uploading testcase later.
[10 Oct 2007 0:57] Shane Bester
The testcase does this in 20 threads:
-------------
start transaction
insert ignore into `t` set `a`=abs(?),`b`=abs(?),`c`=abs(?)
commit
start transaction
update `t` set `c`=`a`+`b` where `b`=abs(?)
commit
-------------

typical runtime before a crash is 30 seconds.
[10 Oct 2007 1:07] Shane Bester
proper testcase!

Attachment: bug31494.c (text/plain), 5.72 KiB.

[10 Oct 2007 14:12] Heikki Tuuri
I am checking now if this is a bug in InnoDB's semi-consistent read code...

/********************************************************************
Handles user errors and lock waits detected by the database engine. */

ibool
row_mysql_handle_errors(
/*====================*/
                                /* out: TRUE if it was a lock wait and
                                we should continue running the query thread */
        ulint*          new_err,/* out: possible new error encountered in
                                lock wait, or if no new error, the value
                                of trx->error_state at the entry of this
                                function */
        trx_t*          trx,    /* in: transaction */
        que_thr_t*      thr,    /* in: query thread */
        trx_savept_t*   savept) /* in: savepoint or NULL */
{
#ifndef UNIV_HOTBACKUP
        ulint   err;

handle_new_error:
        err = trx->error_state;

        ut_a(err != DB_SUCCESS);
[10 Oct 2007 15:19] Heikki Tuuri
The only way it can fail is that trx->error_state is DB_SUCCESS when
row_search_for_mysql() calls row_mysql_handle_errors().
[10 Oct 2007 15:24] Heikki Tuuri
But in row0sel.c there is the code:

"
lock_wait_or_error:
        /* Reset the old and new "did semi-consistent read" flags. */
        if (UNIV_UNLIKELY(prebuilt->row_read_type
                          == ROW_READ_DID_SEMI_CONSISTENT)) {
                prebuilt->row_read_type = ROW_READ_TRY_SEMI_CONSISTENT;
        }
        did_semi_consistent_read = FALSE;

        /*-------------------------------------------------------------*/

        btr_pcur_store_position(pcur, &mtr);

        mtr_commit(&mtr);
        mtr_has_extra_clust_latch = FALSE;

        trx->error_state = err;

        /* The following is a patch for MySQL */

        que_thr_stop_for_mysql(thr);

        thr->lock_state = QUE_THR_LOCK_ROW;

        if (row_mysql_handle_errors(&err, trx, thr, NULL)) {

"

and que_thr_stop_for_mysql() does not touch trx->error_state!

This suggest that the transaction object becomes corrupt and that causes the assertion to
fail. Looks bad.
[10 Oct 2007 15:35] Shane Bester
i forgot to upload this debugger outputs yesterday

Attachment: bug31494_some_debugger_outputs.txt (text/plain), 26.77 KiB.

[10 Oct 2007 17:32] Heikki Tuuri
In the debugger output, trx->error_state == 10 == DB_SUCCESS. A possible explanation is
that InnoDB/MySQL uses the same transaction object trx for TWO queries at the same time.
[10 Oct 2007 17:54] Shane Bester
I guess it would be easier to debug if you put a few seconds sleep() after the
que_thr_stop_for_mysql(thr); call ?
[11 Oct 2007 1:07] Shane Bester
this is tripping it:

f (trx->was_chosen_as_deadlock_victim) {
				mutex_exit(&kernel_mutex);
ut_a(err != DB_SUCCESS);
				goto lock_wait_or_error;
			}
[11 Oct 2007 1:14] Shane Bester
in the debugger outputs I uploaded earlier, it was also the case:
was_chosen_as_deadlock_victim	0x0000000000000001	unsigned __int64

But, err is DB_SUCCESS still.  Hope this analysis is right!
In 5.0.50 there would have been trx->error_state = DB_DEADLOCK; in that block.
[11 Oct 2007 1:42] Shane Bester
Looks like this bug's been around a while:
http://lists.mysql.com/commits/1605
[11 Oct 2007 10:04] Marko Mäkelä
Shane, I agree with your [11 Oct 1:07] comment. It seems that you found the only branch
where lock_wait_or_error is reached with err==DB_SUCCESS. I checked all  "goto
lock_wait_or_error" in row_search_for_mysql().
[11 Oct 2007 10:21] Shane Bester
I wonder we never saw this bug before? At least it's new to me...
[11 Oct 2007 11:24] Marko Mäkelä
Semi-consistent reads are used in table-scanning UPDATEs when
innodb_locks_unsafe_for_binlog is set. Maybe it is not a common combination.

Assigning err = DB_DEADLOCK seems to do the trick, but I will test this some more:

			mutex_enter(&kernel_mutex);
			if (trx->was_chosen_as_deadlock_victim) {
				mutex_exit(&kernel_mutex);
				err = DB_DEADLOCK;

				goto lock_wait_or_error;
			}

You might also want to test what happens in a multi-table UPDATE that scans multiple
tables (if MySQL allows that).  See also Bug #3300.
[7 Nov 2007 2:00] Timothy Smith
Queued to 5.1-build
[9 Nov 2007 18:17] Heikki Tuuri
Test multi-table update.
[9 Nov 2007 18:17] Heikki Tuuri
Test multi-table update.
[9 Nov 2007 18:17] Heikki Tuuri
Test multi-table update.
[12 Nov 2007 9:22] Marko Mäkelä
I am assigning this bug away from me, because the fix needs to be made in the MySQL layer,
where the UPDATE statements are executed. This bug cannot be fixed in the storage engine
(InnoDB), other than by disabling semi-consistent reads altogether.
[21 Nov 2007 19:55] Bugs System
Pushed into 5.1.23-rc
[21 Nov 2007 19:55] Bugs System
Pushed into 6.0.4-alpha
[26 Nov 2007 18:17] Heikki Tuuri
Inaam please test multi-table update.
[15 Jan 2008 3:57] Inaam Rana
I am moving this to closed state. I have tested it with multi-table update.
The testing mechanism that I used is to tweak the test program of Shane so that it
creates multiple tables and each thread will do an update involving half of these tables.
There was no crash. However, it looks that the bug is not sensitized in case of multiple
table update anyways.