Bug #31494 innodb + 5.1 + read committed crash, assertion
Submitted: 9 Oct 2007 22:34 Modified: 17 Jun 2010 22:58
Reporter: Shane Bester (Platinum Quality Contributor) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S1 (Critical)
Version:5.1.17, 5.1.21,5.1.22,5.1.23 OS:Any (32/64-bit)
Assigned to: Inaam Rana CPU Architecture:Any
Tags: assertion, crash, read committed, read uncommitted

[9 Oct 2007 22:34] Shane Bester
Description:
Could be related to Bug #31310 , but since this is a crash and likely to be seen in the field, I'm opening a new report.

E:\mysql-5.1.22-rc-winx64\bin>mysqld --innodb_buffer_pool_size=100M  --transaction-isolation=READ-COMMITTED --innodb_flush_log_at_trx_commit=0 --console --skip-grant-tables --innodb-locks-unsafe-for-binlog=1 --innodb_support_xa=0
071009 23:28:19  InnoDB: Started; log sequence number 0 70357838
071009 23:28:19 [Note] mysqld: ready for connections.
Version: '5.1.22-rc-community'  socket: ''  port: 3306  MySQL Community Server (GPL)
071009 23:47:22071009 23:47:22  InnoDB: Assertion failure in thread 13032 in file .\row\row0mysql.c line 457
  InnoDB: Assertion failure in thread 12352 in file .\row\row0mysql.c line 457
InnoDB: Failing assertion: err != DB_SUCCESS
InnoDB: Failing assertion: err != DB_SUCCESS

Shortened stack trace from 5.1.22 on Windows:

mysqld.exe!row_mysql_handle_errors
mysqld.exe!row_search_for_mysql
mysqld.exe!ha_innobase::general_fetch
mysqld.exe!rr_sequential
mysqld.exe!mysql_update
mysqld.exe!mysql_execute_command
mysqld.exe!Prepared_statement::execute
mysqld.exe!mysql_stmt_execute
mysqld.exe!dispatch_command
mysqld.exe!do_command
mysqld.exe!handle_one_connection
mysqld.exe!pthread_start
mysqld.exe!_callthreadstart
mysqld.exe!_threadstart
kernel32.dll!BaseThreadStart

How to repeat:
uploading testcase later.
[9 Oct 2007 22:57] MySQL Verification Team
The testcase does this in 20 threads:
-------------
start transaction
insert ignore into `t` set `a`=abs(?),`b`=abs(?),`c`=abs(?)
commit
start transaction
update `t` set `c`=`a`+`b` where `b`=abs(?)
commit
-------------

typical runtime before a crash is 30 seconds.
[9 Oct 2007 23:07] MySQL Verification Team
proper testcase!

Attachment: bug31494.c (text/plain), 5.72 KiB.

[10 Oct 2007 12:12] Heikki Tuuri
I am checking now if this is a bug in InnoDB's semi-consistent read code...

/********************************************************************
Handles user errors and lock waits detected by the database engine. */

ibool
row_mysql_handle_errors(
/*====================*/
                                /* out: TRUE if it was a lock wait and
                                we should continue running the query thread */
        ulint*          new_err,/* out: possible new error encountered in
                                lock wait, or if no new error, the value
                                of trx->error_state at the entry of this
                                function */
        trx_t*          trx,    /* in: transaction */
        que_thr_t*      thr,    /* in: query thread */
        trx_savept_t*   savept) /* in: savepoint or NULL */
{
#ifndef UNIV_HOTBACKUP
        ulint   err;

handle_new_error:
        err = trx->error_state;

        ut_a(err != DB_SUCCESS);
[10 Oct 2007 13:19] Heikki Tuuri
The only way it can fail is that trx->error_state is DB_SUCCESS when row_search_for_mysql() calls row_mysql_handle_errors().
[10 Oct 2007 13:24] Heikki Tuuri
But in row0sel.c there is the code:

"
lock_wait_or_error:
        /* Reset the old and new "did semi-consistent read" flags. */
        if (UNIV_UNLIKELY(prebuilt->row_read_type
                          == ROW_READ_DID_SEMI_CONSISTENT)) {
                prebuilt->row_read_type = ROW_READ_TRY_SEMI_CONSISTENT;
        }
        did_semi_consistent_read = FALSE;

        /*-------------------------------------------------------------*/

        btr_pcur_store_position(pcur, &mtr);

        mtr_commit(&mtr);
        mtr_has_extra_clust_latch = FALSE;

        trx->error_state = err;

        /* The following is a patch for MySQL */

        que_thr_stop_for_mysql(thr);

        thr->lock_state = QUE_THR_LOCK_ROW;

        if (row_mysql_handle_errors(&err, trx, thr, NULL)) {

"

and que_thr_stop_for_mysql() does not touch trx->error_state!

This suggest that the transaction object becomes corrupt and that causes the assertion to fail. Looks bad.
[10 Oct 2007 13:35] MySQL Verification Team
i forgot to upload this debugger outputs yesterday

Attachment: bug31494_some_debugger_outputs.txt (text/plain), 26.77 KiB.

[10 Oct 2007 15:32] Heikki Tuuri
In the debugger output, trx->error_state == 10 == DB_SUCCESS. A possible explanation is that InnoDB/MySQL uses the same transaction object trx for TWO queries at the same time.
[10 Oct 2007 15:54] MySQL Verification Team
I guess it would be easier to debug if you put a few seconds sleep() after the que_thr_stop_for_mysql(thr); call ?
[10 Oct 2007 23:07] MySQL Verification Team
this is tripping it:

f (trx->was_chosen_as_deadlock_victim) {
				mutex_exit(&kernel_mutex);
ut_a(err != DB_SUCCESS);
				goto lock_wait_or_error;
			}
[10 Oct 2007 23:14] MySQL Verification Team
in the debugger outputs I uploaded earlier, it was also the case:
was_chosen_as_deadlock_victim	0x0000000000000001	unsigned __int64

But, err is DB_SUCCESS still.  Hope this analysis is right!
In 5.0.50 there would have been trx->error_state = DB_DEADLOCK; in that block.
[10 Oct 2007 23:42] MySQL Verification Team
Looks like this bug's been around a while:
http://lists.mysql.com/commits/1605
[11 Oct 2007 8:04] Marko Mäkelä
Shane, I agree with your [11 Oct 1:07] comment. It seems that you found the only branch where lock_wait_or_error is reached with err==DB_SUCCESS. I checked all  "goto lock_wait_or_error" in row_search_for_mysql().
[11 Oct 2007 8:21] MySQL Verification Team
I wonder we never saw this bug before? At least it's new to me...
[11 Oct 2007 9:24] Marko Mäkelä
Semi-consistent reads are used in table-scanning UPDATEs when innodb_locks_unsafe_for_binlog is set. Maybe it is not a common combination.

Assigning err = DB_DEADLOCK seems to do the trick, but I will test this some more:

			mutex_enter(&kernel_mutex);
			if (trx->was_chosen_as_deadlock_victim) {
				mutex_exit(&kernel_mutex);
				err = DB_DEADLOCK;

				goto lock_wait_or_error;
			}

You might also want to test what happens in a multi-table UPDATE that scans multiple tables (if MySQL allows that).  See also Bug #3300.
[7 Nov 2007 1:00] Timothy Smith
Queued to 5.1-build
[9 Nov 2007 17:17] Heikki Tuuri
Test multi-table update.
[9 Nov 2007 17:17] Heikki Tuuri
Test multi-table update.
[9 Nov 2007 17:17] Heikki Tuuri
Test multi-table update.
[12 Nov 2007 8:22] Marko Mäkelä
I am assigning this bug away from me, because the fix needs to be made in the MySQL layer, where the UPDATE statements are executed. This bug cannot be fixed in the storage engine (InnoDB), other than by disabling semi-consistent reads altogether.
[21 Nov 2007 18:55] Bugs System
Pushed into 5.1.23-rc
[21 Nov 2007 18:55] Bugs System
Pushed into 6.0.4-alpha
[26 Nov 2007 17:17] Heikki Tuuri
Inaam please test multi-table update.
[15 Jan 2008 2:57] Inaam Rana
I am moving this to closed state. I have tested it with multi-table update.
The testing mechanism that I used is to tweak the test program of Shane so that it creates multiple tables and each thread will do an update involving half of these tables. There was no crash. However, it looks that the bug is not sensitized in case of multiple table update anyways.
[5 May 2010 15:16] Bugs System
Pushed into 5.1.47 (revid:joro@sun.com-20100505145753-ivlt4hclbrjy8eye) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[6 May 2010 3:05] Paul DuBois
Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug. Re-closing.
[28 May 2010 5:56] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (pib:16)
[28 May 2010 6:25] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100524190941-nuudpx60if25wsvx) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[28 May 2010 6:53] Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100524185725-c8k5q7v60i5nix3t) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[28 May 2010 19:20] Paul DuBois
Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug.
Re-closing.
[17 Jun 2010 11:57] Bugs System
Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 12:37] Bugs System
Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 13:23] Bugs System
Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)