MySQL Bugs: #54113: Assertion: trx->dict_operation_lock_mode == RW_S

Bug #54113	Assertion: trx->dict_operation_lock_mode == RW_S_LATCH in row0mysql.c line 1704
Submitted:	31 May 2010 17:05	Modified:	5 Aug 2010 12:49
Reporter:	Elena Stepanova	Email Updates:
Status:	Can't repeat	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S2 (Serious)
Version:	mysql-trunk-bugfixing	OS:	Any
Assigned to:	Marko Mäkelä	CPU Architecture:	Any

Description:
InnoDB Plugin 1.0.6

100527 18:42:24  InnoDB: Assertion failure in thread 710 in file mysql-5.5.5-m3/storage/innobase/row/row0mysql.c line 1704
InnoDB: Failing assertion: trx->dict_operation_lock_mode == RW_S_LATCH
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html
InnoDB: about forcing recovery.
InnoDB: Thread 718 stopped in file mysql-5.5.5-m3/storage/innobase/page/page0cur.c line 588
100527 18:42:24 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=72
max_threads=120
thread_count=63
connection_count=63
It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 48421 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x8c68180
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
InnoDB: Thread 719 stopped in file mysql-5.5.5-m3/storage/innobase/fil/fil0fil.c line 4578
InnoDB: Thread 701 stopped in file mysql-5.5.5-m3/storage/innobase/handler/ha_innodb.cc line 2776
InnoDB: Thread 6 stopped in file mysql-5.5.5-m3/storage/innobase/os/os0sync.c line 563
mysql-5.5.5-m3-solaris10-x86_64-test/bin/mysqld'my_print_stacktrace+0x34 [0x176bbf4]
mysql-5.5.5-m3-solaris10-x86_64-test/bin/mysqld'handle_segfault+0x3bf [0x8c9e6f]
InnoDB: Thread 8 stopped in file mysql-5.5.5-m3/storage/innobase/lock/lock0lock.c line 2253
/lib/amd64/libc.so.1'__sighndlr+0x6 [0xfffffd7fff26b076]
/lib/amd64/libc.so.1'call_user_handler+0x2a7 [0xfffffd7fff25dfaf]
mysql-5.5.5-m3-solaris10-x86_64-test/bin/mysqld'0xbc3a3c [0xfc3a3c] [Signal 11 (SEGV)]
mysql-5.5.5-m3-solaris10-x86_64-test/bin/mysqld'0x11780dc [0x15780dc]
mysql-5.5.5-m3-solaris10-x86_64-test/bin/mysqld'0x1179a46 [0x1579a46]
mysql-5.5.5-m3-solaris10-x86_64-test/bin/mysqld'0x1179daf [0x1579daf]
mysql-5.5.5-m3-solaris10-x86_64-test/bin/mysqld'0x117a0df [0x157a0df]
mysql-5.5.5-m3-solaris10-x86_64-test/bin/mysqld'0x117a37e [0x157a37e]
mysql-5.5.5-m3-solaris10-x86_64-test/bin/mysqld'0xbc24e0 [0xfc24e0]
mysql-5.5.5-m3-solaris10-x86_64-test/bin/mysqld'__1cLha_innobaseJwrite_row6MpC_i_+0x6aa [0xf6964a]
mysql-5.5.5-m3-solaris10-x86_64-test/bin/mysqld'__1cHhandlerMha_write_row6MpC_i_+0x97 [0xba9637]
mysql-5.5.5-m3-solaris10-x86_64-test/bin/mysqld'__1cMwrite_record6FpnDTHD_pnFTABLE_pnMst_copy_info__i_+0xcd2 [0x98c212]
mysql-5.5.5-m3-solaris10-x86_64-test/bin/mysqld'__1cMmysql_insert6FpnDTHD_pnKTABLE_LIST_rnEList4nEItem___rnEList4n0D___55nPenum_duplicates_b_b_+0xf7c [0x989c2c]
mysql-5.5.5-m3-solaris10-x86_64-test/bin/mysqld'__1cVmysql_execute_command6FpnDTHD__i_+0x3d58 [0x9ab878]
mysql-5.5.5-m3-solaris10-x86_64-test/bin/mysqld'__1cNsp_instr_stmtJexec_core6MpnDTHD_pI_i_+0xb2 [0x90ee32]
mysql-5.5.5-m3-solaris10-x86_64-test/bin/mysqld'__1cNsp_lex_keeperXreset_lex_and_exec_core6MpnDTHD_pIbpnIsp_instr__i_+0x214 [0x90e534]
mysql-5.5.5-m3-solaris10-x86_64-test/bin/mysqld'__1cNsp_instr_stmtHexecute6MpnDTHD_pI_i_+0x1e8 [0x90eaf8]
mysql-5.5.5-m3-solaris10-x86_64-test/bin/mysqld'__1cHsp_headHexecute6MpnDTHD__b_+0x747 [0x9095a7]
mysql-5.5.5-m3-solaris10-x86_64-test/bin/mysqld'__1cHsp_headRexecute_procedure6MpnDTHD_pnEList4nEItem____b_+0xae5 [0x90baf5]
mysql-5.5.5-m3-solaris10-x86_64-test/bin/mysqld'__1cVmysql_execute_command6FpnDTHD__i_+0x8e18 [0x9b0938]
mysql-5.5.5-m3-solaris10-x86_64-test/bin/mysqld'__1cLmysql_parse6FpnDTHD_pkcIpnMParser_state__v_+0x34e [0x9b5a6e]
mysql-5.5.5-m3-solaris10-x86_64-test/bin/mysqld'__1cQdispatch_command6FnTenum_server_command_pnDTHD_pcI_b_+0xbd1 [0x9a5311]
mysql-5.5.5-m3-solaris10-x86_64-test/bin/mysqld'__1cKdo_command6FpnDTHD__b_+0x320 [0x9a4460]
mysql-5.5.5-m3-solaris10-x86_64-test/bin/mysqld'__1cYdo_handle_one_connection6FpnDTHD__v_+0x273 [0xad5563]
mysql-5.5.5-m3-solaris10-x86_64-test/bin/mysqld'handle_one_connection+0x37 [0xad52c7]
mysql-5.5.5-m3-solaris10-x86_64-test/bin/mysqld'__1cQpfs_spawn_thread6Fpv_0_+0xe1 [0xd995b1]
/lib/amd64/libc.so.1'_thrp_setup+0x8d [0xfffffd7fff26acf5]
/lib/amd64/libc.so.1'_lwp_start+0x0 [0xfffffd7fff26afb0]
Please read http://dev.mysql.com/doc/refman/5.1/en/resolve-stack-dump.html
and follow instructions on how to resolve the stack trace.
Resolved stack trace is much more helpful in diagnosing the
problem, so please do resolve it
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 948a390 = INSERT INTO test.insdel2_sub VALUES (NULL, NAME_CONST('ins_count',99), 'More text to test with')
thd->thread_id=699
thd->killed=NOT_KILLED
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file

How to repeat:
No repeatable test case yet

pstack, all threads

Attachment: bug54113_pstack.out (application/octet-stream, text), 140.40 KiB.

Some notes on the test flow.

In error log, server complains about this statement:
thd->query at 948a390 = 
INSERT INTO test.insdel2_sub VALUES (NULL,NAME_CONST('ins_count',99), 'More text to test with')

This statement is executed as a part of a stored procedure, below is its definition:

CREATE PROCEDURE test.insdel2()
BEGIN
   DECLARE ins_count INT DEFAULT 100;
   DECLARE del_count INT;
   DECLARE cur_user VARCHAR(255);
   DECLARE local_uuid VARCHAR(255);
   DECLARE local_time TIMESTAMP;

   SET local_time= NOW();
   SET cur_user = CURRENT_USER();
   SET local_uuid=UUID();

   WHILE ins_count > 0 DO
       INSERT INTO test.insdel2_tbl VALUES (NULL, NOW(), USER(), UUID(),
                                    ins_count,'Going to test MBR for MySQL');

     INSERT INTO test.insdel2_sub VALUES (NULL,ins_count, 'More text to test with');
     SET ins_count = ins_count - 1;
   END WHILE;

   SELECT MAX(id) FROM test.insdel2_tbl INTO del_count;
   WHILE del_count > 0 DO
     DELETE FROM test.insdel2_tbl WHERE id = del_count;
   SET del_count = del_count - 2;
   END WHILE;
END

Involved tables:

CREATE TABLE test.insdel2_tbl(id MEDIUMINT NOT NULL AUTO_INCREMENT,
dt TIMESTAMP, user CHAR(255), uuidf LONGBLOB,
fkid MEDIUMINT, filler VARCHAR(255),
PRIMARY KEY(id))ENGINE=innodb

CREATE TABLE test.insdel2_sub (tid MEDIUMINT AUTO_INCREMENT,fkid MEDIUMINT,
filler VARCHAR(255), FOREIGN KEY(tid) REFERENCES
test.insdel2_tbl(id) ON DELETE CASCADE,
PRIMARY KEY(tid))ENGINE=innodb

There are also triggers on the tables:

CREATE TRIGGER test.insdel2_bd BEFORE DELETE
ON test.insdel2_tbl FOR EACH ROW
BEGIN
INSERT INTO test.insdel2_tracker  VALUES (NULL, NOW(), USER());
END

CREATE TRIGGER test.insdel2_bi BEFORE INSERT
ON test.insdel2_sub FOR EACH ROW
BEGIN
INSERT INTO test.insdel2_tracker2 VALUES (NULL, NOW(), USER(), RAND());
END

The procedure is called by a test in transactional context, nothing happens in the transaction before the call.

START TRANSACTION;
CALL test.insdel2();

The test was run in concurrent mode (up to 100 threads running different test flow, working with different tables in this and another schema, including several threads running the test described above).

I'll see if I can reproduce this in house, this should be our best 
bet to fix it.

Are you sure this is the right pstack? It doesn't seem to contain 
the failing function row_mysql_unfreeze_data_dictionary(), nor abort()
for that matter.

The bug doesn't seem to be an easy prey, although I never tried 1.0.6.
I used some reasonable approximation of the two missing table structures.
How reproducible and bothersome is it? We may want to try reproduce it
with a later 1.0.x or with 1.1 when we finally merge mysql-trunk-innodb.
Could you also provide *trx from the coredump? I doubt we can easily review
the coredump ourselves.

From the log:

100527 18:38:44 InnoDB Plugin 1.0.6 started; log sequence number 0
100527 18:38:45 [Note] Event Scheduler: Loaded 0 events
100527 18:38:45 [Note] /export/home/pb2/test/sb_1-1857013-1274974627.52/mysql-5.5.5-m3-solaris10-x86_64-test/bin/mysqld: ready for connections.
Version: '5.5.5-m3-debug-log'  socket: '/export/home/pb2/test/sb_1-1857013-1274974627.52/tmp/Hp1cIQG5uW/mysqld.1.sock'  port: 10730  Source distribution
100527 18:38:47 [Note] Start binlog_dump to slave_server(2), pos(, 4)
100527 18:42:24  InnoDB: Assertion failure in thread 710 in file /export/home/pb2/build/sb_2-1854883-1274945846.32/mysql-5.5.5-m3/storage/innobase/row/row0mysql.c line 1704
InnoDB: Failing assertion: trx->dict_operation_lock_mode == RW_S_LATCH

didn't take long to fail...

mysqld.log contains CREATEs for the missing tables:

                   15 Query     CREATE TABLE test.insdel2_tracker (trid MEDIUMINT AUTO_INCREMENT,
d DATETIME, u CHAR(20),
PRIMARY KEY(trid)) ENGINE=InnoDB
                   15 Query     CREATE TABLE test.insdel2_tracker2 (trid MEDIUMINT AUTO_INCREMENT,
d DATETIME, u CHAR(20), r FLOAT,
PRIMARY KEY(trid)) ENGINE=InnoDB

there are also partitioned versions of test tables and the stack backtrace
contains some threads in partitioning routines.

trx from row_mysql_unfreeze_data_dictionary

Attachment: 54113_trx.txt (text/plain), 4.58 KiB.

dbx stack trace for thread 710

Attachment: 54113_dbx_thread710.txt (text/plain), 3.28 KiB.

Hi Michael,

>> Are you sure this is the right pstack? It doesn't seem to contain 
>> the failing function row_mysql_unfreeze_data_dictionary(), nor abort()
>> for that matter.

Yes, that's the right one, the thread #710.
It's just that pstack did not recognize everything. I've attached stack trace from dbx, it looks better.

>> How reproducible and bothersome is it? 

Happened twice in 10 days in system tests, not reproducible on purpose yet. I keep looking into it, too, no luck so far either.

>> Could you also provide *trx from the coredump? 

Attached.

>> didn't take long to fail...

4 minutes is long time for a system stress test with 100 threads, each of which picks up a random task and runs it as fast as server can handle...

>> mysqld.log contains CREATEs for the missing tables:
>> <cut>
>> there are also partitioned versions of test tables and the stack 
>> backtrace contains some threads in partitioning routines.

There are more tables and other structures, yes, as well as more test scenarios. I only described those which are involved in the query which server complained about during the crash.

observations:

trx->dict_operation_lock_mode is 0
trx has been chosen as deadlock detection victim
innodb-buffer-pool-size=8M
the code looks OK as it almost always does

Here I'm at a loss as to how this can happen so assigning to Marko
to think about it a little before he goes on vacation.
If only one thread works over the same trx, nothing bad can happen,
deadlock victim or not. If two threads work on the same trx simultaneously,
the trx->dict_operation_lock_mode thing should break immediately.
Yet we have a low reproducible bug.

The code looks suspicious when there are multiple foreign key constraints defined in the table. The variable got_s_lock is never reset to FALSE in the loop in row_ins_check_foreign_constraints(). Can you test the following patch?

=== modified file 'storage/innobase/row/row0ins.c'
--- storage/innobase/row/row0ins.c	revid:marko.makela@oracle.com-20100614065030-tebecnjv316n8l7b
+++ storage/innobase/row/row0ins.c	2010-06-16 11:19:24 +0000
@@ -1517,25 +1517,24 @@ row_ins_check_foreign_constraints(
 	que_thr_t*	thr)	/*!< in: query thread */
 {
 	dict_foreign_t*	foreign;
-	ulint		err;
 	trx_t*		trx;
-	ibool		got_s_lock	= FALSE;
 
 	trx = thr_get_trx(thr);
 
-	foreign = UT_LIST_GET_FIRST(table->foreign_list);
-
-	while (foreign) {
+	for (foreign = UT_LIST_GET_FIRST(table->foreign_list); foreign;
+	     foreign = UT_LIST_GET_NEXT(foreign_list, foreign)) {
 		if (foreign->foreign_index == index) {
+			ibool	got_s_lock;
+			ulint	err;
 
 			if (foreign->referenced_table == NULL) {
 				dict_table_get(foreign->referenced_table_name,
 					       FALSE);
 			}
 
-			if (0 == trx->dict_operation_lock_mode) {
-				got_s_lock = TRUE;
+			got_s_lock = (0 == trx->dict_operation_lock_mode);
 
+			if (got_s_lock) {
 				row_mysql_freeze_data_dictionary(trx);
 			}
 
@@ -1575,8 +1574,6 @@ row_ins_check_foreign_constraints(
 				return(err);
 			}
 		}
-
-		foreign = UT_LIST_GET_NEXT(foreign_list, foreign);
 	}
 
 	return(DB_SUCCESS);

On a closer thought, my patch at [16 Jun 13:21] should be mere code clean-up. The value of got_s_lock should not change between loop iterations; therefore it is not strictly necessary to recalculate it. The real bug must be somewhere else.

On a closer inspection of the stack traces, I see some ha_innobase:: functions and no names for the InnoDB functions. This bug was filed against the InnoDB Plugin. I can understand if the system cannot resolve symbols in the ha_innodb_plugin.so, but I do not understand why it would display names like ha_innobase::.

In the InnoDB Plugin, the ha_innobase class name is #define'd to ha_innodb. That is, the function names should display as ha_innodb::index_read(), not ha_innobase::index_read(). Could this bug be the result of a bad build somehow? The ha_innodb.cc is similar but not identical between storage/innobase and storage/innodb_plugin in MySQL 5.1.

It is 5.5 (trunk-bugfixing, probably now trunk), not 5.1.
In 5.5 we have only one InnoDB -- the plugin which leaves in innobase and acts as built-in, so I suppose the define does not work in this case -- at least other stack traces from 5.5 also show ha_innobase::. 
I categorize InnoDB bugs in 5.5 as 'InnoDB Plugin' because that's how it used to identify itself on startup ("InnoDB Plugin 1.0.6 started;" etc.); but I saw it changed recently with 1.1, so if it is more convenient for you to have bugs filed as 'InnoDB', not plugin, please just let me know.

I saw a related assert in my stress testing of the 5.1 builtin:

100705  9:46:56  InnoDB: Assertion failure in thread 352987264 in file srv/srv0srv.c line 1504
InnoDB: Failing assertion: trx->dict_operation_lock_mode == 0

unfortunately the coredump is garbled because of a misbuild. 
I'll keep on trying.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".