Bug #54113 | Assertion: trx->dict_operation_lock_mode == RW_S_LATCH in row0mysql.c line 1704 | ||
---|---|---|---|
Submitted: | 31 May 2010 17:05 | Modified: | 5 Aug 2010 12:49 |
Reporter: | Elena Stepanova | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S2 (Serious) |
Version: | mysql-trunk-bugfixing | OS: | Any |
Assigned to: | Marko Mäkelä | CPU Architecture: | Any |
[31 May 2010 17:05]
Elena Stepanova
[31 May 2010 17:16]
Elena Stepanova
pstack, all threads
Attachment: bug54113_pstack.out (application/octet-stream, text), 140.40 KiB.
[31 May 2010 17:45]
Elena Stepanova
Some notes on the test flow. In error log, server complains about this statement: thd->query at 948a390 = INSERT INTO test.insdel2_sub VALUES (NULL,NAME_CONST('ins_count',99), 'More text to test with') This statement is executed as a part of a stored procedure, below is its definition: CREATE PROCEDURE test.insdel2() BEGIN DECLARE ins_count INT DEFAULT 100; DECLARE del_count INT; DECLARE cur_user VARCHAR(255); DECLARE local_uuid VARCHAR(255); DECLARE local_time TIMESTAMP; SET local_time= NOW(); SET cur_user = CURRENT_USER(); SET local_uuid=UUID(); WHILE ins_count > 0 DO INSERT INTO test.insdel2_tbl VALUES (NULL, NOW(), USER(), UUID(), ins_count,'Going to test MBR for MySQL'); INSERT INTO test.insdel2_sub VALUES (NULL,ins_count, 'More text to test with'); SET ins_count = ins_count - 1; END WHILE; SELECT MAX(id) FROM test.insdel2_tbl INTO del_count; WHILE del_count > 0 DO DELETE FROM test.insdel2_tbl WHERE id = del_count; SET del_count = del_count - 2; END WHILE; END Involved tables: CREATE TABLE test.insdel2_tbl(id MEDIUMINT NOT NULL AUTO_INCREMENT, dt TIMESTAMP, user CHAR(255), uuidf LONGBLOB, fkid MEDIUMINT, filler VARCHAR(255), PRIMARY KEY(id))ENGINE=innodb CREATE TABLE test.insdel2_sub (tid MEDIUMINT AUTO_INCREMENT,fkid MEDIUMINT, filler VARCHAR(255), FOREIGN KEY(tid) REFERENCES test.insdel2_tbl(id) ON DELETE CASCADE, PRIMARY KEY(tid))ENGINE=innodb There are also triggers on the tables: CREATE TRIGGER test.insdel2_bd BEFORE DELETE ON test.insdel2_tbl FOR EACH ROW BEGIN INSERT INTO test.insdel2_tracker VALUES (NULL, NOW(), USER()); END CREATE TRIGGER test.insdel2_bi BEFORE INSERT ON test.insdel2_sub FOR EACH ROW BEGIN INSERT INTO test.insdel2_tracker2 VALUES (NULL, NOW(), USER(), RAND()); END The procedure is called by a test in transactional context, nothing happens in the transaction before the call. START TRANSACTION; CALL test.insdel2(); The test was run in concurrent mode (up to 100 threads running different test flow, working with different tables in this and another schema, including several threads running the test described above).
[1 Jun 2010 15:06]
Mikhail Izioumtchenko
I'll see if I can reproduce this in house, this should be our best bet to fix it.
[8 Jun 2010 17:24]
Mikhail Izioumtchenko
Are you sure this is the right pstack? It doesn't seem to contain the failing function row_mysql_unfreeze_data_dictionary(), nor abort() for that matter. The bug doesn't seem to be an easy prey, although I never tried 1.0.6. I used some reasonable approximation of the two missing table structures. How reproducible and bothersome is it? We may want to try reproduce it with a later 1.0.x or with 1.1 when we finally merge mysql-trunk-innodb. Could you also provide *trx from the coredump? I doubt we can easily review the coredump ourselves. From the log: 100527 18:38:44 InnoDB Plugin 1.0.6 started; log sequence number 0 100527 18:38:45 [Note] Event Scheduler: Loaded 0 events 100527 18:38:45 [Note] /export/home/pb2/test/sb_1-1857013-1274974627.52/mysql-5.5.5-m3-solaris10-x86_64-test/bin/mysqld: ready for connections. Version: '5.5.5-m3-debug-log' socket: '/export/home/pb2/test/sb_1-1857013-1274974627.52/tmp/Hp1cIQG5uW/mysqld.1.sock' port: 10730 Source distribution 100527 18:38:47 [Note] Start binlog_dump to slave_server(2), pos(, 4) 100527 18:42:24 InnoDB: Assertion failure in thread 710 in file /export/home/pb2/build/sb_2-1854883-1274945846.32/mysql-5.5.5-m3/storage/innobase/row/row0mysql.c line 1704 InnoDB: Failing assertion: trx->dict_operation_lock_mode == RW_S_LATCH didn't take long to fail... mysqld.log contains CREATEs for the missing tables: 15 Query CREATE TABLE test.insdel2_tracker (trid MEDIUMINT AUTO_INCREMENT, d DATETIME, u CHAR(20), PRIMARY KEY(trid)) ENGINE=InnoDB 15 Query CREATE TABLE test.insdel2_tracker2 (trid MEDIUMINT AUTO_INCREMENT, d DATETIME, u CHAR(20), r FLOAT, PRIMARY KEY(trid)) ENGINE=InnoDB there are also partitioned versions of test tables and the stack backtrace contains some threads in partitioning routines.
[8 Jun 2010 18:23]
Elena Stepanova
trx from row_mysql_unfreeze_data_dictionary
Attachment: 54113_trx.txt (text/plain), 4.58 KiB.
[8 Jun 2010 18:23]
Elena Stepanova
dbx stack trace for thread 710
Attachment: 54113_dbx_thread710.txt (text/plain), 3.28 KiB.
[8 Jun 2010 18:28]
Elena Stepanova
Hi Michael, >> Are you sure this is the right pstack? It doesn't seem to contain >> the failing function row_mysql_unfreeze_data_dictionary(), nor abort() >> for that matter. Yes, that's the right one, the thread #710. It's just that pstack did not recognize everything. I've attached stack trace from dbx, it looks better. >> How reproducible and bothersome is it? Happened twice in 10 days in system tests, not reproducible on purpose yet. I keep looking into it, too, no luck so far either. >> Could you also provide *trx from the coredump? Attached. >> didn't take long to fail... 4 minutes is long time for a system stress test with 100 threads, each of which picks up a random task and runs it as fast as server can handle... >> mysqld.log contains CREATEs for the missing tables: >> <cut> >> there are also partitioned versions of test tables and the stack >> backtrace contains some threads in partitioning routines. There are more tables and other structures, yes, as well as more test scenarios. I only described those which are involved in the query which server complained about during the crash.
[8 Jun 2010 22:14]
Mikhail Izioumtchenko
observations: trx->dict_operation_lock_mode is 0 trx has been chosen as deadlock detection victim innodb-buffer-pool-size=8M the code looks OK as it almost always does
[10 Jun 2010 19:23]
Mikhail Izioumtchenko
Here I'm at a loss as to how this can happen so assigning to Marko to think about it a little before he goes on vacation. If only one thread works over the same trx, nothing bad can happen, deadlock victim or not. If two threads work on the same trx simultaneously, the trx->dict_operation_lock_mode thing should break immediately. Yet we have a low reproducible bug.
[16 Jun 2010 11:21]
Marko Mäkelä
The code looks suspicious when there are multiple foreign key constraints defined in the table. The variable got_s_lock is never reset to FALSE in the loop in row_ins_check_foreign_constraints(). Can you test the following patch? === modified file 'storage/innobase/row/row0ins.c' --- storage/innobase/row/row0ins.c revid:marko.makela@oracle.com-20100614065030-tebecnjv316n8l7b +++ storage/innobase/row/row0ins.c 2010-06-16 11:19:24 +0000 @@ -1517,25 +1517,24 @@ row_ins_check_foreign_constraints( que_thr_t* thr) /*!< in: query thread */ { dict_foreign_t* foreign; - ulint err; trx_t* trx; - ibool got_s_lock = FALSE; trx = thr_get_trx(thr); - foreign = UT_LIST_GET_FIRST(table->foreign_list); - - while (foreign) { + for (foreign = UT_LIST_GET_FIRST(table->foreign_list); foreign; + foreign = UT_LIST_GET_NEXT(foreign_list, foreign)) { if (foreign->foreign_index == index) { + ibool got_s_lock; + ulint err; if (foreign->referenced_table == NULL) { dict_table_get(foreign->referenced_table_name, FALSE); } - if (0 == trx->dict_operation_lock_mode) { - got_s_lock = TRUE; + got_s_lock = (0 == trx->dict_operation_lock_mode); + if (got_s_lock) { row_mysql_freeze_data_dictionary(trx); } @@ -1575,8 +1574,6 @@ row_ins_check_foreign_constraints( return(err); } } - - foreign = UT_LIST_GET_NEXT(foreign_list, foreign); } return(DB_SUCCESS);
[22 Jun 2010 13:13]
Marko Mäkelä
On a closer thought, my patch at [16 Jun 13:21] should be mere code clean-up. The value of got_s_lock should not change between loop iterations; therefore it is not strictly necessary to recalculate it. The real bug must be somewhere else.
[22 Jun 2010 13:18]
Marko Mäkelä
On a closer inspection of the stack traces, I see some ha_innobase:: functions and no names for the InnoDB functions. This bug was filed against the InnoDB Plugin. I can understand if the system cannot resolve symbols in the ha_innodb_plugin.so, but I do not understand why it would display names like ha_innobase::. In the InnoDB Plugin, the ha_innobase class name is #define'd to ha_innodb. That is, the function names should display as ha_innodb::index_read(), not ha_innobase::index_read(). Could this bug be the result of a bad build somehow? The ha_innodb.cc is similar but not identical between storage/innobase and storage/innodb_plugin in MySQL 5.1.
[22 Jun 2010 13:45]
Elena Stepanova
It is 5.5 (trunk-bugfixing, probably now trunk), not 5.1. In 5.5 we have only one InnoDB -- the plugin which leaves in innobase and acts as built-in, so I suppose the define does not work in this case -- at least other stack traces from 5.5 also show ha_innobase::. I categorize InnoDB bugs in 5.5 as 'InnoDB Plugin' because that's how it used to identify itself on startup ("InnoDB Plugin 1.0.6 started;" etc.); but I saw it changed recently with 1.1, so if it is more convenient for you to have bugs filed as 'InnoDB', not plugin, please just let me know.
[5 Jul 2010 19:19]
Mikhail Izioumtchenko
I saw a related assert in my stress testing of the 5.1 builtin: 100705 9:46:56 InnoDB: Assertion failure in thread 352987264 in file srv/srv0srv.c line 1504 InnoDB: Failing assertion: trx->dict_operation_lock_mode == 0 unfortunately the coredump is garbled because of a misbuild. I'll keep on trying.
[22 Jul 2010 23:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".