| Bug #38804 | Query deadlock causes all tables to be inaccessible. | ||
|---|---|---|---|
| Submitted: | 14 Aug 2008 16:27 | Modified: | 17 Oct 2008 18:16 |
| Reporter: | Philip Stoev | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Locking | Severity: | S1 (Critical) |
| Version: | 5.1-bzr | OS: | Any |
| Assigned to: | Mattias Jonsson | CPU Architecture: | Any |
[15 Aug 2008 7:43]
Sveta Smirnova
Thank you for the report. Verified as described.
[28 Aug 2008 16:01]
Mattias Jonsson
Is this repeatable in 6.0? I think that the auto_increment patch that fixed bug#33479 might solve this bug too (since it does no longer use the table_share->mutex). (that patch has only been included in 6.0, not in 5.1)
[29 Aug 2008 9:47]
Kristofer Pettersson
Bug has its origin from: http://bugs.mysql.com/bug.php?id=27405
[1 Sep 2008 14:15]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/53007 2728 Kristofer Pettersson 2008-09-01 Bug#38804 Query deadlock causes all tables to be inaccessible. Backport 6.0-> 5.1 of b ug 33479: auto_increment failures in partitioning Several problems with auto_increment in partitioning (with MyISAM, InnoDB and Falcon. Locking issues, not handling multi-row INSERTs properly etc.) This is a full patch which I have updated according to sergs and guilhems comments. Changed the auto_increment handling for partitioning: Added a ha_data variable in table_share for storage engine specific data such as auto_increment value handling in partitioning, also see WL 4305 and using the ha_data->mutex to lock around read + update. The idea is this: Store the table's reserved auto_increment value in the TABLE_SHARE and use a mutex to, lock it for reading and updating it and unlocking it, in one block. Only accessing all partitions when it is not initialized. Also allow reservations of ranges, and if no one has done a reservation afterwards, lower the reservation to what was actually used after the statement is done (via release_auto_increment from WL 3146). The lock is kept from the first reservation if it is statement based replication and a multi-row INSERT statement where the number of candidate rows to insert is not known in advance (like INSERT SELECT, LOAD DATA, unlike INSERT VALUES (row1), (row2),,(rowN)). This should also lead to better concurrancy (no need to have a mutex protection around write_row in all cases) and work with any local storage engine.
[2 Sep 2008 13:29]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/53061 2732 Mattias Jonsson 2008-09-02 Bug#38804: Query deadlock causes all tables to be inaccessible. Problem was a mutex added in bug n 27405 for solving a problem with auto_increment in partitioned innodb tables. (in ha_partition::write_row over partitions file->ha_write_row) Solution is to use the patch for bug#33479, which refines the usage of mutexes for auto_increment. Backport of bug-33479 from 6.0: Bug-33479: auto_increment failures in partitioning Several problems with auto_increment in partitioning (with MyISAM, InnoDB. Locking issues, not handling multi-row INSERTs properly etc.) Changed the auto_increment handling for partitioning: Added a ha_data variable in table_share for storage engine specific data such as auto_increment value handling in partitioning, also see WL 4305 and using the ha_data->mutex to lock around read + update. The idea is this: Store the table's reserved auto_increment value in the TABLE_SHARE and use a mutex to, lock it for reading and updating it and unlocking it, in one block. Only accessing all partitions when it is not initialized. Also allow reservations of ranges, and if no one has done a reservation afterwards, lower the reservation to what was actually used after the statement is done (via release_auto_increment from WL 3146). The lock is kept from the first reservation if it is statement based replication and a multi-row INSERT statement where the number of candidate rows to insert is not known in advance (like INSERT SELECT, LOAD DATA, unlike INSERT VALUES (row1), (row2),,(rowN)). This should also lead to better concurrancy (no need to have a mutex protection around write_row in all cases) and work with any local storage engine.
[3 Sep 2008 8:22]
Kristofer Pettersson
Dead lock visualized
Attachment: bug38804.swf (application/x-shockwave-flash, text), 64.78 KiB.
[3 Sep 2008 21:06]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/53206 2733 Mattias Jonsson 2008-09-03 Bug#38804: Query deadlock causes all tables to be inaccessible. Bug-33479: auto_increment failures in partitioning fix for update of auto_increment column (Same bug exists in non partitioned InnoDB, bug-39207)
[8 Sep 2008 13:30]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/53529 2732 Mattias Jonsson 2008-09-08 Bug#38804: Query deadlock causes all tables to be inaccessible. Problem was a mutex added in bug n 27405 for solving a problem with auto_increment in partitioned innodb tables. (in ha_partition::write_row over partitions file->ha_write_row) Solution is to use the patch for bug#33479, which refines the usage of mutexes for auto_increment. Backport of bug-33479 from 6.0: Bug-33479: auto_increment failures in partitioning Several problems with auto_increment in partitioning (with MyISAM, InnoDB. Locking issues, not handling multi-row INSERTs properly etc.) Changed the auto_increment handling for partitioning: Added a ha_data variable in table_share for storage engine specific data such as auto_increment value handling in partitioning, also see WL 4305 and using the ha_data->mutex to lock around read + update. The idea is this: Store the table's reserved auto_increment value in the TABLE_SHARE and use a mutex to, lock it for reading and updating it and unlocking it, in one block. Only accessing all partitions when it is not initialized. Also allow reservations of ranges, and if no one has done a reservation afterwards, lower the reservation to what was actually used after the statement is done (via release_auto_increment from WL 3146). The lock is kept from the first reservation if it is statement based replication and a multi-row INSERT statement where the number of candidate rows to insert is not known in advance (like INSERT SELECT, LOAD DATA, unlike INSERT VALUES (row1), (row2),,(rowN)). This should also lead to better concurrancy (no need to have a mutex protection around write_row in all cases) and work with any local storage engine.
[29 Sep 2008 9:06]
Mattias Jonsson
Estimated push to 5.1.29 is 2008-10-03.
[1 Oct 2008 10:41]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/54916 2694 Mattias Jonsson 2008-10-01 pre push fix for bug#38804
[1 Oct 2008 12:02]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/54920 2844 Mattias Jonsson 2008-10-01 [merge] merge (basically a null merge, since it contains a backport of bug#33479 as a fix for Bug#38804)
[1 Oct 2008 19:28]
Mattias Jonsson
pushed into mysql-6.0-5.1.29-rc and mysql-5.1-5.1.29-rc
[3 Oct 2008 9:31]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/55199 2686 Mattias Jonsson 2008-10-03 post push fix for bug#38804 (back port of bug#33479) Removes the regression bug#38751.
[3 Oct 2008 12:35]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/55246 2841 Mattias Jonsson 2008-10-03 [merge] merge and update of post push fix for bug#38804 There had also been a small miss in the previous merge between the 5.1.29-rc and -bugteam trees
[7 Oct 2008 19:21]
Paul DuBois
Noted in 5.1.29 changelog. For deadlock between two transactions that required a timeout to resolve, all server tables became inaccessible for the duration of the deadlock. Leaving report status unchanged; this is early documentation of an upcoming push into 5.1.29.
[9 Oct 2008 17:51]
Bugs System
Pushed into 5.1.30 (revid:mattias.jonsson@sun.com-20081003093054-0sc082k21jboj166) (version source revid:kgeorge@mysql.com-20081007082452-gk4l86zq8k53wwyo) (pib:4)
[9 Oct 2008 18:39]
Paul DuBois
Setting report to NDI pending push into 6.0.x.
[17 Oct 2008 16:46]
Bugs System
Pushed into 6.0.8-alpha (revid:mattias.jonsson@sun.com-20081008085925-xh27vrkzh2gx1k3u) (version source revid:kgeorge@mysql.com-20081008092313-4wzrak2duhsz7ylm) (pib:5)
[17 Oct 2008 18:16]
Paul DuBois
Noted in 6.0.8 changelog.
[28 Oct 2008 21:01]
Bugs System
Pushed into 5.1.29-ndb-6.2.17 (revid:mattias.jonsson@sun.com-20081003093054-0sc082k21jboj166) (version source revid:tomas.ulin@sun.com-20081028140209-u4emkk1xphi5tkfb) (pib:5)
[28 Oct 2008 22:20]
Bugs System
Pushed into 5.1.29-ndb-6.3.19 (revid:mattias.jonsson@sun.com-20081003093054-0sc082k21jboj166) (version source revid:tomas.ulin@sun.com-20081028194045-0353yg8cvd2c7dd1) (pib:5)
[1 Nov 2008 9:44]
Bugs System
Pushed into 5.1.29-ndb-6.4.0 (revid:mattias.jonsson@sun.com-20081003093054-0sc082k21jboj166) (version source revid:jonas@mysql.com-20081101082305-qx5a1bj0z7i8ueys) (pib:5)

Description: If there is a deadlock between two transactions that requires a timeout to resolve, then all tables on the server are not accessible for the duration of the deadlock. Threads which have nothing to do with the deadlock are hung like this: #1 0x00583829 in __lll_lock_wait () from /lib/libpthread.so.0 #2 0x0057edb5 in _L_lock_758 () from /lib/libpthread.so.0 #3 0x0057ecce in pthread_mutex_lock () from /lib/libpthread.so.0 #4 0x08589e61 in safe_mutex_lock (mp=0xa380804, try_lock=0 '\0', file=0x86ab82d "sql_base.cc", line=396) at thr_mutex.c:149 #5 0x082a5ba8 in get_table_share (thd=0xa37c9d8, table_list=0xb2b4a58c, key=0xb2b4a76e "test", key_length=48, db_flags=8192, error=0xb2b4a8f8) at sql_base.cc:396 #6 0x083a6377 in fill_schema_table_from_frm (thd=0xa37c9d8, table=0xb28085c0, schema_table=0x8743938, db_name=0xb2b4aba4, table_name=0xa356738, schema_table_idx=SCH_COLUMNS) at sql_show.cc:3028 #7 0x083b62bc in get_all_tables (thd=0xa37c9d8, tables=0xa354db8, cond=0xa3550b8) at sql_show.cc:3257 #8 0x083a32eb in get_schema_tables_result (join=0xb280a8c0, executed_place=PROCESSED_BY_JOIN_EXEC) at sql_show.cc:5911 #9 0x082e00ec in JOIN::exec (this=0xb280a8c0) at sql_select.cc:1712 #10 0x082dccc5 in mysql_select (thd=0xa37c9d8, rref_pointer_array=0xa37de5c, tables=0xa354db8, wild_num=0, fields=@0xa37ddf8, conds=0xa3550b8, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2686208513, result=0xa355f70, unit=0xa37daec, select_lex=0xa37dd64) at sql_select.cc:2360 #11 0x082e202c in handle_select (thd=0xa37c9d8, lex=0xa37da90, result=0xa355f70, setup_tables_done_option=0) at sql_select.cc:269 #12 0x0825997f in execute_sqlcom_select (thd=0xa37c9d8, all_tables=0xa354db8) at sql_parse.cc:4765 #13 0x0825b655 in mysql_execute_command (thd=0xa37c9d8) at sql_parse.cc:2073 #14 0x08264375 in mysql_parse (thd=0xa37c9d8, inBuf=0xa354bc8 "SELECT DISTINCT COLUMN_NAME\n\t\tFROM INFORMATION_SCHEMA.COLUMNS\n\t\tWHERE TABLE_SCHEMA = DATABASE()", length=95, found_semicolon=0xb2b4c25c) at sql_parse.cc:5656 #15 0x08264f70 in dispatch_command (command=COM_QUERY, thd=0xa37c9d8, packet=0xa35ec31 "\n\t\tSELECT DISTINCT COLUMN_NAME\n\t\tFROM INFORMATION_SCHEMA.COLUMNS\n\t\tWHERE TABLE_SCHEMA = DATABASE()\n\t", packet_length=100) at sql_parse.cc:1137 #16 0x0826612e in do_command (thd=0xa37c9d8) at sql_parse.cc:794 #17 0x082535ae in handle_one_connection (arg=0xa37c9d8) at sql_connect.cc:1115 #18 0x0057d32f in start_thread () from /lib/libpthread.so.0 #19 0x0049a27e in clone () from /lib/libc.so.6 or #0 0x00110416 in __kernel_vsyscall () #1 0x00583829 in __lll_lock_wait () from /lib/libpthread.so.0 #2 0x0057ed75 in _L_lock_87 () from /lib/libpthread.so.0 #3 0x0057e7fa in pthread_mutex_lock () from /lib/libpthread.so.0 #4 0x08589e61 in safe_mutex_lock (mp=0x884ae40, try_lock=0 '\0', file=0x86ab82d "sql_base.cc", line=2702) at thr_mutex.c:149 #5 0x082a8787 in open_table (thd=0xa39f530, table_list=0xa43e760, mem_root=0xb2960ddc, refresh=0xb2960e1f, flags=0) at sql_base.cc:2702 #6 0x082a9cc3 in open_tables (thd=0xa39f530, start=0xb2960e94, counter=0xb2960e80, flags=0) at sql_base.cc:4515 #7 0x082aa4b4 in open_and_lock_tables_derived (thd=0xa39f530, tables=0xa43e760, derived=true) at sql_base.cc:4915 #8 0x08266e2b in open_and_lock_tables (thd=0xa39f530, tables=0xa43e760) at mysql_priv.h:1531 #9 0x08259702 in execute_sqlcom_select (thd=0xa39f530, all_tables=0xa43e760) at sql_parse.cc:4730 #10 0x0825b655 in mysql_execute_command (thd=0xa39f530) at sql_parse.cc:2073 #11 0x08264375 in mysql_parse (thd=0xa39f530, inBuf=0xa43e638 "select * from a", length=15, found_semicolon=0xb296225c) at sql_parse.cc:5656 #12 0x08264f70 in dispatch_command (command=COM_QUERY, thd=0xa39f530, packet=0xa4375f1 "select * from a", packet_length=15) at sql_parse.cc:1137 #13 0x0826612e in do_command (thd=0xa39f530) at sql_parse.cc:794 #14 0x082535ae in handle_one_connection (arg=0xa39f530) at sql_connect.cc:1115 #15 0x0057d32f in start_thread () from /lib/libpthread.so.0 #16 0x0049a27e in clone () from /lib/libc.so.6 How to repeat: 1. Trigger a deadlock that is resolved only via timeout, and not with a deadlock detector. This would require using both a transactional storage engine (can cause deadlocks) and partitioning (causes deadlocks that would otherwise be resolved with a detector to be resolved via timeout). Such an example would be the test case from bug #38797. Use --innodb-lock-wait-timeout=500 for comfort. 2. Using SHOW PROCESSLIST observe that threads executing, for example, an I_S SELECT query, hang with status "checking permissions". 3. Using a mysql client, observe that queries like "SELECT * FROM foo" hang in open_table().