Bug #37346 | innodb does not detect deadlock between update and alter table | ||
---|---|---|---|
Submitted: | 11 Jun 2008 14:35 | Modified: | 7 Mar 2010 18:34 |
Reporter: | Shane Bester (Platinum Quality Contributor) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Locking | Severity: | S3 (Non-critical) |
Version: | 5.0,5.1 | OS: | Any |
Assigned to: | Dmitry Lenev | CPU Architecture: | Any |
[11 Jun 2008 14:35]
Shane Bester
[11 Jun 2008 14:37]
MySQL Verification Team
output of lock monitor, processlist, original testcase output showing 0 tps
Attachment: bug37346_innodb_lock_monitor.txt (text/plain), 12.71 KiB.
[11 Jun 2008 14:57]
MySQL Verification Team
testcase. run against mysqld with long innodb_lock_wait_timeout until queries hang
Attachment: bug37346.c (text/plain), 6.70 KiB.
[11 Jun 2008 15:00]
MySQL Verification Team
thread stacks of 2x update and 1x alter table
Attachment: bug37346_thread_stacks_during_hang.txt (text/plain), 6.83 KiB.
[11 Jun 2008 15:37]
Heikki Tuuri
Shane, this problem is a general one in the MySQL server. MySQL is not aware of lock waits inside storage engines, and storage engines are not aware of MySQL table lock waits. Currently, a lock wait timeout is the only way to resolve these deadlocks that involve a MySQL table lock. Setting this as a general MySQL Server bug. In the general form, this is hard to fix, because that requires MySQL to keep a global waits-for graph, also for storage engines. Regards, Heikki
[11 Jun 2008 15:49]
MySQL Verification Team
Heikki, I suggest that this bug is fixed in 6.0. In 6.0 we could use transactional table locks and fix issues like these efficiently. What do you think ???
[15 Jul 2008 2:53]
Sean Pringle
The initial test case can be reduced by manually interleaving the 2x updates in txn1 with the 1x alter in txn2. Open two MySQL client sessions... In session #1: create table t1 (c1 int primary key, c2 int, c3 int) engine=innodb; insert into t1 values (1,1,0),(2,2,0),(3,3,0),(4,4,0),(5,5,0); set autocommit=0; update t1 set c3=c3+1 where c2=3; In session #2: alter table t1 engine=innodb; (waits) In session #1: update t1 set c3=c3+1 where c2=4; (waits) Observe both sessions deadlocked for innodb_lock_wait_timeout seconds, then in session #2: ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction In session #1: Query OK..etc commit;
[2 Oct 2008 21:19]
Konstantin Osipov
Duplicate of Bug#989
[14 Jul 2009 17:13]
MySQL Verification Team
Sean`s test, repeatable in Windows Vista 64-bit: mysql> alter table t1 engine=innodb; ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction mysql>
[17 Jul 2009 9:31]
Konstantin Osipov
This should have been fixed by Bug#989 but wasn't due to an omission. Please triage for sr54, it's part of WL#4284 specification that this sort of deadlock goes away.
[17 Jul 2009 10:01]
Konstantin Osipov
Hm... not sure I have a plan for a quick fix in 5.4...
[17 Jul 2009 10:05]
Konstantin Osipov
SHOW PROCESSLIST output: 1 | localhost:60065 | test | Table lock | update t1 set c3=c3+1 where c2=4 2 | localhost:60068 | test | copy to tmp table | alter table t1 engine=innodb ALTER is waiting to upgrade the metadata lock, update is waiting since the table is locked by ALTER with TL_WRITE_ALLOW_READ data lock (which is incompatible with TL_WRITE_ALLOW_WRITE requested by update).
[17 Jul 2009 10:09]
Konstantin Osipov
Update thread: (gdb) bt #0 0xb7f3e430 in __kernel_vsyscall () #1 0xb7f190e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/i686/cmov/libpthread.so.0 #2 0x08740704 in safe_cond_wait (cond=0xb116bf4, mp=0xb1ff054, file=0x899fc6b "thr_lock.c", line=470) at thr_mutex.c:423 #3 0x0874c965 in wait_for_lock (wait=0xb1ff0c4, data=0xb2463f4, in_wait_list=0 '\0') at thr_lock.c:470 #4 0x0874d49b in thr_lock (data=0xb2463f4, owner=0xb157038, lock_type=TL_WRITE_ALLOW_WRITE) at thr_lock.c:762 #5 0x0874dec5 in thr_multi_lock (data=0xb25c3bc, count=1, owner=0xb157038) at thr_lock.c:1020 #6 0x082e8ba1 in mysql_lock_tables (thd=0xb156808, tables=0xb25c3a0, count=1, flags=0, need_reopen=0xb47d4ac0) at lock.cc:321 #7 0x08359282 in lock_tables (thd=0xb156808, tables=0xb2452a0, count=1, flags=0, need_reopen=0xb47d4ac0) at sql_base.cc:4541 #8 0x083c5996 in mysql_update (thd=0xb156808, table_list=0xb2452a0, fields=@0xb157aec, values=@0xb157d58, conds=0xb245ae0, order_num=0, order=0x0, limit=18446744073709551615, handle_duplicates=DUP_ERROR, ignore=false) at sql_update.cc:227 #9 0x083070a9 in mysql_execute_command (thd=0xb156808) at sql_parse.cc:3099 #10 0x0830da03 in mysql_parse (thd=0xb156808, inBuf=0xb2451d8 "update t1 set c3=c3+1 where c2=4", length=32, found_semicolon=0xb47d5bd4) at sql_parse.cc:5942 #11 0x0830e55d in dispatch_command (command=COM_QUERY, thd=0xb156808, packet=0xb1e4b99 "update t1 set c3=c3+1 where c2=4", packet_length=32) at sql_parse.cc:1061 #12 0x0830fa73 in do_command (thd=0xb156808) at sql_parse.cc:743 #13 0x082fc018 in handle_one_connection (arg=0xb156808) at sql_connect.cc:1158 #14 0xb7f154ff in start_thread () from /lib/tls/i686/cmov/libpthread.so.0 #15 0xb7d1b49e in clone () from /lib/tls/i686/cmov/libc.so.6
[17 Jul 2009 10:10]
Konstantin Osipov
Alter thread: #0 0xb7f3e430 in __kernel_vsyscall () #1 0xb7f190e5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/i686/cmov/libpthread.so.0 #2 0x08740704 in safe_cond_wait (cond=0xacff8bc, mp=0xacff850, file=0x896f9ee "os/os0sync.c", line=422) at thr_mutex.c:423 #3 0x085e4c44 in os_event_wait_low (event=0xacff850, reset_sig_count=0) at os/os0sync.c:422 #4 0x08609a96 in srv_suspend_mysql_thread (thr=0xb69a88e8) at srv/srv0srv.c:1489 #5 0x085f632e in row_mysql_handle_errors (new_err=0xb47a2644, trx=0xb69a9c68, thr=0xb69a88e8, savept=0x0) at row/row0mysql.c:496 #6 0x08603cdd in row_search_for_mysql (buf=0xb1fee30 "ÿ", mode=1, prebuilt=0xb69a6c68, match_mode=0, direction=0) at row/row0sel.c:4419 #7 0x08598ed4 in ha_innobase::index_read (this=0xb1fec68, buf=0xb1fee30 "ÿ", key_ptr=0x0, key_len=0, find_flag=HA_READ_AFTER_KEY) at handler/ha_innodb.cc:4493 #8 0x0859127c in ha_innobase::index_first (this=0xb1fec68, buf=0xb1fee30 "ÿ") at handler/ha_innodb.cc:4757 #9 0x08598bc7 in ha_innobase::rnd_next (this=0xb1fec68, buf=0xb1fee30 "ÿ") at handler/ha_innodb.cc:4854 #10 0x0843e996 in rr_sequential (info=0xb47a2c68) at records.cc:390 #11 0x08464f50 in copy_data_between_tables (from=0xb1fe5b0, to=0xb23e808, create=@0xb47a416c, ignore=false, order_num=0, order=0x0, copied=0xb47a32d8, deleted=0xb47a32d0, keys_onoff=LEAVE_AS_IS, error_if_not_empty=false) at sql_table.cc:7892 #12 0x08472d11 in mysql_alter_table (thd=0xb224700, new_db=0xb1b2d98 "test", new_name=0xb117278 "t1", create_info=0xb47a3e44, table_list=0xb1f9be0, alter_info=0xb47a413c, order_num=0, order=0x0, ignore=false) at sql_table.cc:7515 #13 0x083063f2 in mysql_execute_command (thd=0xb224700) at sql_parse.cc:2871 #14 0x0830da03 in mysql_parse (thd=0xb224700, inBuf=0xb2211b0 "alter table t1 engine=innodb", length=28, found_semicolon=0xb47a4bd4) at sql_parse.cc:5942 #15 0x0830e55d in dispatch_command (command=COM_QUERY, thd=0xb224700, packet=0xb22a139 "alter table t1 engine=innodb", packet_length=28) at sql_parse.cc:1061 #16 0x0830fa73 in do_command (thd=0xb224700) at sql_parse.cc:743 #17 0x082fc018 in handle_one_connection (arg=0xb224700) at sql_connect.cc:1158 #18 0xb7f154ff in start_thread () from /lib/tls/i686/cmov/libpthread.so.0 #19 0xb7d1b49e in clone () from /lib/tls/i686/cmov/libc.so.6 BTW, it times out after a while.
[17 Jul 2009 15:06]
Konstantin Osipov
Sorry for initial confusion. This report has 3 different manifestations, and it got me confused. Let me try to explain with 3 examples: 1) Unnecessary deadlock error between SELECT and ALTER, all engines. connection1: mysql> create table t1 (c1 int primary key, c2 int, c3 int) engine=innodb; Query OK, 0 rows affected (0.09 sec) mysql> insert into t1 values (1,1,0),(2,2,0),(3,3,0),(4,4,0),(5,5,0); Query OK, 5 rows affected (0.04 sec) Records: 5 Duplicates: 0 Warnings: 0 mysql> set autocommit=0; Query OK, 0 rows affected (0.00 sec) mysql> begin; Query OK, 0 rows affected (0.00 sec) mysql> select * from t1 where c2=3; +----+------+------+ | c1 | c2 | c3 | +----+------+------+ | 3 | 3 | 0 | +----+------+------+ 1 row in set (0.00 sec) connection 2: mysql> alter table t1 engine=innodb; (waits) connection 1: mysql> select * from t1 where c2=4; ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction The problem here is that connection 1' attempts to proceed with t1 fails. This contradicts WL#4284 specification. What should happen: connection 1 should be able to proceed. connection 2 should wait till connection 1 commits. Note, that this behaviour is engine-agnostic, identical error occurs with MyISAM engine.
[17 Jul 2009 15:07]
Konstantin Osipov
The effort to fix this problem (case 1) is medium. Risk is low.
[17 Jul 2009 17:55]
Konstantin Osipov
The described case has now been reported separately as Bug#46272 MySQL 5.4.4, new MDL: unnecessary deadlock
[20 Jul 2009 9:52]
Konstantin Osipov
Case 2), or, problem 2, only becomes visible when case 1) issue is eliminated/fixed. It is a deadlock between UPDATE and ALTER connections, using MyISAM engine. I will use a patched version of the server, with a crude fix for Bug#46272, to demonstrate the problem. The diff against the latest 5.4: === modified file 'sql/sql_base.cc' --- sql/sql_base.cc 2009-07-06 08:38:21 +0000 +++ sql/sql_base.cc 2009-07-20 09:36:14 +0000 @@ -2078,12 +2078,6 @@ bool wait_while_table_is_used(THD *thd, table->db_stat, table->s->version)); /* Ensure no one can reopen table before it's removed */ - pthread_mutex_lock(&LOCK_open); - table->s->version= 0; - pthread_mutex_unlock(&LOCK_open); - - old_lock_type= table->reginfo.lock_type; - mysql_lock_abort(thd, table, TRUE); /* end threads waiting on lock */ if (table->mdl_ticket->upgrade_shared_lock_to_exclusive()) { @@ -2092,6 +2086,15 @@ bool wait_while_table_is_used(THD *thd, } pthread_mutex_lock(&LOCK_open); + table->s->version= 0; + pthread_mutex_unlock(&LOCK_open); + + old_lock_type= table->reginfo.lock_type; +#if 0 + mysql_lock_abort(thd, table, TRUE); /* end threads waiting on lock */ +#endif + + pthread_mutex_lock(&LOCK_open); tdc_remove_table(thd, TDC_RT_REMOVE_NOT_OWN, table->s->db.str, table->s->table_name.str); pthread_mutex_unlock(&LOCK_open); The test case: -- connection 1: mysql> create table t1 (c1 int primary key, c2 int, c3 int) engine=myisam; Query OK, 0 rows affected (0.12 sec) mysql> insert into t1 values (1,1,0),(2,2,0),(3,3,0),(4,4,0),(5,5,0); Query OK, 5 rows affected (0.01 sec) Records: 5 Duplicates: 0 Warnings: 0 mysql> set autocommit=0; Query OK, 0 rows affected (0.00 sec) mysql> begin; Query OK, 0 rows affected (0.00 sec) mysql> update t1 set c3=c3+1 where c2=3; Query OK, 1 row affected (0.00 sec) Rows matched: 1 Changed: 1 Warnings: 0 mysql> update t1 set c3=c3+1 where c2=3; Query OK, 1 row affected (6.14 sec) Rows matched: 1 Changed: 1 Warnings: 0 -- connection 2: mysql> alter table t1 engine=myisam; -- (waits) -- connection 1: mysql> update t1 set c3=c3+1 where c2=4; -- (waits) Voila, deadlock. In this situation connection 2 (ALTER) is waiting on connection 1 to give away its metadata lock on t1. Whereas connection 2 (UPDATE) is waiting on table-level data lock on t1, since ALTER holds TL_WRITE_ALLOW_READ, which conflicts with TL_WRITE needed for connection 1. This is a deadlock that will never time out, since there are no timeouts currently in MDL or thr_lock locks. Introducing MDL timeouts is subject of Bug#45225. But a solution with DDL timeout is a partial one -- during waiting on DDL to timeout the UPDATE transaction could very well commit, which is prevented by ALTER holding a WRITE data lock on the table. So a better solution is, for ALTER, to not block the UPDATE transaction and let it finish, while preventing new UPDATE transactions from acquiring the lock on t1. This is the issue reported here, in this report (Bug#37346). Note, that the only way I see to fix it is to extend metadata locking subsystem to become more aware of data locks, such as MDL_READ_METADATA_UPDATE_DATA and MDL_READ_METADATA_READ_DATA, so that MDL subsystem can wait on an upgradable shared metadata lock without holding connections that are doing UPDATEs. It's a major effort and a major extension of MDL functionality (two man-months of development time).
[21 Jul 2009 9:07]
Andrii Nikitin
When this bug happens on PARTITIONED table it leaves files in corrupted state (bug #45961) or crashes mysqld if innodb_file_per_table is OFF (bug #45808).
[21 Dec 2009 14:27]
Dmitry Lenev
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/95248 3628 Dmitry Lenev 2009-12-21 Tentative patch implementing new type-of-operation-aware metadata locks. Fixes bug #46272 "MySQL 5.4.4, new MDL: unnecessary deadlock" and bug #37346 "innodb does not detect deadlock between update and alter table".
[30 Dec 2009 21:27]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/95826 3048 Konstantin Osipov 2009-12-31 A quick fix Bug#37346 "innodb does not detect deadlock between update and alter table"
[31 Dec 2009 7:45]
Marko Mäkelä
See also Bug #42643, which is about inadequate locking in TRUNCATE TABLE.
[31 Dec 2009 8:17]
Konstantin Osipov
Marco, Does InnoDB table level lock prevent new row locks from being placed? In other words, is it subject to starvation, in case there is a constant stream of updates? For TRUNCATE, yes we could use a similar approach. In InnoDB truncate is normally done by means of mysql_truncate_by_delete() method, and this does take table level locks. Why do you think it does not? Besides, TRUNCATE alwyas is executed in its own transaction (has an implicit COMMIT before and after). I don't see any reason why this fix would be not applicable to TRUNCATE. For DROP TABLE, yes, we don't take any table level locks. But we can do it optionally, for tables that exist and can be opened.
[29 Jan 2010 10:44]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/98579 3077 Dmitry Lenev 2010-01-29 Tentative patch implementing new type-of-operation-aware metadata locks. Fixes bug #46272 "MySQL 5.4.4, new MDL: unnecessary deadlock" and bug #37346 "innodb does not detect deadlock between update and alter table". More after review fixes.
[29 Jan 2010 14:41]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/98630 3079 Dmitry Lenev 2010-01-29 Tentative patch implementing new type-of-operation-aware metadata locks. Fixes bug #46272 "MySQL 5.4.4, new MDL: unnecessary deadlock" and bug #37346 "innodb does not detect deadlock between update and alter table". More after review fixes.
[29 Jan 2010 15:26]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/98660 3083 Dmitry Lenev 2010-01-29 Tentative patch implementing new type-of-operation-aware metadata locks. Fixes bug #46272 "MySQL 5.4.4, new MDL: unnecessary deadlock" and bug #37346 "innodb does not detect deadlock between update and alter table". More after review fixes.
[1 Feb 2010 14:37]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/98826 3056 Dmitry Lenev 2010-02-01 Fix for sporadical crashes of lock_multi_bug38499.test caused by patch which implemented new type-of-operation-aware metadata locks and added a wait-for graph based deadlock detector to the MDL subsystem (this patch fixed bug #46272 "MySQL 5.4.4, new MDL: unnecessary deadlock" and bug #37346 "innodb does not detect deadlock between update and alter table"). Crashes were caused by a race in MDL_context::try_acquire_lock(). This method added MDL_ticket to the list of granted tickets and released lock protecting list before setting MDL_ticket::m_lock. Thus some other thread was able to see ticket without properly set m_lock member for some short period of time. If this thread called method involving this member during this period crash happened. This fix ensures that MDL_ticket::m_lock is set in all cases when ticket is added to granted/pending lists in MDL_lock. @ sql/mdl.cc We must set MDL_ticket::m_lock member before adding ticket to the list of granted tickets, since such tickets can be accessed by other threads which might call methods using this member. Added assert which ensures that all MDL_tickets which are added to the granted/pending lists have properly set MDL_ticket::m_lock member. @ sql/mdl.h Adjusted comment describing MDL_ticket::m_lock member to reflect current reality. Added accessor method for this member.
[1 Feb 2010 14:39]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/98827 3057 Dmitry Lenev 2010-02-01 Fix for sporadical crashes of lock_multi_bug38499.test caused by patch which implemented new type-of-operation-aware metadata locks and added a wait-for graph based deadlock detector to the MDL subsystem (this patch fixed bug #46272 "MySQL 5.4.4, new MDL: unnecessary deadlock" and bug #37346 "innodb does not detect deadlock between update and alter table"). Crashes were caused by a race in MDL_context::try_acquire_lock(). This method added MDL_ticket to the list of granted tickets and released lock protecting list before setting MDL_ticket::m_lock. Thus some other thread was able to see ticket without properly set m_lock member for some short period of time. If this thread called method involving this member during this period crash happened. This fix ensures that MDL_ticket::m_lock is set in all cases when ticket is added to granted/pending lists in MDL_lock. @ sql/mdl.cc We must set MDL_ticket::m_lock member before adding ticket to the list of granted tickets, since such tickets can be accessed by other threads which might call methods using this member. Added assert which ensures that all MDL_tickets which are added to the granted/pending lists have properly set MDL_ticket::m_lock member. @ sql/mdl.h Adjusted comment describing MDL_ticket::m_lock member to reflect current reality. Added accessor method for this member.
[1 Feb 2010 18:00]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/98840 3058 Dmitry Lenev 2010-02-01 Fix for sporadical hangs of mdl_sync.test caused by patch which implemented new type-of-operation-aware metadata locks and added a wait-for graph based deadlock detector to the MDL subsystem (this patch fixed bug #46272 "MySQL 5.4.4, new MDL: unnecessary deadlock" and bug #37346 "innodb does not detect deadlock between update and alter table"). These hangs were caused by missing include of wait_condition.inc. This fix simply adds them. @ mysql-test/t/mdl_sync.test Added missing include of wait_condition.inc.
[3 Feb 2010 19:56]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/99143 3070 Dmitry Lenev 2010-02-03 A follow-up for the patch which implemented new type-of-operation-aware metadata locks and added a wait-for graph based deadlock detector to the MDL subsystem (this patch fixed bug #46272 "MySQL 5.4.4, new MDL: unnecessary deadlock" and bug #37346 "innodb does not detect deadlock between update and alter table"). Removed unused and redundant method.
[16 Feb 2010 16:46]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100216101445-2ofzkh48aq2e0e8o) (version source revid:kostja@sun.com-20100210211106-nq8ztcq2z9o4csit) (merge vers: 6.0.14-alpha) (pib:16)
[16 Feb 2010 16:56]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100216101208-33qkfwdr0tep3pf2) (version source revid:kostja@sun.com-20100203220607-r41c9nlz73rllxso) (pib:16)
[2 Mar 2010 1:17]
Paul DuBois
Not present in any released version. No changelog entry needed. Setting report to Need Merge pending push of Celosia into release tree.
[2 Mar 2010 8:34]
Dmitry Lenev
Hello Paul! You are not quite correct. This bug was repeatable in 6.0 and is still repeatable in 5.1 and 5.0.
[3 Mar 2010 15:36]
Paul DuBois
Noted in 6.0.14 changelog. Concurrent execution of ALTER TABLE for InnoDB table and a transaction that tried to read and then update the table could result in a deadlock between table-level locks and InnoDB row locks, which was detected only after the innodb_lock_wait_timeout timeout occurred. Setting report to Need Merge pending push of Celosia to release tree.
[6 Mar 2010 10:56]
Bugs System
Pushed into 5.5.3-m3 (revid:alik@sun.com-20100306103849-hha31z2enhh7jwt3) (version source revid:vvaintroub@mysql.com-20100216221947-luyhph0txl2c5tc8) (merge vers: 5.5.99-m3) (pib:16)
[7 Mar 2010 18:34]
Paul DuBois
Noted in 5.5.3 changelog.