Bug #48541 | Deadlock between LOCK_open and LOCK_mdl (was deadlock with LOCK_thread_count) | ||
---|---|---|---|
Submitted: | 4 Nov 2009 18:43 | Modified: | 7 Mar 2010 1:37 |
Reporter: | Philip Stoev | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Locking | Severity: | S1 (Critical) |
Version: | 6.0-codebase-bugfixing | OS: | Any |
Assigned to: | Jon Olav Hauglid | CPU Architecture: | Any |
Tags: | mdl |
[4 Nov 2009 18:43]
Philip Stoev
[4 Nov 2009 18:43]
Philip Stoev
Grammar for bug 48541
Attachment: bug48541.yy (application/octet-stream, text), 29.62 KiB.
[4 Nov 2009 18:48]
Philip Stoev
Thread stacks for 48541
Attachment: bug48541.threads.txt (text/plain), 73.00 KiB.
[16 Nov 2009 11:00]
Jon Olav Hauglid
The relevant threads from the above thread stacks are: Thread 29 (Thread 0x7fc176283950 (LWP 23216)): #0 0x000000315b00db14 in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x000000315b00921c in _L_lock_1078 () from /lib64/libpthread.so.0 #2 0x000000315b0090f0 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x0000000000b54929 in safe_mutex_lock (mp=0x1247a60, my_flags=0, file=0xd2040c "sql_base.cc", line=8212) at thr_mutex.c:225 #4 0x000000000072bf84 in mysql_notify_thread_having_shared_lock (thd=0x3aa8928, in_use=0x384a4c8) at sql_base.cc:8212 #5 0x000000000090da5a in notify_shared_lock (thd=0x3aa8928, conflicting_ticket=0x3d0c510) at mdl.cc:807 #6 0x000000000090e26e in MDL_context::acquire_exclusive_locks (this=0x3aa8a00, mdl_requests=0x7fc176280900) at mdl.cc:919 #7 0x000000000090e6f1 in MDL_context::acquire_exclusive_lock (this=0x3aa8a00, mdl_request=0x38c5e60) at mdl.cc:822 #8 0x0000000000733bd8 in open_table_get_mdl_lock (thd=0x3aa8928, table_list=0x38c5b08, mdl_request=0x38c5e60, ot_ctx=0x7fc176281030, flags=0) at sql_base.cc:2343 #9 0x0000000000738478 in open_table (thd=0x3aa8928, table_list=0x38c5b08, mem_root=0x7fc176280fc0, ot_ctx=0x7fc176281030, flags=0) at sql_base.cc:2618 #10 0x0000000000739796 in open_and_process_table (thd=0x3aa8928, lex=0x3aaa1f8, tables=0x38c5b08, counter=0x7fc176281100, flags=0, prelocking_strategy=0x7fc176281140, has_prelocking_list=false, ot_ctx=0x7fc176281030, new_frm_mem=0x7fc176280fc0) at sql_base.cc:3940 #11 0x0000000000739e83 in open_tables (thd=0x3aa8928, start=0x7fc1762810c0, counter=0x7fc176281100, flags=0, prelocking_strategy=0x7fc176281140) at sql_base.cc:4178 #12 0x000000000073a378 in open_and_lock_tables_derived (thd=0x3aa8928, tables=0x38c5b08, derived=true, flags=0, prelocking_strategy=0x7fc176281140) at sql_base.cc:4763 #13 0x00000000006e7cc8 in open_and_lock_tables_derived (thd=0x3aa8928, tables=0x38c5b08, derived=true, flags=0) at ../mysql_priv.h:1518 #14 0x00000000006e7d03 in open_and_lock_tables (thd=0x3aa8928, tables=0x38c5b08) at ../../sql/mysql_priv.h:1528 #15 0x00000000008bbbf4 in mysql_create_view (thd=0x3aa8928, views=0x38c5b08, mode=VIEW_CREATE_NEW) at sql_view.cc:402 #16 0x00000000006e3e02 in mysql_execute_command (thd=0x3aa8928) at sql_parse.cc:4692 #17 0x00000000006e4b41 in mysql_parse (thd=0x3aa8928, inBuf=0x38c57f0 "/* Sequence start */ CREATE ALGORITHM = MERGE VIEW testdb_S . t1_all_2_S AS SELECT HIGH_PRIORITY * FROM testdb_S . t1_all_1_S A NATURAL JOIN testdb_S . t1_all_1_N B WHERE `pk` BETWEEN 3 AND 3 + 1"..., length=295, found_semicolon=0x7fc176282f00) at sql_parse.cc:5979 Thread 24 (Thread 0x7fc17603a950 (LWP 23238)): #0 0x000000315b00db14 in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x000000315b0091b0 in _L_lock_102 () from /lib64/libpthread.so.0 #2 0x000000315b008a9e in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x0000000000b54929 in safe_mutex_lock (mp=0x1258600, my_flags=0, file=0xd2040c "sql_base.cc", line=902) at thr_mutex.c:225 #4 0x000000000072be19 in kill_delayed_threads_for_table (share=0x38f9f30) at sql_base.cc:902 #5 0x000000000073ab2e in close_cached_tables (thd=0x3a7bd68, tables=0x7fc1700637a0, have_lock=false, wait_for_refresh=true) at sql_base.cc:966 #6 0x00000000006d6805 in reload_acl_and_cache (thd=0x3a7bd68, options=4, tables=0x7fc1700637a0, write_to_binlog=0x7fc17603925f) at sql_parse.cc:6966 #7 0x00000000006e1e67 in mysql_execute_command (thd=0x3a7bd68) at sql_parse.cc:4084 #8 0x00000000006e4b41 in mysql_parse (thd=0x3a7bd68, inBuf=0x7fc170063690 "FLUSH TABLE testdb_S . t1_all_1_S", length=33, found_semicolon=0x7fc176039f00) at sql_parse.cc:5979 Thread 29 holds LOCK_mdl, tries to get LOCK_open Thread 24 holds LOCK_open, tries to get LOCK_mdl Notice that while Thread 24 executes: pthread_mutex_lock(in_use->mysys_var->current_mutex); this mutex is located at 0x1258600 which we can see from e.g Thread 20 is LOCK_mdl.
[24 Nov 2009 12:23]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/91410 3728 Jon Olav Hauglid 2009-11-24 Bug #48541 Server-wide deadlock involving LOCK_thread_count The bug title is misleading. This is a deadlock between LOCK_open and LOCK_mdl. The reason for the deadlock was an improper exit from MDL_context::wait_for_locks() which caused mysys_var->current_mutex to remain LOCK_mdl even though LOCK_mdl was no longer held by that connection. This could lead to a deadlock in the following way: 1) INSERT DELAYED tries to open a table but fails, and trying to recover it calls wait_for_locks(). 2) Due to a pending exclusive request, wait_for_locks() fails and exits without resetting mysys_var->current_mutex for the delayed insert handler thread. So it continues to point to LOCK_mdl. 3) The handler thread manages to open a table. 4) A different connection takes LOCK_open and tries to take LOCK_mdl. 5) FLUSH TABLES from a third connection notices that the handler thread has a table open, and tries to kill it. This involves locking mysys_var->current_mutex while having LOCK_open locked. Since current_mutex mistakenly points to LOCK_mdl, we have a deadlock. This patch makes sure MDL_EXIT_COND() is called before exiting wait_for_locks(). This clears mysys->current_mutex which resolves the issue. The patch also removes the unnecessary locking of mysys_var->current_mutex in kill_delayed_threads() and kill_delayed_threads_for_table(). An assert is added to recover_from_failed_open_table_attempt() after wait_for_locks() is called, to check that current_mutex is indeed reset. With this assert in place, existing tests in (e.g.) mdl_sync.test will fail without this patch.
[16 Dec 2009 11:32]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/94462 3040 Jon Olav Hauglid 2009-12-16 Bug #48541 Deadlock between LOCK_open and LOCK_mdl The reason for the deadlock was an improper exit from MDL_context::wait_for_locks() which caused mysys_var->current_mutex to remain LOCK_mdl even though LOCK_mdl was no longer held by that connection. This could for example lead to a deadlock in the following way: 1) INSERT DELAYED tries to open a table but fails, and trying to recover it calls wait_for_locks(). 2) Due to a pending exclusive request, wait_for_locks() fails and exits without resetting mysys_var->current_mutex for the delayed insert handler thread. So it continues to point to LOCK_mdl. 3) The handler thread manages to open a table. 4) A different connection takes LOCK_open and tries to take LOCK_mdl. 5) FLUSH TABLES from a third connection notices that the handler thread has a table open, and tries to kill it. This involves locking mysys_var->current_mutex while having LOCK_open locked. Since current_mutex mistakenly points to LOCK_mdl, we have a deadlock. This patch makes sure MDL_EXIT_COND() is called before exiting wait_for_locks(). This clears mysys->current_mutex which resolves the issue. An assert is added to recover_from_failed_open_table_attempt() after wait_for_locks() is called, to check that current_mutex is indeed reset. With this assert in place, existing tests in (e.g.) mdl_sync.test will fail without this patch.
[16 Dec 2009 12:51]
Jon Olav Hauglid
Pushed to mysql-next-4284 and merged to mysql-6.0-codebase-4284.
[16 Feb 2010 16:45]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100216101445-2ofzkh48aq2e0e8o) (version source revid:jon.hauglid@sun.com-20091216124907-muo2nw6vd0v6a91o) (merge vers: 6.0.14-alpha) (pib:16)
[16 Feb 2010 16:55]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100216101208-33qkfwdr0tep3pf2) (version source revid:jon.hauglid@sun.com-20091216113211-lysdhzll282ks668) (pib:16)
[2 Mar 2010 0:56]
Paul DuBois
Not present in any released version. No changelog entry. Setting report to Need Merge pending push of Celosia into release tree.
[6 Mar 2010 11:05]
Bugs System
Pushed into 5.5.3-m3 (revid:alik@sun.com-20100306103849-hha31z2enhh7jwt3) (version source revid:vvaintroub@mysql.com-20100216221947-luyhph0txl2c5tc8) (merge vers: 5.5.99-m3) (pib:16)
[7 Mar 2010 1:37]
Paul DuBois
No changelog entry needed.