| Bug #48541 | Deadlock between LOCK_open and LOCK_mdl (was deadlock with LOCK_thread_count) | ||
|---|---|---|---|
| Submitted: | 4 Nov 2009 18:43 | Modified: | 7 Mar 2010 1:37 |
| Reporter: | Philip Stoev | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Locking | Severity: | S1 (Critical) |
| Version: | 6.0-codebase-bugfixing | OS: | Any |
| Assigned to: | Jon Olav Hauglid | CPU Architecture: | Any |
| Tags: | mdl | ||
[4 Nov 2009 18:43]
Philip Stoev
Grammar for bug 48541
Attachment: bug48541.yy (application/octet-stream, text), 29.62 KiB.
[4 Nov 2009 18:48]
Philip Stoev
Thread stacks for 48541
Attachment: bug48541.threads.txt (text/plain), 73.00 KiB.
[16 Nov 2009 11:00]
Jon Olav Hauglid
The relevant threads from the above thread stacks are:
Thread 29 (Thread 0x7fc176283950 (LWP 23216)):
#0 0x000000315b00db14 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x000000315b00921c in _L_lock_1078 () from /lib64/libpthread.so.0
#2 0x000000315b0090f0 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000000000b54929 in safe_mutex_lock (mp=0x1247a60, my_flags=0, file=0xd2040c "sql_base.cc", line=8212) at thr_mutex.c:225
#4 0x000000000072bf84 in mysql_notify_thread_having_shared_lock (thd=0x3aa8928, in_use=0x384a4c8) at sql_base.cc:8212
#5 0x000000000090da5a in notify_shared_lock (thd=0x3aa8928, conflicting_ticket=0x3d0c510) at mdl.cc:807
#6 0x000000000090e26e in MDL_context::acquire_exclusive_locks (this=0x3aa8a00, mdl_requests=0x7fc176280900) at mdl.cc:919
#7 0x000000000090e6f1 in MDL_context::acquire_exclusive_lock (this=0x3aa8a00, mdl_request=0x38c5e60) at mdl.cc:822
#8 0x0000000000733bd8 in open_table_get_mdl_lock (thd=0x3aa8928, table_list=0x38c5b08, mdl_request=0x38c5e60, ot_ctx=0x7fc176281030, flags=0)
at sql_base.cc:2343
#9 0x0000000000738478 in open_table (thd=0x3aa8928, table_list=0x38c5b08, mem_root=0x7fc176280fc0, ot_ctx=0x7fc176281030, flags=0) at sql_base.cc:2618
#10 0x0000000000739796 in open_and_process_table (thd=0x3aa8928, lex=0x3aaa1f8, tables=0x38c5b08, counter=0x7fc176281100, flags=0,
prelocking_strategy=0x7fc176281140, has_prelocking_list=false, ot_ctx=0x7fc176281030, new_frm_mem=0x7fc176280fc0) at sql_base.cc:3940
#11 0x0000000000739e83 in open_tables (thd=0x3aa8928, start=0x7fc1762810c0, counter=0x7fc176281100, flags=0, prelocking_strategy=0x7fc176281140)
at sql_base.cc:4178
#12 0x000000000073a378 in open_and_lock_tables_derived (thd=0x3aa8928, tables=0x38c5b08, derived=true, flags=0, prelocking_strategy=0x7fc176281140)
at sql_base.cc:4763
#13 0x00000000006e7cc8 in open_and_lock_tables_derived (thd=0x3aa8928, tables=0x38c5b08, derived=true, flags=0) at ../mysql_priv.h:1518
#14 0x00000000006e7d03 in open_and_lock_tables (thd=0x3aa8928, tables=0x38c5b08) at ../../sql/mysql_priv.h:1528
#15 0x00000000008bbbf4 in mysql_create_view (thd=0x3aa8928, views=0x38c5b08, mode=VIEW_CREATE_NEW) at sql_view.cc:402
#16 0x00000000006e3e02 in mysql_execute_command (thd=0x3aa8928) at sql_parse.cc:4692
#17 0x00000000006e4b41 in mysql_parse (thd=0x3aa8928,
inBuf=0x38c57f0 "/* Sequence start */ CREATE ALGORITHM = MERGE VIEW testdb_S . t1_all_2_S AS SELECT HIGH_PRIORITY * FROM testdb_S . t1_all_1_S A NATURAL JOIN testdb_S . t1_all_1_N B WHERE `pk` BETWEEN 3 AND 3 + 1"..., length=295, found_semicolon=0x7fc176282f00) at sql_parse.cc:5979
Thread 24 (Thread 0x7fc17603a950 (LWP 23238)):
#0 0x000000315b00db14 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x000000315b0091b0 in _L_lock_102 () from /lib64/libpthread.so.0
#2 0x000000315b008a9e in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000000000b54929 in safe_mutex_lock (mp=0x1258600, my_flags=0, file=0xd2040c "sql_base.cc", line=902) at thr_mutex.c:225
#4 0x000000000072be19 in kill_delayed_threads_for_table (share=0x38f9f30) at sql_base.cc:902
#5 0x000000000073ab2e in close_cached_tables (thd=0x3a7bd68, tables=0x7fc1700637a0, have_lock=false, wait_for_refresh=true) at sql_base.cc:966
#6 0x00000000006d6805 in reload_acl_and_cache (thd=0x3a7bd68, options=4, tables=0x7fc1700637a0, write_to_binlog=0x7fc17603925f) at sql_parse.cc:6966
#7 0x00000000006e1e67 in mysql_execute_command (thd=0x3a7bd68) at sql_parse.cc:4084
#8 0x00000000006e4b41 in mysql_parse (thd=0x3a7bd68, inBuf=0x7fc170063690 "FLUSH TABLE testdb_S . t1_all_1_S", length=33, found_semicolon=0x7fc176039f00)
at sql_parse.cc:5979
Thread 29 holds LOCK_mdl, tries to get LOCK_open
Thread 24 holds LOCK_open, tries to get LOCK_mdl
Notice that while Thread 24 executes:
pthread_mutex_lock(in_use->mysys_var->current_mutex);
this mutex is located at 0x1258600 which we can see from e.g Thread 20 is LOCK_mdl.
[24 Nov 2009 12:23]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/91410 3728 Jon Olav Hauglid 2009-11-24 Bug #48541 Server-wide deadlock involving LOCK_thread_count The bug title is misleading. This is a deadlock between LOCK_open and LOCK_mdl. The reason for the deadlock was an improper exit from MDL_context::wait_for_locks() which caused mysys_var->current_mutex to remain LOCK_mdl even though LOCK_mdl was no longer held by that connection. This could lead to a deadlock in the following way: 1) INSERT DELAYED tries to open a table but fails, and trying to recover it calls wait_for_locks(). 2) Due to a pending exclusive request, wait_for_locks() fails and exits without resetting mysys_var->current_mutex for the delayed insert handler thread. So it continues to point to LOCK_mdl. 3) The handler thread manages to open a table. 4) A different connection takes LOCK_open and tries to take LOCK_mdl. 5) FLUSH TABLES from a third connection notices that the handler thread has a table open, and tries to kill it. This involves locking mysys_var->current_mutex while having LOCK_open locked. Since current_mutex mistakenly points to LOCK_mdl, we have a deadlock. This patch makes sure MDL_EXIT_COND() is called before exiting wait_for_locks(). This clears mysys->current_mutex which resolves the issue. The patch also removes the unnecessary locking of mysys_var->current_mutex in kill_delayed_threads() and kill_delayed_threads_for_table(). An assert is added to recover_from_failed_open_table_attempt() after wait_for_locks() is called, to check that current_mutex is indeed reset. With this assert in place, existing tests in (e.g.) mdl_sync.test will fail without this patch.
[16 Dec 2009 11:32]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/94462 3040 Jon Olav Hauglid 2009-12-16 Bug #48541 Deadlock between LOCK_open and LOCK_mdl The reason for the deadlock was an improper exit from MDL_context::wait_for_locks() which caused mysys_var->current_mutex to remain LOCK_mdl even though LOCK_mdl was no longer held by that connection. This could for example lead to a deadlock in the following way: 1) INSERT DELAYED tries to open a table but fails, and trying to recover it calls wait_for_locks(). 2) Due to a pending exclusive request, wait_for_locks() fails and exits without resetting mysys_var->current_mutex for the delayed insert handler thread. So it continues to point to LOCK_mdl. 3) The handler thread manages to open a table. 4) A different connection takes LOCK_open and tries to take LOCK_mdl. 5) FLUSH TABLES from a third connection notices that the handler thread has a table open, and tries to kill it. This involves locking mysys_var->current_mutex while having LOCK_open locked. Since current_mutex mistakenly points to LOCK_mdl, we have a deadlock. This patch makes sure MDL_EXIT_COND() is called before exiting wait_for_locks(). This clears mysys->current_mutex which resolves the issue. An assert is added to recover_from_failed_open_table_attempt() after wait_for_locks() is called, to check that current_mutex is indeed reset. With this assert in place, existing tests in (e.g.) mdl_sync.test will fail without this patch.
[16 Dec 2009 12:51]
Jon Olav Hauglid
Pushed to mysql-next-4284 and merged to mysql-6.0-codebase-4284.
[16 Feb 2010 16:45]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100216101445-2ofzkh48aq2e0e8o) (version source revid:jon.hauglid@sun.com-20091216124907-muo2nw6vd0v6a91o) (merge vers: 6.0.14-alpha) (pib:16)
[16 Feb 2010 16:55]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100216101208-33qkfwdr0tep3pf2) (version source revid:jon.hauglid@sun.com-20091216113211-lysdhzll282ks668) (pib:16)
[2 Mar 2010 0:56]
Paul DuBois
Not present in any released version. No changelog entry. Setting report to Need Merge pending push of Celosia into release tree.
[6 Mar 2010 11:05]
Bugs System
Pushed into 5.5.3-m3 (revid:alik@sun.com-20100306103849-hha31z2enhh7jwt3) (version source revid:vvaintroub@mysql.com-20100216221947-luyhph0txl2c5tc8) (merge vers: 5.5.99-m3) (pib:16)
[7 Mar 2010 1:37]
Paul DuBois
No changelog entry needed.

Description: When executing concurrent test with a DDL/MDL grammar (derived from Matthias' MDL grammar), mysqld deadlocked with the following interesting threads: SHOW PROCESSLIST: #3 0x0000000000b54929 in safe_mutex_lock (mp=0x3ccef98, my_flags=0, file=0xd4c59f "sql_show.cc", line=1897) at thr_mutex.c:225 #4 0x000000000087cbfd in mysqld_list_processes (thd=0x7fc170080a58, user=0x0, verbose=true) at sql_show.cc:1897 #5 0x00000000006dfcb0 in mysql_execute_command (thd=0x7fc170080a58) at sql_parse.cc:3478 #6 0x00000000006e4b41 in mysql_parse (thd=0x7fc170080a58, inBuf=0x3f02e80 "SHOW FULL PROCESSLIST", length=21, found_semicolon=0x7fc175c6af00) at sql_parse.cc:5979 #7 0x00000000006e57aa in dispatch_command (command=COM_QUERY, thd=0x7fc170080a58, packet=0x7fc170142ba9 "", packet_length=21) at sql_parse.cc:1076 #8 0x00000000006e6d1c in do_command (thd=0x7fc170080a58) at sql_parse.cc:758 #9 0x00000000006d3a88 in handle_one_connection (arg=0x7fc170080a58) at sql_connect.cc:1164 #10 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0 #11 0x000000315a4e627d in clone () from /lib64/libc.so.6 New connection: #2 0x000000315b0090f0 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x0000000000b54929 in safe_mutex_lock (mp=0x1247b00, my_flags=0, file=0xcb61c5 "mysql_priv.h", line=2402) at thr_mutex.c:225 #4 0x00000000006b67f8 in sql_rnd_with_mutex () at mysql_priv.h:2402 #5 0x00000000006b26e0 in THD (this=0x7fc170146be8) at sql_class.cc:585 #6 0x00000000006ca5c4 in handle_connections_sockets () at mysqld.cc:5494 #7 0x00000000006ce02a in main (argc=44, argv=0x7fff8c7ccb58) at mysqld.cc:4855 FLUSH TABLE: #4 0x000000000072be19 in kill_delayed_threads_for_table (share=0x38f9f30) at sql_base.cc:902 #5 0x000000000073ab2e in close_cached_tables (thd=0x3a7bd68, tables=0x7fc1700637a0, have_lock=false, wait_for_refresh=true) at sql_base.cc:966 #6 0x00000000006d6805 in reload_acl_and_cache (thd=0x3a7bd68, options=4, tables=0x7fc1700637a0, write_to_binlog=0x7fc17603925f) at sql_parse.cc:6966 #7 0x00000000006e1e67 in mysql_execute_command (thd=0x3a7bd68) at sql_parse.cc:4084 #8 0x00000000006e4b41 in mysql_parse (thd=0x3a7bd68, inBuf=0x7fc170063690 "FLUSH TABLE testdb_S . t1_all_1_S", length=33, found_semicolon=0x7fc176039f00) at sql_parse.cc:5979 #9 0x00000000006e57aa in dispatch_command (command=COM_QUERY, thd=0x3a7bd68, packet=0x3baf869 "FLUSH TABLE testdb_S . t1_all_1_S ", packet_length=34) at sql_parse.cc:1076 #10 0x00000000006e6d1c in do_command (thd=0x3a7bd68) at sql_parse.cc:758 #11 0x00000000006d3a88 in handle_one_connection (arg=0x3a7bd68) at sql_connect.cc:1164 #12 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0 #13 0x000000315a4e627d in clone () from /lib64/libc.so.6 And other threads hanging in various MDL and table opening calls. How to repeat: In a loop, run the following command line: $ perl runall.pl \ --basedir=/build/bzr/mysql-6.0-codebase-bugfixing/ \ --threads=30 \ --queries=3000 \ --duration=60 \ --mysqld=--table-lock-wait-timeout=1 \ --mysqld=--innodb-lock-wait-timeout=1 \ --mysqld=--log-output=file \ --reporter=Deadlock,Backtrace,Shutdown \ --gendata=conf/WL5004_data.zz \ --grammar=path/to/grammar.yy \ --mem \ --seed=117 After a few dozen runs, you will hit this deadlock. You may also hit other deadlocks and assertions as well