Bug #48541 Deadlock between LOCK_open and LOCK_mdl (was deadlock with LOCK_thread_count)
Submitted: 4 Nov 2009 18:43 Modified: 7 Mar 2010 1:37
Reporter: Philip Stoev Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Locking Severity:S1 (Critical)
Version:6.0-codebase-bugfixing OS:Any
Assigned to: Jon Olav Hauglid CPU Architecture:Any
Tags: mdl

[4 Nov 2009 18:43] Philip Stoev
Description:
When executing concurrent test with a DDL/MDL grammar (derived from Matthias' MDL grammar), mysqld deadlocked with the following interesting threads:

SHOW PROCESSLIST:

#3  0x0000000000b54929 in safe_mutex_lock (mp=0x3ccef98, my_flags=0, file=0xd4c59f "sql_show.cc", line=1897) at thr_mutex.c:225
#4  0x000000000087cbfd in mysqld_list_processes (thd=0x7fc170080a58, user=0x0, verbose=true) at sql_show.cc:1897
#5  0x00000000006dfcb0 in mysql_execute_command (thd=0x7fc170080a58) at sql_parse.cc:3478
#6  0x00000000006e4b41 in mysql_parse (thd=0x7fc170080a58, inBuf=0x3f02e80 "SHOW FULL PROCESSLIST", length=21, found_semicolon=0x7fc175c6af00)
    at sql_parse.cc:5979
#7  0x00000000006e57aa in dispatch_command (command=COM_QUERY, thd=0x7fc170080a58, packet=0x7fc170142ba9 "", packet_length=21) at sql_parse.cc:1076
#8  0x00000000006e6d1c in do_command (thd=0x7fc170080a58) at sql_parse.cc:758
#9  0x00000000006d3a88 in handle_one_connection (arg=0x7fc170080a58) at sql_connect.cc:1164
#10 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0
#11 0x000000315a4e627d in clone () from /lib64/libc.so.6

New connection:

#2  0x000000315b0090f0 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000b54929 in safe_mutex_lock (mp=0x1247b00, my_flags=0, file=0xcb61c5 "mysql_priv.h", line=2402) at thr_mutex.c:225
#4  0x00000000006b67f8 in sql_rnd_with_mutex () at mysql_priv.h:2402
#5  0x00000000006b26e0 in THD (this=0x7fc170146be8) at sql_class.cc:585
#6  0x00000000006ca5c4 in handle_connections_sockets () at mysqld.cc:5494
#7  0x00000000006ce02a in main (argc=44, argv=0x7fff8c7ccb58) at mysqld.cc:4855

FLUSH TABLE:

#4  0x000000000072be19 in kill_delayed_threads_for_table (share=0x38f9f30) at sql_base.cc:902
#5  0x000000000073ab2e in close_cached_tables (thd=0x3a7bd68, tables=0x7fc1700637a0, have_lock=false, wait_for_refresh=true) at sql_base.cc:966
#6  0x00000000006d6805 in reload_acl_and_cache (thd=0x3a7bd68, options=4, tables=0x7fc1700637a0, write_to_binlog=0x7fc17603925f) at sql_parse.cc:6966
#7  0x00000000006e1e67 in mysql_execute_command (thd=0x3a7bd68) at sql_parse.cc:4084
#8  0x00000000006e4b41 in mysql_parse (thd=0x3a7bd68, inBuf=0x7fc170063690 "FLUSH TABLE testdb_S . t1_all_1_S", length=33, found_semicolon=0x7fc176039f00)
    at sql_parse.cc:5979
#9  0x00000000006e57aa in dispatch_command (command=COM_QUERY, thd=0x3a7bd68, packet=0x3baf869 "FLUSH TABLE testdb_S . t1_all_1_S ", packet_length=34)
    at sql_parse.cc:1076
#10 0x00000000006e6d1c in do_command (thd=0x3a7bd68) at sql_parse.cc:758
#11 0x00000000006d3a88 in handle_one_connection (arg=0x3a7bd68) at sql_connect.cc:1164
#12 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0
#13 0x000000315a4e627d in clone () from /lib64/libc.so.6

And other threads hanging in various MDL and table opening calls.

How to repeat:
In a loop, run the following command line:

$ perl runall.pl \
  --basedir=/build/bzr/mysql-6.0-codebase-bugfixing/ \
  --threads=30 \
  --queries=3000 \
  --duration=60 \
  --mysqld=--table-lock-wait-timeout=1 \
  --mysqld=--innodb-lock-wait-timeout=1 \
  --mysqld=--log-output=file \
  --reporter=Deadlock,Backtrace,Shutdown \
  --gendata=conf/WL5004_data.zz \
  --grammar=path/to/grammar.yy \
  --mem \
  --seed=117

After a few dozen runs, you will hit this deadlock. You may also hit other deadlocks and assertions as well
[4 Nov 2009 18:43] Philip Stoev
Grammar for bug 48541

Attachment: bug48541.yy (application/octet-stream, text), 29.62 KiB.

[4 Nov 2009 18:48] Philip Stoev
Thread stacks for 48541

Attachment: bug48541.threads.txt (text/plain), 73.00 KiB.

[16 Nov 2009 11:00] Jon Olav Hauglid
The relevant threads from the above thread stacks are:

Thread 29 (Thread 0x7fc176283950 (LWP 23216)):
#0  0x000000315b00db14 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x000000315b00921c in _L_lock_1078 () from /lib64/libpthread.so.0
#2  0x000000315b0090f0 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000b54929 in safe_mutex_lock (mp=0x1247a60, my_flags=0, file=0xd2040c "sql_base.cc", line=8212) at thr_mutex.c:225
#4  0x000000000072bf84 in mysql_notify_thread_having_shared_lock (thd=0x3aa8928, in_use=0x384a4c8) at sql_base.cc:8212
#5  0x000000000090da5a in notify_shared_lock (thd=0x3aa8928, conflicting_ticket=0x3d0c510) at mdl.cc:807
#6  0x000000000090e26e in MDL_context::acquire_exclusive_locks (this=0x3aa8a00, mdl_requests=0x7fc176280900) at mdl.cc:919
#7  0x000000000090e6f1 in MDL_context::acquire_exclusive_lock (this=0x3aa8a00, mdl_request=0x38c5e60) at mdl.cc:822
#8  0x0000000000733bd8 in open_table_get_mdl_lock (thd=0x3aa8928, table_list=0x38c5b08, mdl_request=0x38c5e60, ot_ctx=0x7fc176281030, flags=0)
    at sql_base.cc:2343
#9  0x0000000000738478 in open_table (thd=0x3aa8928, table_list=0x38c5b08, mem_root=0x7fc176280fc0, ot_ctx=0x7fc176281030, flags=0) at sql_base.cc:2618
#10 0x0000000000739796 in open_and_process_table (thd=0x3aa8928, lex=0x3aaa1f8, tables=0x38c5b08, counter=0x7fc176281100, flags=0,
    prelocking_strategy=0x7fc176281140, has_prelocking_list=false, ot_ctx=0x7fc176281030, new_frm_mem=0x7fc176280fc0) at sql_base.cc:3940
#11 0x0000000000739e83 in open_tables (thd=0x3aa8928, start=0x7fc1762810c0, counter=0x7fc176281100, flags=0, prelocking_strategy=0x7fc176281140)
    at sql_base.cc:4178
#12 0x000000000073a378 in open_and_lock_tables_derived (thd=0x3aa8928, tables=0x38c5b08, derived=true, flags=0, prelocking_strategy=0x7fc176281140)
    at sql_base.cc:4763
#13 0x00000000006e7cc8 in open_and_lock_tables_derived (thd=0x3aa8928, tables=0x38c5b08, derived=true, flags=0) at ../mysql_priv.h:1518
#14 0x00000000006e7d03 in open_and_lock_tables (thd=0x3aa8928, tables=0x38c5b08) at ../../sql/mysql_priv.h:1528
#15 0x00000000008bbbf4 in mysql_create_view (thd=0x3aa8928, views=0x38c5b08, mode=VIEW_CREATE_NEW) at sql_view.cc:402
#16 0x00000000006e3e02 in mysql_execute_command (thd=0x3aa8928) at sql_parse.cc:4692
#17 0x00000000006e4b41 in mysql_parse (thd=0x3aa8928,
    inBuf=0x38c57f0 "/* Sequence start */ CREATE ALGORITHM = MERGE VIEW testdb_S . t1_all_2_S  AS SELECT HIGH_PRIORITY  * FROM testdb_S . t1_all_1_S   A NATURAL JOIN testdb_S . t1_all_1_N  B WHERE `pk` BETWEEN 3 AND 3 + 1"..., length=295, found_semicolon=0x7fc176282f00) at sql_parse.cc:5979

Thread 24 (Thread 0x7fc17603a950 (LWP 23238)):
#0  0x000000315b00db14 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x000000315b0091b0 in _L_lock_102 () from /lib64/libpthread.so.0
#2  0x000000315b008a9e in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000b54929 in safe_mutex_lock (mp=0x1258600, my_flags=0, file=0xd2040c "sql_base.cc", line=902) at thr_mutex.c:225
#4  0x000000000072be19 in kill_delayed_threads_for_table (share=0x38f9f30) at sql_base.cc:902
#5  0x000000000073ab2e in close_cached_tables (thd=0x3a7bd68, tables=0x7fc1700637a0, have_lock=false, wait_for_refresh=true) at sql_base.cc:966
#6  0x00000000006d6805 in reload_acl_and_cache (thd=0x3a7bd68, options=4, tables=0x7fc1700637a0, write_to_binlog=0x7fc17603925f) at sql_parse.cc:6966
#7  0x00000000006e1e67 in mysql_execute_command (thd=0x3a7bd68) at sql_parse.cc:4084
#8  0x00000000006e4b41 in mysql_parse (thd=0x3a7bd68, inBuf=0x7fc170063690 "FLUSH TABLE testdb_S . t1_all_1_S", length=33, found_semicolon=0x7fc176039f00)
    at sql_parse.cc:5979

Thread 29 holds LOCK_mdl, tries to get LOCK_open
Thread 24 holds LOCK_open, tries to get LOCK_mdl

Notice that while Thread 24 executes:
        pthread_mutex_lock(in_use->mysys_var->current_mutex);
this mutex is located at 0x1258600 which we can see from e.g Thread 20 is LOCK_mdl.
[24 Nov 2009 12:23] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/91410

3728 Jon Olav Hauglid	2009-11-24
      Bug #48541 Server-wide deadlock involving LOCK_thread_count
      
      The bug title is misleading. This is a deadlock between LOCK_open and LOCK_mdl.
      The reason for the deadlock was an improper exit from
      MDL_context::wait_for_locks() which caused mysys_var->current_mutex to remain
      LOCK_mdl even though LOCK_mdl was no longer held by that connection. 
      
      This could lead to a deadlock in the following way:
      1) INSERT DELAYED tries to open a table but fails, and trying to recover it
      calls wait_for_locks().
      2) Due to a pending exclusive request, wait_for_locks() fails and exits without
      resetting mysys_var->current_mutex for the delayed insert handler thread. So it
      continues to point to LOCK_mdl.
      3) The handler thread manages to open a table.
      4) A different connection takes LOCK_open and tries to take LOCK_mdl.
      5) FLUSH TABLES from a third connection notices that the handler thread has a
      table open, and tries to kill it. This involves locking mysys_var->current_mutex
      while having LOCK_open locked. Since current_mutex mistakenly points to LOCK_mdl,
      we have a deadlock.
      
      This patch makes sure MDL_EXIT_COND() is called before exiting wait_for_locks().
      This clears mysys->current_mutex which resolves the issue. The patch also removes
      the unnecessary locking of mysys_var->current_mutex in kill_delayed_threads() and
      kill_delayed_threads_for_table().
      
      An assert is added to recover_from_failed_open_table_attempt() after
      wait_for_locks() is called, to check that current_mutex is indeed reset.
      With this assert in place, existing tests in (e.g.) mdl_sync.test will fail
      without this patch.
[16 Dec 2009 11:32] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/94462

3040 Jon Olav Hauglid	2009-12-16
      Bug #48541 Deadlock between LOCK_open and LOCK_mdl
      
      The reason for the deadlock was an improper exit from
      MDL_context::wait_for_locks() which caused mysys_var->current_mutex to remain
      LOCK_mdl even though LOCK_mdl was no longer held by that connection. 
      
      This could for example lead to a deadlock in the following way:
      1) INSERT DELAYED tries to open a table but fails, and trying to recover it
      calls wait_for_locks().
      2) Due to a pending exclusive request, wait_for_locks() fails and exits without
      resetting mysys_var->current_mutex for the delayed insert handler thread. So it
      continues to point to LOCK_mdl.
      3) The handler thread manages to open a table.
      4) A different connection takes LOCK_open and tries to take LOCK_mdl.
      5) FLUSH TABLES from a third connection notices that the handler thread has a
      table open, and tries to kill it. This involves locking mysys_var->current_mutex
      while having LOCK_open locked. Since current_mutex mistakenly points to LOCK_mdl,
      we have a deadlock.
      
      This patch makes sure MDL_EXIT_COND() is called before exiting wait_for_locks().
      This clears mysys->current_mutex which resolves the issue. 
      
      An assert is added to recover_from_failed_open_table_attempt() after
      wait_for_locks() is called, to check that current_mutex is indeed reset.
      With this assert in place, existing tests in (e.g.) mdl_sync.test will fail
      without this patch.
[16 Dec 2009 12:51] Jon Olav Hauglid
Pushed to mysql-next-4284 and merged to mysql-6.0-codebase-4284.
[16 Feb 2010 16:45] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100216101445-2ofzkh48aq2e0e8o) (version source revid:jon.hauglid@sun.com-20091216124907-muo2nw6vd0v6a91o) (merge vers: 6.0.14-alpha) (pib:16)
[16 Feb 2010 16:55] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100216101208-33qkfwdr0tep3pf2) (version source revid:jon.hauglid@sun.com-20091216113211-lysdhzll282ks668) (pib:16)
[2 Mar 2010 0:56] Paul DuBois
Not present in any released version. No changelog entry.

Setting report to Need Merge pending push of Celosia into release tree.
[6 Mar 2010 11:05] Bugs System
Pushed into 5.5.3-m3 (revid:alik@sun.com-20100306103849-hha31z2enhh7jwt3) (version source revid:vvaintroub@mysql.com-20100216221947-luyhph0txl2c5tc8) (merge vers: 5.5.99-m3) (pib:16)
[7 Mar 2010 1:37] Paul DuBois
No changelog entry needed.