Bug #50273 thread pool causes easier deadlock between mysql_rm_db and mysql_rm_table_part2
Submitted: 12 Jan 2010 12:57
Modified: 3 Feb 2013 18:11
Reporter: Philip Stoev
Status: Can't repeat
Category: MySQL Server: Locking
Severity: S2 (Serious)
Version: next-mr-wl5136
OS: Any
Assigned to: Mikael Ronström
CPU Architecture: Any
Tags: deadlock, thread pool

[12 Jan 2010 12:57] Philip Stoev
Description:
When executing a DDL workload with thread_pool_size=2, mysqld deadlocked. The two deadlocked threads had the following stack traces:

#3  0x00000000009c3ef4 in safe_mutex_lock (mp=0xfafc20, try_lock=0 '\0', file=0xb41ba5 "sql_db.cc", line=890) at thr_mutex.c:149
#4  0x000000000078c257 in mysql_rm_db (thd=0x7fde60177658, db=0x7fde60167dc8 "testdb_S", if_exists=false, silent=false) at sql_db.cc:890
#5  0x0000000000647297 in mysql_execute_command (thd=0x7fde60177658) at sql_parse.cc:3558
#6  0x000000000064bef1 in mysql_parse (thd=0x7fde60177658, inBuf=0x7fde60165f67 "DROP SCHEMA testdb_S /* Sequence end */", length=39,
    found_semicolon=0x7fde65d22e40) at sql_parse.cc:5987
#7  0x000000000064cce7 in dispatch_command (command=COM_QUERY, thd=0x7fde60177658, packet=0x7fde6015de89 "", packet_length=118) at sql_parse.cc:1174
#8  0x000000000064df55 in do_command (thd=0x7fde60177658) at sql_parse.cc:798
#9  0x000000000063a712 in tp_process_event (my_thread_data=0xfb92e8) at scheduler_thread_pool.cc:303
#10 0x000000000063b182 in tp_worker_thread_main (arg=0xfb92e8) at scheduler_thread_pool.cc:638
#11 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0
#12 0x000000315a4e627d in clone () from /lib64/libc.so.6

and

#2  0x0000000000689ba0 in wait_for_condition (thd=0x7fde60161ec8, mutex=0xfafca0, cond=0xfb0720) at sql_base.cc:2203
#3  0x0000000000624c44 in wait_for_locked_table_names (thd=0x7fde60161ec8, table_list=0x7fde6008c5a0) at lock.cc:1130
#4  0x00000000006250c3 in lock_table_names (thd=0x7fde60161ec8, table_list=0x7fde6008c5a0) at lock.cc:1171
#5  0x000000000062511e in lock_table_names_exclusively (thd=0x7fde60161ec8, table_list=0x7fde6008c5a0) at lock.cc:1202
#6  0x000000000079e334 in mysql_rm_table_part2 (thd=0x7fde60161ec8, tables=0x7fde6008c5a0, if_exists=true, drop_temporary=false, drop_view=true,
    dont_log_query=true) at sql_table.cc:1900
#7  0x000000000078bfd3 in mysql_rm_known_files (thd=0x7fde60161ec8, dirp=0x1d34728, db=0x7fde6008c590 "testdb_N", org_path=0x7fde65d61ef0 "./testdb_N/",
    level=0, dropped_tables=0x7fde65d62218) at sql_db.cc:1182
#8  0x000000000078c3e7 in mysql_rm_db (thd=0x7fde60161ec8, db=0x7fde6008c590 "testdb_N", if_exists=true, silent=false) at sql_db.cc:938
#9  0x0000000000647297 in mysql_execute_command (thd=0x7fde60161ec8) at sql_parse.cc:3558
#10 0x000000000064bef1 in mysql_parse (thd=0x7fde60161ec8, inBuf=0x7fde6008c4d8 "DROP DATABASE IF EXISTS testdb_N", length=32,
    found_semicolon=0x7fde65d63e40) at sql_parse.cc:5987
#11 0x000000000064cb53 in dispatch_command (command=COM_QUERY, thd=0x7fde60161ec8, packet=0x7fde6016b6b9 "DROP DATABASE IF EXISTS testdb_N",
    packet_length=32) at sql_parse.cc:1128
#12 0x000000000064df55 in do_command (thd=0x7fde60161ec8) at sql_parse.cc:798
#13 0x000000000063a712 in tp_process_event (my_thread_data=0xfb20b0) at scheduler_thread_pool.cc:303
#14 0x000000000063b182 in tp_worker_thread_main (arg=0xfb20b0) at scheduler_thread_pool.cc:638
#15 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0
#16 0x000000315a4e627d in clone () from /lib64/libc.so.6

The situation (and the workload that triggered it) is similar to bug #48940. The difference is that with thread_pool_size=2 the deadlock occurs much faster and makes the entire server inaccessible: no new client connections are accepted.
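One plausible way to picture why a small pool locks out the whole server is a generic fixed-size worker pool: once every worker is parked in a lock wait, queued work for other sessions, including new connections, never gets a thread. The sketch below is a hypothetical model built on Python's standard ThreadPoolExecutor, not the code in scheduler_thread_pool.cc. The event and task names are illustrative assumptions, and the lock is released by hand only so that the demo terminates; in the reported deadlock nothing releases it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical stand-in for the lock both DDL statements are waiting on.
# In the reported deadlock nothing ever releases it.
lock_released = threading.Event()


def blocked_statement(name):
    """Occupies a pool worker while waiting on a lock, like a stuck DROP DATABASE."""
    lock_released.wait()
    return f"{name} finished"


def new_connection():
    """Hypothetical work for a new client session, queued behind the blocked statements."""
    return "new client served"


# max_workers=2 plays the role of --thread_pool_size=2 in this model.
pool = ThreadPoolExecutor(max_workers=2)
stuck = [pool.submit(blocked_statement, stmt)
         for stmt in ("DROP DATABASE IF EXISTS testdb_N", "DROP SCHEMA testdb_S")]
login = pool.submit(new_connection)

# Both workers are parked in lock waits, so the queued connection
# work cannot start while the lock is held.
_, pending = wait([login], timeout=0.5)
starved = login in pending
print("new connection starved:", starved)

# Releasing the lock by hand (only so the demo terminates) frees a
# worker, and the queued work finally runs.
lock_released.set()
print(login.result(timeout=5))
pool.shutdown()
```

Running it prints `new connection starved: True` and then `new client served`. In this model a larger pool only raises the number of blocked statements needed to exhaust it.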

How to repeat:
With the RQG (Random Query Generator), run:

$ perl runall.pl \
  --grammar=conf/WL5004_sql.yy \
  --gendata=conf/WL5004_data.zz \
  --mysqld=--thread_pool_size=2 \
  --basedir=/build/bzr/mysql-next-mr-wl5136/ \
  --mysqld=--secure-file-priv=/tmp

Please disregard all output from the test: a deadlock should develop shortly after startup, and all output will then cease. At that point no new connections are accepted, and you can attach gdb to mysqld to debug.
[12 Jan 2010 13:04] Philip Stoev
Core and binary:

http://mysql-systemqa.s3.amazonaws.com/var-bug50273.zip

Source:

revision-id: k.long@sun.com-20100111170153-w3gb6h6dukf203bo
date: 2010-01-11 10:01:53 -0700
build-date: 2010-01-12 14:58:13 +0200
revno: 2968
branch-nick: mysql-next-mr-wl5136

--thread_pool_size=1 and --thread_pool_size=4 also cause a very quick deadlock.
[14 Jan 2010 8:59] Jon Olav Hauglid
This bug looks very similar to a couple of bugs I've fixed in the MDL (metadata locking) tree.
I suggest that this bug be (re-)verified once the MDL and thread pool trees have been merged, also because the listed backtraces will look quite different once the MDL code is present.
[14 Jan 2010 9:46] Philip Stoev
Jon, you are right. The purpose of this bug is to fix the thread-pool-related issues (the deadlock happening faster and locking out the entire server). If only the underlying deadlock is fixed, the thread-pool-related issues will, in theory, remain, which is undesirable.
[3 Feb 2013 18:11] Shane Bester
Cannot repeat on mysql-trunk using a concurrent workload of CREATE/DROP DATABASE and CREATE TABLE statements, with the thread pool enabled and many reconnections.