MySQL Bugs: #50273: thread pool causes easier deadlock between mysql_rm_db and mysql_rm_table

Bug #50273	thread pool causes easier deadlock between mysql_rm_db and mysql_rm_table_part2
Submitted:	12 Jan 2010 12:57	Modified:	3 Feb 2013 18:11
Reporter:	Philip Stoev
Status:	Can't repeat
Category:	MySQL Server: Locking	Severity:	S2 (Serious)
Version:	next-mr-wl5136	OS:	Any
Assigned to:	Mikael Ronström	CPU Architecture:	Any
Tags:	deadlock, thread pool

Description:
When executing a DDL workload with thread_pool_size=2, mysqld deadlocked. The two threads had the following stack traces:

#3  0x00000000009c3ef4 in safe_mutex_lock (mp=0xfafc20, try_lock=0 '\0', file=0xb41ba5 "sql_db.cc", line=890) at thr_mutex.c:149
#4  0x000000000078c257 in mysql_rm_db (thd=0x7fde60177658, db=0x7fde60167dc8 "testdb_S", if_exists=false, silent=false) at sql_db.cc:890
#5  0x0000000000647297 in mysql_execute_command (thd=0x7fde60177658) at sql_parse.cc:3558
#6  0x000000000064bef1 in mysql_parse (thd=0x7fde60177658, inBuf=0x7fde60165f67 "DROP SCHEMA testdb_S /* Sequence end */", length=39,
    found_semicolon=0x7fde65d22e40) at sql_parse.cc:5987
#7  0x000000000064cce7 in dispatch_command (command=COM_QUERY, thd=0x7fde60177658, packet=0x7fde6015de89 "", packet_length=118) at sql_parse.cc:1174
#8  0x000000000064df55 in do_command (thd=0x7fde60177658) at sql_parse.cc:798
#9  0x000000000063a712 in tp_process_event (my_thread_data=0xfb92e8) at scheduler_thread_pool.cc:303
#10 0x000000000063b182 in tp_worker_thread_main (arg=0xfb92e8) at scheduler_thread_pool.cc:638
#11 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0
#12 0x000000315a4e627d in clone () from /lib64/libc.so.6

and

#2  0x0000000000689ba0 in wait_for_condition (thd=0x7fde60161ec8, mutex=0xfafca0, cond=0xfb0720) at sql_base.cc:2203
#3  0x0000000000624c44 in wait_for_locked_table_names (thd=0x7fde60161ec8, table_list=0x7fde6008c5a0) at lock.cc:1130
#4  0x00000000006250c3 in lock_table_names (thd=0x7fde60161ec8, table_list=0x7fde6008c5a0) at lock.cc:1171
#5  0x000000000062511e in lock_table_names_exclusively (thd=0x7fde60161ec8, table_list=0x7fde6008c5a0) at lock.cc:1202
#6  0x000000000079e334 in mysql_rm_table_part2 (thd=0x7fde60161ec8, tables=0x7fde6008c5a0, if_exists=true, drop_temporary=false, drop_view=true,
    dont_log_query=true) at sql_table.cc:1900
#7  0x000000000078bfd3 in mysql_rm_known_files (thd=0x7fde60161ec8, dirp=0x1d34728, db=0x7fde6008c590 "testdb_N", org_path=0x7fde65d61ef0 "./testdb_N/",
    level=0, dropped_tables=0x7fde65d62218) at sql_db.cc:1182
#8  0x000000000078c3e7 in mysql_rm_db (thd=0x7fde60161ec8, db=0x7fde6008c590 "testdb_N", if_exists=true, silent=false) at sql_db.cc:938
#9  0x0000000000647297 in mysql_execute_command (thd=0x7fde60161ec8) at sql_parse.cc:3558
#10 0x000000000064bef1 in mysql_parse (thd=0x7fde60161ec8, inBuf=0x7fde6008c4d8 "DROP DATABASE IF EXISTS testdb_N", length=32,
    found_semicolon=0x7fde65d63e40) at sql_parse.cc:5987
#11 0x000000000064cb53 in dispatch_command (command=COM_QUERY, thd=0x7fde60161ec8, packet=0x7fde6016b6b9 "DROP DATABASE IF EXISTS testdb_N",
    packet_length=32) at sql_parse.cc:1128
#12 0x000000000064df55 in do_command (thd=0x7fde60161ec8) at sql_parse.cc:798
#13 0x000000000063a712 in tp_process_event (my_thread_data=0xfb20b0) at scheduler_thread_pool.cc:303
#14 0x000000000063b182 in tp_worker_thread_main (arg=0xfb20b0) at scheduler_thread_pool.cc:638
#15 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0
#16 0x000000315a4e627d in clone () from /lib64/libc.so.6

The situation and the workload that caused it are similar to bug #48940. The difference is that with thread_pool_size=2 the deadlock occurs much faster and makes the entire server inaccessible: no new client connections are accepted.
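One plausible reading of why the whole server becomes inaccessible, offered as an assumption rather than something this report confirms, is that every worker in a fixed-size pool is stuck inside a blocked statement, so nothing is left to service new work. The sketch below is an analogy in Python, not MySQL code: the function name, the Event standing in for a lock that is never granted, and the probe task standing in for a new client connection are all hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def pool_starves_when_workers_block(pool_size=2, probe_timeout=0.2):
    """Return (ran_while_blocked, ran_after_release) for a probe task.

    Hypothetical illustration: a pool of `pool_size` workers is filled with
    tasks that block, then one more task is submitted as a probe.
    """
    release = threading.Event()                  # stands in for a lock that is never granted
    occupied = threading.Barrier(pool_size + 1)  # workers plus the calling thread

    def blocked_statement():
        occupied.wait()   # report that this worker is now busy
        release.wait()    # block, like a DDL statement waiting on a lock
        return "done"

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        stuck = [pool.submit(blocked_statement) for _ in range(pool_size)]
        occupied.wait()   # every worker is blocked from here on

        # Stands in for a new client connection arriving at the server.
        probe = pool.submit(lambda: "connected")
        try:
            probe.result(timeout=probe_timeout)
            ran_while_blocked = True
        except TimeoutError:
            ran_while_blocked = False

        release.set()     # break the artificial "deadlock" so the demo can exit
        ran_after_release = probe.result(timeout=5) == "connected"
        for future in stuck:
            future.result(timeout=5)

    return ran_while_blocked, ran_after_release
```

In this sketch the probe only runs once a blocked worker is released. With one thread per connection, by contrast, a blocked statement would normally stall only its own connection.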

How to repeat:
With the RQG, run:

$ perl runall.pl \
  --grammar=conf/WL5004_sql.yy \
  --gendata=conf/WL5004_data.zz \
  --mysqld=--thread_pool_size=2 \
  --basedir=/build/bzr/mysql-next-mr-wl5136/ \
  --mysqld=--secure-file-priv=/tmp

Please disregard all output from the test. A deadlock should develop shortly after startup, and all output will cease. At that point no new connections are accepted, so you can use gdb to debug.
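Since the hung server accepts no connections, the thread states have to come from the debugger. A minimal sketch of collecting a backtrace for every thread with gdb in batch mode follows. The helper names are hypothetical, the pid must be looked up by the caller, and attaching requires gdb to be installed and permission to trace the mysqld process.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb command line that prints every thread's backtrace and exits."""
    return [
        "gdb",
        "-p", str(pid),                # attach to the running (hung) process
        "-batch",                      # run the -ex commands, then exit
        "-ex", "thread apply all bt",  # backtrace of every thread
    ]


def dump_backtraces(pid):
    """Run gdb against `pid` and return its standard output as text."""
    result = subprocess.run(
        gdb_backtrace_command(pid),
        capture_output=True,
        text=True,
        check=False,  # leave it to the caller to inspect a failed attach
    )
    return result.stdout
```

Calling dump_backtraces with the pid of the hung mysqld should yield traces in the same general form as those in the description, provided the binary carries debug symbols.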

Core and binary:

http://mysql-systemqa.s3.amazonaws.com/var-bug50273.zip

Source:

revision-id: k.long@sun.com-20100111170153-w3gb6h6dukf203bo
date: 2010-01-11 10:01:53 -0700
build-date: 2010-01-12 14:58:13 +0200
revno: 2968
branch-nick: mysql-next-mr-wl5136

--thread_pool_size=1 and --thread_pool_size=4 also cause a very quick deadlock.
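To compare pool sizes systematically, the RQG command from "How to repeat" can be generated once per value. This is only a sketch: the helper names are hypothetical, the options are copied from the command in this report, and the basedir is the reporter's path and would need adjusting locally.

```python
import shlex

# Path taken from the report; replace with the local build directory.
BASEDIR = "/build/bzr/mysql-next-mr-wl5136/"


def rqg_command(pool_size, basedir=BASEDIR):
    """Return the runall.pl argument list for one thread_pool_size value."""
    return [
        "perl", "runall.pl",
        "--grammar=conf/WL5004_sql.yy",
        "--gendata=conf/WL5004_data.zz",
        f"--mysqld=--thread_pool_size={pool_size}",
        f"--basedir={basedir}",
        "--mysqld=--secure-file-priv=/tmp",
    ]


def sweep_commands(pool_sizes=(1, 2, 4)):
    """Return one shell-quoted command line per pool size."""
    return [shlex.join(rqg_command(size)) for size in pool_sizes]


if __name__ == "__main__":
    for line in sweep_commands():
        print(line)
```

The script only prints the command lines. Each run is expected to hang, so they are better started by hand, one at a time.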

This bug looks very similar to a couple of bugs I've fixed in the MDL tree.
I suggest this bug is (re-)verified once the MDL and thread pool trees have been merged, not least because the listed backtraces will be quite different once the MDL code is present.

Jon, you are right. The idea of this bug is to fix the thread pool-related issues (such as the deadlock happening faster and locking out the entire server). If only the underlying deadlock is fixed, the thread pool-related issues will theoretically remain, which is not good.

Cannot repeat on mysql-trunk using a concurrent workload with the thread pool enabled, involving create/drop database, create table, and many reconnections.