Description:
When working on BUG#37145, a DBUG_SYNC_POINT is added to the constructor of Query_log_event, this works fine with other DDL statements, but may cause dead lock when a DROP TABLE is killed. This only happens on 5.0, there is no such problem for 5.1 and 6.0.
Here is the explanation of the scenario:
thd1: DROP TABLE thd2: KILL QUERY thd1
-------------------------------------------------------------------------------------
thd->current_mutex= &LOCK_open;
thd->current_cond= &COND_refresh;
.
. thd->awake()
. pthread_mutex_lock(&mysys_var->mutex);
. ...
. pthread_mutex_lock(mysys_var->current_mutex);
.
DBUG_SYNC_POINT();
thd->current_mutex= &LOCK_user_locks;
thd->current_cond= &ull->cond;
pthread_cond_timedwait(&ull->cond,
&LOCK_user_locks, &abstime);
pthread_mutex_unlock(&LOCK_user_locks);
pthread_mutex_lock(&thd->mysys_var->mutex);
And now, th1 is trying to lock thd->mysys_var->mutex, which is held by thd2, and thd2 is trying to lock LOCK_open, which is held by thd1.
How to repeat:
Apply the following patch to 5.0 code:
--- sql/log_event.cc 2009-03-05 10:10:44 +0000
+++ sql/log_event.cc 2009-03-09 05:20:24 +0000
@@ -1358,6 +1358,8 @@
{
time_t end_time;
+ DBUG_SYNC_POINT("debug_lock.before_query_log_event", 10);
+
if (killed_status_arg == THD::KILLED_NO_VALUE)
killed_status_arg= thd_arg->killed;
error_code=
And then run the following test, it will cause dead lock sometimes:
---------------------------------------------------------------------
source include/master-slave.inc;
CREATE TABLE t1 (a int);
connection master;
SELECT GET_LOCK("debug_lock.before_query_log_event", 10);
connection master1;
let $connection_id= `SELECT CONNECTION_ID()`;
send DROP TABLE t1;
connection master;
eval kill query $connection_id;
SELECT RELEASE_LOCK("debug_lock.before_query_log_event");
connection master1;
reap;
-----------------------------------------------------------------------
Suggested fix:
By looking into the code, the problem is because in mysql_rm_table, it sets thd->current_mutex and thd->current_cond before calling mysql_rm_table_part2. I think one should only set these members right before the thread is about to wait for the condition, and reset it to 0 after been waken up. If some codes in between the set of these members and waiting for the condition also set these members and waiting (as DBUG_SYNC_POINT or other functions that calls thd->enter_cond() does), then KILL this thread will probably to cause a dead lock.
Move setting thd->current_mutex and thd->current_cond to 'remove_table_from_cache' (I am not sure if it is used in other places or not), right before start to waiting, and reset them to 0 after been waken up.