Bug #73531 Not all the 4-way deadlocks of bug 69954 have been fixed
Submitted: 11 Aug 2014 12:20 Modified: 12 Aug 2014 9:39
Reporter: Laurynas Biveinis (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Replication Severity:S1 (Critical)
Version:5.5, 5.5.39 OS:Any
Assigned to: CPU Architecture:Any
Tags: deadlock

[11 Aug 2014 12:20] Laurynas Biveinis
Description:
Bug 69954 fix is incomplete. 5.5 code review shows several places where LOCK_thread_count is locked, a global thread list is looped over, and some LOCK_thd_data is locked. Unless it is provable that LOCK_thd_data will not be taken for a zombie dump thread there, it's a deadlock.

How to repeat:
One such place is close_connections of mysqld.cc.

To see the deadlock (MTR is not of GA quality, but it will show deadlock all right):

=== modified file 'mysql-test/suite/rpl/t/rpl_4threads_deadlock.test'
--- mysql-test/suite/rpl/t/rpl_4threads_deadlock.test	2014-05-08 12:43:01 +0000
+++ mysql-test/suite/rpl/t/rpl_4threads_deadlock.test	2014-08-11 12:08:34 +0000
@@ -32,6 +32,8 @@
 --source include/have_binlog_format_statement.inc
 --source include/master-slave.inc
 
+--let $mysqld_pid_file=`SELECT @@GLOBAL.pid_file`
+
 connect(master2,127.0.0.1,root,,test,$MASTER_MYPORT,);
 connect(master3,127.0.0.1,root,,test,$MASTER_MYPORT,);
 --connection master
@@ -43,7 +45,7 @@
 
 # Iteration 1 with "SHOW PROCESSLIST command"
 # Iteration 2 with "SELECT * FROM information_schema.processlist"
---let $iter=2
+--let $iter=3
 while ($iter)
 {
   --connection slave
@@ -83,6 +85,16 @@
   {
     --let $query=SELECT * FROM information_schema.processlist
   }
+  if ($iter == 3)
+  {
+    system kill -TERM `cat $mysqld_pid_file`;
+    --connection slave
+    select SLEEP(10000);
+#    --source include/wait_until_disconnected.inc
+#    --enable_reconnect
+#    --exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
+#    --source include/wait_until_connected_again.inc
+  }
   --send_eval $query
 
   --connection master

Now run the testcase, give it some time, attach to the master server and here's the Thread 4 to use the bug 69954 commit message thread terminology:

  thread #22: tid = 0x526e6d, 0x00007fff997cf746 libsystem_kernel.dylib`__psynch_mutexwait + 10
    frame #0: 0x00007fff997cf746 libsystem_kernel.dylib`__psynch_mutexwait + 10
    frame #1: 0x00007fff9607f779 libsystem_pthread.dylib`_pthread_mutex_lock + 372
    frame #2: 0x000000010bb1a9c8 mysqld`safe_mutex_lock(mp=0x00007fbedd0060b8, try_lock=<unavailable>, file=0x000000010bf9a929, line=1128) + 168 at thr_mutex.c:152
    frame #3: 0x000000010b907af6 mysqld`kill_server_thread [inlined] inline_mysql_mutex_lock(st_mysql_mutex*, char const*, unsigned int) + 1382 at mysql_thread.h:615
    frame #4: 0x000000010b907a72 mysqld`kill_server_thread [inlined] close_connections() + 1125 at mysqld.cc:1128
    frame #5: 0x000000010b90760d mysqld`kill_server_thread [inlined] kill_server(sig_ptr=<unavailable>) + 100 at mysqld.cc:1325
    frame #6: 0x000000010b9075a9 mysqld`kill_server_thread(arg=<unavailable>) + 25 at mysqld.cc:1353
    frame #7: 0x000000010bc77b2f mysqld`pfs_spawn_thread(arg=0x00007fbedda024d0) + 79 at pfs.cc:1015
    frame #8: 0x00007fff9607c899 libsystem_pthread.dylib`_pthread_body + 138
    frame #9: 0x00007fff9607c72a libsystem_pthread.dylib`_pthread_start + 137

Thread 1:

thread #21: tid = 0x526e65, 0x00007fff997cf746 libsystem_kernel.dylib`__psynch_mutexwait + 10
    frame #0: 0x00007fff997cf746 libsystem_kernel.dylib`__psynch_mutexwait + 10
    frame #1: 0x00007fff9607f779 libsystem_pthread.dylib`_pthread_mutex_lock + 372
    frame #2: 0x000000010bb1a9c8 mysqld`safe_mutex_lock(mp=0x000000010c14cb08, try_lock=<unavailable>, file=0x000000010bfaec52, line=1591) + 168 at thr_mutex.c:152
    frame #3: 0x000000010b99c9b2 mysqld`THD::awake(THD::killed_state) [inlined] inline_mysql_mutex_lock(st_mysql_mutex*, char const*, unsigned int) + 93 at mysql_thread.h:615
    frame #4: 0x000000010b99c955 mysqld`THD::awake(this=0x00007fbedd005200, state_to_set=<unavailable>) + 549 at sql_class.cc:1591
    frame #5: 0x000000010ba01bdc mysqld`kill_zombie_dump_threads(slave_server_id=<unavailable>) + 524 at sql_repl.cc:1381
    frame #6: 0x000000010b9cd8a6 mysqld`dispatch_command(command=COM_SHUTDOWN, thd=0x00007fbede02b200, packet=0x00007fbede02e201, packet_length=<unavailable>) + 3462 at sql_parse.cc:1227
    frame #7: 0x000000010b9cef68 mysqld`do_command(thd=0x00007fbede02b200) + 376 at sql_parse.cc:773
    frame #8: 0x000000010baa4006 mysqld`do_handle_one_connection(thd_arg=<unavailable>) + 390 at sql_connect.cc:862
    frame #9: 0x000000010baa3e65 mysqld`handle_one_connection(arg=0x00007fbede02b200) + 69 at sql_connect.cc:781
    frame #10: 0x000000010bc77b2f mysqld`pfs_spawn_thread(arg=0x00007fbedda024d0) + 79 at pfs.cc:1015
    frame #11: 0x00007fff9607c899 libsystem_pthread.dylib`_pthread_body + 138
    frame #12: 0x00007fff9607c72a libsystem_pthread.dylib`_pthread_start + 137

Thread 2:

  thread #18: tid = 0x526e49, 0x00007fff997cf746 libsystem_kernel.dylib`__psynch_mutexwait + 10
    frame #0: 0x00007fff997cf746 libsystem_kernel.dylib`__psynch_mutexwait + 10
    frame #1: 0x00007fff9607f779 libsystem_pthread.dylib`_pthread_mutex_lock + 372
    frame #2: 0x000000010bb1a9c8 mysqld`safe_mutex_lock(mp=0x000000010c14d030, try_lock=<unavailable>, file=0x000000010bfa4dcd, line=512) + 168 at thr_mutex.c:152
    frame #3: 0x000000010ba0399e mysqld`show_binlogs(thd=0x00007fbedd087000) + 1022 at mysql_thread.h:611
    frame #4: 0x000000010b9d2a6b mysqld`mysql_execute_command(thd=0x00007fbedd087000) + 9675 at sql_parse.cc:2713
    frame #5: 0x000000010b9cf3f8 mysqld`mysql_parse(thd=0x00007fbedd087000, rawbuf=<unavailable>, length=<unavailable>, parser_state=<unavailable>) + 440 at sql_parse.cc:5662
    frame #6: 0x000000010b9cdb12 mysqld`dispatch_command(command=COM_SHUTDOWN, thd=0x00007fbedd087000, packet=<unavailable>, packet_length=<unavailable>) + 4082 at sql_parse.cc:1038
    frame #7: 0x000000010b9cef68 mysqld`do_command(thd=0x00007fbedd087000) + 376 at sql_parse.cc:773
    frame #8: 0x000000010baa4006 mysqld`do_handle_one_connection(thd_arg=<unavailable>) + 390 at sql_connect.cc:862
    frame #9: 0x000000010baa3e65 mysqld`handle_one_connection(arg=0x00007fbedd087000) + 69 at sql_connect.cc:781
    frame #10: 0x000000010bc77b2f mysqld`pfs_spawn_thread(arg=0x00007fbedddc10f0) + 79 at pfs.cc:1015
    frame #11: 0x00007fff9607c899 libsystem_pthread.dylib`_pthread_body + 138
    frame #12: 0x00007fff9607c72a libsystem_pthread.dylib`_pthread_start + 137

Thread 3:

  thread #19: tid = 0x526e4c, 0x00007fff997cf746 libsystem_kernel.dylib`__psynch_mutexwait + 10
    frame #0: 0x00007fff997cf746 libsystem_kernel.dylib`__psynch_mutexwait + 10
    frame #1: 0x00007fff9607f779 libsystem_pthread.dylib`_pthread_mutex_lock + 372
    frame #2: 0x000000010bb1a9c8 mysqld`safe_mutex_lock(mp=0x000000010c150110, try_lock=<unavailable>, file=0x000000010bfb7b5f, line=252) + 168 at thr_mutex.c:152
    frame #3: 0x000000010b9fe33d mysqld`log_in_use(char const*) [inlined] inline_mysql_mutex_lock(st_mysql_mutex*, char const*, unsigned int) + 119 at mysql_thread.h:615
    frame #4: 0x000000010b9fe2c6 mysqld`log_in_use(log_name=0x0000000110cac890) + 54 at sql_repl.cc:252
    frame #5: 0x000000010b8dc23e mysqld`MYSQL_BIN_LOG::purge_logs_before_date(this=0x000000010c14cb00, purge_time=0) + 462 at log.cc:4101
    frame #6: 0x000000010b9fe6d8 mysqld`purge_master_logs_before_date(thd=0x00007fbedd08e200, purge_time=<unavailable>) + 40 at sql_repl.cc:332
    frame #7: 0x000000010b9d4d2a mysqld`mysql_execute_command(thd=0x00007fbedd08e200) + 18570 at sql_parse.cc:2239
    frame #8: 0x000000010b9cf3f8 mysqld`mysql_parse(thd=0x00007fbedd08e200, rawbuf=<unavailable>, length=<unavailable>, parser_state=<unavailable>) + 440 at sql_parse.cc:5662
    frame #9: 0x000000010b9cdb12 mysqld`dispatch_command(command=COM_SHUTDOWN, thd=0x00007fbedd08e200, packet=<unavailable>, packet_length=<unavailable>) + 4082 at sql_parse.cc:1038
    frame #10: 0x000000010b9cef68 mysqld`do_command(thd=0x00007fbedd08e200) + 376 at sql_parse.cc:773
    frame #11: 0x000000010baa4006 mysqld`do_handle_one_connection(thd_arg=<unavailable>) + 390 at sql_connect.cc:862
    frame #12: 0x000000010baa3e65 mysqld`handle_one_connection(arg=0x00007fbedd08e200) + 69 at sql_connect.cc:781
    frame #13: 0x000000010bc77b2f mysqld`pfs_spawn_thread(arg=0x00007fbedddc10f0) + 79 at pfs.cc:1015
    frame #14: 0x00007fff9607c899 libsystem_pthread.dylib`_pthread_body + 138
    frame #15: 0x00007fff9607c72a libsystem_pthread.dylib`_pthread_start + 137

Suggested fix:
Use LOCK_thd_remove in the affected code too.
[11 Aug 2014 13:06] Laurynas Biveinis
The same issue should happen with Thread 4 doing KILL <binlog dump thread id>, but debugging the MTR debug_sync points to show this is taking more time than I can spare right now.
[12 Aug 2014 8:33] MySQL Verification Team
it would be nice to finally implement this :)
https://dev.mysql.com/worklog/task/?id=3262
[12 Aug 2014 9:39] MySQL Verification Team
Hello Laurynas,

Thank you for the bug report and test case.
Observed deadlock with the modified test case.

Thanks,
Umesh
[9 Dec 2014 19:54] Mark Callaghan
Need a special club for people who provide testcases for 4-way deadlock.
http://bugs.mysql.com/bug.php?id=60682