MySQL Bugs: #70237: the mysqladmin shutdown hangs

Bug #70237	the mysqladmin shutdown hangs
Submitted:	4 Sep 2013 5:43	Modified:	10 Dec 2018 13:23
Reporter:	zhai weixiang (OCA)	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	5.5,5.6	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
Recently We got a hang while executing mysqladmin shutdown, and bellow is the output of pt-pmp:

o_handler_thread,start_thread,clone
      1 sigwait,signal_hand,start_thread,clone
      1 select,os_thread_sleep,srv_LRU_dump_restore_thread,start_thread,clone
      1 pthread_cond_wait,os_cond_wait,os_event_wait_low,srv_purge_thread,start_thread,clone
      1 pthread_cond_wait,os_cond_wait,os_event_wait_low,srv_master_thread,start_thread,clone
      1 pthread_cond_wait,inline_mysql_cond_wait,mysqld_main,__libc_start_main,_start
      1 pthread_cond_wait,inline_mysql_cond_wait,MYSQL_BIN_LOG::wait_for_update_bin_log,mysql_binlog_send,dispatch_command,do_handle_one_connection,handle_one_connection,start_thread,clone
      1 pthread_cond_wait,inline_mysql_cond_wait,close_connections,kill_ser
ver,kill_server_thread,start_thread,clone
      1 pthread_cond_timedwait,os_cond_wait_timed,os_event_wait_time_low,srv_monitor_thread,start_thread,clone
      1 pthread_cond_timedwait,os_cond_wait_timed,os_event_wait_time_low,srv_lock_timeout_thread,start_thread,clone
      1 pthread_cond_timedwait,os_cond_wait_timed,os_event_wait_time_low,srv_error_monitor_thread,start_thread,clone

How to repeat:
1):

A BINLOG_DUMP request from client (Use  C API  and make sure @ master_heartbeat_period was not set, also make sure we are requesting the newest binlog position, so then the thread will enter into "case LOG_READ_EOF" in function mysql_binlog_send)

2) add some codes before mysql_mutex_lock(log_lock) such as sleep(60) to make the thread stall

quoted code in function mysql_binlog_send(5.6.13)

1453           binary log.  We can avoid the following read if the counter
1454           has not been updated since last read.
1455         */
1456
1457         mysql_mutex_lock(log_lock);
1458         switch (error= Log_event::read_log_event(&log, packet, (mysql_mutex_t*) 0,
1459                                                  current_checksum_alg)) {

3) executing mysqladmin shutdown,  and the thread will wait for COND_thread_count

quoted code in function close_connections

1457   while (get_thread_count() > 0)
1458   {
1459     mysql_cond_wait(&COND_thread_count, &LOCK_thread_count);
1460     DBUG_PRINT("quit", ("One thread died (count=%u)", get_thread_count()));
1461   }

4) the BINLOG DUMP thread will wait for the update of binlog and the state of THD was not checked here.

Suggested fix:
I don't know.

a simple patch to workaround the hang.(based on 5.6.13)

Index: sql/rpl_master.cc
===================================================================
--- sql/rpl_master.cc   (revision 4367)
+++ sql/rpl_master.cc   (working copy)
@@ -1498,6 +1498,11 @@
           PSI_stage_info old_stage;
           signal_cnt= mysql_bin_log.signal_cnt;

+          if (thd->killed) {
+            mysql_mutex_unlock(log_lock);
+            goto end;
+          }
+
           do
           {
             if (heartbeat_period != 0)

can anyone confirm this bug ? If you need more information , please let me know.

kill_server_thread() is waiting thread_count to be decreased to 0 in close_connections(), while mysql_binlog_send() is waiting new events to be arrived in binlog, that caused the deadlock.

When kill_server_thread() set kill flag and broadcasts before dump request thread endless waiting in mysql_binlog_send(), is the race condition for deadlock.

The later patch provided by yinfeng, just catch the broadcast-before-wait case:

@@ -896,7 +896,7 @@
           int ret;
           ulong signal_cnt;
 	  DBUG_PRINT("wait",("waiting for data in binary log"));
-	  if (thd->server_id==0) // for mysqlbinlog (mysqlbinlog.server_id==0)
+	  if (thd->server_id==0 || thd->killed) // for mysqlbinlog (mysqlbinlog.server_id==0)
 	  {
             mysql_mutex_unlock(log_lock);
 	    goto end;

my fix seems incorrect. should check thd->killed after ENTER_COND

Hi,

Thank you for your report.

However, I have inspected a code and this problem definitely seems to me to be fixed in 5.7 and 8.0.

Do note that there was major reorganisation of this part of code in 5.7 and 8.0, so this time look for the source in rpl_slave.cc.

Let us know if you think otherwise.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".