Bug #71047 | shutdown hang if semisync is enabled and slave abort | ||
---|---|---|---|
Submitted: | 2 Dec 2013 4:50 | Modified: | 29 Aug 2014 16:20 |
Reporter: | zhai weixiang (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S3 (Non-critical) |
Version: | 5.7.2, 5.7.3-m13 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[2 Dec 2013 4:50]
zhai weixiang
[2 Dec 2013 7:01]
MySQL Verification Team
Hello Weixiang, Thank you for the bug report and test case. Verified as described. Thanks, Umesh
[2 Dec 2013 7:04]
MySQL Verification Team
// 5.7.3-m13 is also affected Stacktrace from pmp 1 pthread_join,mysqld_main,__libc_start_main,_start 1 pthread_cond_wait@@GLIBC_2.3.2,wait,reset_sig_count=1),buf_dump_thread,start_thread,clone 1 pthread_cond_wait@@GLIBC_2.3.2,wait,reset_sig_count=16),srv_purge_coordinator_suspend,optimized,start_thread,clone 1 pthread_cond_wait@@GLIBC_2.3.2,inline_mysql_cond_wait,at,close_connections,optimized,pfs_spawn_thread,start_thread,clone 1 pthread_cond_timedwait@@GLIBC_2.3.2,timed_wait,time_in_usec=<value,srv_monitor_thread,start_thread,clone 1 pthread_cond_timedwait@@GLIBC_2.3.2,timed_wait,time_in_usec=<value,srv_error_monitor_thread,start_thread,clone 1 pthread_cond_timedwait@@GLIBC_2.3.2,timed_wait,time_in_usec=<value,page_cleaner_sleep_if_needed,optimized,start_thread,clone 1 pthread_cond_timedwait@@GLIBC_2.3.2,timed_wait,time_in_usec=<value,lock_wait_timeout_thread,start_thread,clone 1 pthread_cond_timedwait@@GLIBC_2.3.2,timed_wait,time_in_usec=<value,ib_wqueue_timedwait,fts_optimize_thread,start_thread,clone 1 pthread_cond_timedwait@@GLIBC_2.3.2,timed_wait,time_in_usec=<value,dict_stats_thread,start_thread,clone 1 pthread_cond_timedwait@@GLIBC_2.3.2,inline_mysql_cond_timedwait,wait_time=0x7fe5411cb490),ReplSemiSyncMaster::commitTrx,Binlog_storage_delegate::after_sync,call_after_sync_hook,MYSQL_BIN_LOG::ordered_commit,MYSQL_BIN_LOG::commit,ha_commit_trans,trans_commit_stmt,mysql_execute_command,mysql_parse,dispatch_command,handle_connection,pfs_spawn_thread,start_thread,clone 1 ??,os_aio_linux_collect,message1=0x7fe546218e58,,fil_aio_wait,io_handler_thread,start_thread,clone 1 ??,os_aio_linux_collect,message1=0x7fe545a17e58,,fil_aio_wait,io_handler_thread,start_thread,clone 1 ??,os_aio_linux_collect,message1=0x7fe545216e58,,fil_aio_wait,io_handler_thread,start_thread,clone 1 ??,os_aio_linux_collect,message1=0x7fe544a15e58,,fil_aio_wait,io_handler_thread,start_thread,clone 1 ??,os_aio_linux_collect,message1=0x7fe544214e58,,fil_aio_wait,io_handler_thread,start_thread,clone 1 ??,os_aio_linux_collect,message1=0x7fe543a13e58,,fil_aio_wait,io_handler_thread,start_thread,clone 1 ??,os_aio_linux_collect,message1=0x7fe543212e58,,fil_aio_wait,io_handler_thread,start_thread,clone 1 ??,os_aio_linux_collect,message1=0x7fe542a11e58,,fil_aio_wait,io_handler_thread,start_thread,clone 1 ??,os_aio_linux_collect,message1=0x7fe542210e58,,fil_aio_wait,io_handler_thread,start_thread,clone 1 ??,os_aio_linux_collect,message1=0x7fe541a0fe58,,fil_aio_wait,io_handler_thread,start_thread,clone 1 nanosleep,os_thread_sleep,srv_master_sleep,optimized,start_thread,clone
[3 Dec 2013 11:38]
Libing Song
Posted by developer: Weixiang, Thanks for reporting the bug. Would you please tell us if 'rpl_semi_sync_master_wait_no_slave' is ON? By default it is on, it means semisync master should wait for acknowledgement even there is no semisync slave. If it is ON, then please switch the variable to OFF and try again to see if the problem still there.
[3 Dec 2013 12:56]
zhai weixiang
Yes, I am using default value of rpl_semi_sync_master_wait_no_slave. And everything goes well if I turned off this option.
[1 Jul 2014 6:11]
Libing Song
Posted by developer: Hi Luis, Unfortunately, listen socket is already closed, the server cannot accept any connection since then. So there is no way to replicate the left events to slave. However, for the case that there is a dump thread. it will wait for all events to be replicated and then shut down. This fix doesn't impact the case.
[29 Aug 2014 16:20]
David Moss
The following was added to the 5.6.21 and 5.7.5 changelog with commit 4097: On a master that is using semisynchronous replication, where rpl_semi_sync_master_wait_no_slave is enabled and rpl_semi_sync_master_timeout is set to long timeout, killing the I/O thread could cause the server to hang on shutdown. This fix ensures that if the dump thread finds that there no semisynchronous slaves connected to the master, the setting of rpl_semi_sync_master_wait_no_slave is ignored and the shutdown proceeds correctly.
[25 Sep 2014 12:13]
Laurynas Biveinis
revno: 6112 committer: Sujatha Sivakumar <sujatha.sivakumar@oracle.com> branch nick: Bug17879675_mysql-5.6 timestamp: Mon 2014-08-11 12:06:08 +0530 message: Bug#17879675:SHUTDOWN HANG IF SEMISYNC IS ENABLED AND SLAVE ABORT Problem: ======== Step 1. enable semisync and set a big value for rpl_semi_sync_master_timeout. Step 2. kill io thread of slave, so the threads on master will keep on WAITING ACK FROM SLAVE. Step 3. shutdown the master ---hang Analysis: ======== On master when semisync is enabled the transaction thread will wait for an ack from the slave in a conditional timed wait. Since the IO thread is already killed and on master a huge timeout value is set transaction thread will continue to wait for an ack for a long time. At this time if shutdown command is issued dump thread will try to go down. But dump thread when goes down it has a check if user has enabled rpl_semi_sync_master_wait_no_slave or not. If it is enabled dump thread will go down without switching off semi sync. Transaction thread will continue to wait for an ack even when slave count drops to zero. Because of this shutdown also will wait for the transaction thread to go down and shutdown hangs. Fix: === At the time of shutdown if dump thread finds that there no semi sync slaves that are connected to master it will switch off semisync even when rpl_semi_sync_master_wait_no_slave is set to on. i.e if shutdown finds no slaves are there to provide ack it will not honour rpl_semi_sync_master_wait_no_slave and it will switch off semisync. The transaction thread when woken up realises the semi sync is switched off it will not wait any more for ack it will continue and shutdown will be complete. When semi sync is forcefully shutdown at the time of waiting for an ack a warning is written to error log to convey to users that it was forceful shutdown. There can be a scenario where dump thread is already gone and transaction thread still waits for an ack. Hence a similar fix is implemented for the waiting transaction thread. I.e at the time of shutdown the thread gets a wakeup signal and it checks if shutdown is in progress if no slaves are connected then it will switch off the semi sync.
[30 Sep 2014 12:21]
David Moss
Posted by developer: Reopened by merge, no new changelog entries needed. Set back to Closed.