Description:
With semi-sync replication enabled, if rpl_semi_sync_master_wait_for_slave_count > 1, rpl_semi_sync_master_wait_no_slave = ON, and rpl_semi_sync_master_timeout is set to a very large value, executing the "shutdown" command hangs when Rpl_semi_sync_master_clients < rpl_semi_sync_master_wait_for_slave_count - 1.
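To check whether a running master is in the state described above, the status counter can be compared with the configured settings. This is a minimal sketch, assuming the semi-sync master plugin is already installed on the server being inspected:

```sql
-- Number of semi-sync slaves currently connected to this master.
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master_clients';

-- Settings that take part in the hang condition.
SHOW GLOBAL VARIABLES LIKE 'rpl_semi_sync_master_wait_for_slave_count';
SHOW GLOBAL VARIABLES LIKE 'rpl_semi_sync_master_wait_no_slave';
SHOW GLOBAL VARIABLES LIKE 'rpl_semi_sync_master_timeout';
```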
How to repeat:
We can reproduce the problem in 5.7.37 with the following test case.
The master opt file:
“
$SEMISYNC_PLUGIN_OPT
--force-restart
”
The slave opt file:
“
$SEMISYNC_PLUGIN_OPT
”
The cnf file:
“
!include ../my.cnf
[mysqld.1]
log-slave-updates
server-id= 1
[mysqld.2]
log-slave-updates
server-id= 2
slave_net_timeout=8
[mysqld.3]
log-slave-updates
server-id= 3
slave_net_timeout=8
[ENV]
SERVER_MYPORT_3= @mysqld.3.port
SERVER_MYSOCK_3= @mysqld.3.socket
”
The test file:
“
--source include/not_valgrind.inc
--source include/not_group_replication_plugin.inc
--source include/have_innodb.inc
--source include/have_binlog_format_row.inc
--echo #
--echo # prepare
--echo #
# server_1 is master, others are slaves.
--let $rpl_topology= 1->2, 1->3
--source include/rpl_init.inc
--let $rpl_connection_name= server_1
--source include/rpl_connection.inc
# Suppress warning:
# "Semi-sync master failed on net_flush() before waiting for slave reply"
CALL mtr.add_suppression("Semi-sync master failed on net_flush().*");
CALL mtr.add_suppression('Timeout waiting for reply of binlog');
CALL mtr.add_suppression('SEMISYNC: Forced shutdown. Some updates might not be replicated');
CALL mtr.add_suppression('of the transaction may not be synchronized to slave when server shutdown');
--echo # Enable semisync on master
--let $rpl_connection_name= server_1
--source include/rpl_connection.inc
--let $semisync_master_enabled= ON
--source include/install_semisync_master.inc
SET GLOBAL rpl_semi_sync_master_enabled= ON;
--echo # Change master variables
SET GLOBAL rpl_semi_sync_master_timeout= 20000000;
SET GLOBAL rpl_semi_sync_master_wait_no_slave= ON;
SET GLOBAL rpl_semi_sync_master_wait_for_slave_count = 2;
SET GLOBAL rpl_semi_sync_master_trace_level = 255;
--let $MYSQL_SOCKET= `SELECT @@socket`
--let $MYSQL_PORT= `SELECT @@port`
--echo # Start slave : server_2
--let $rpl_connection_name= server_2
--source include/rpl_connection.inc
--source include/install_semisync_slave.inc
SET GLOBAL rpl_semi_sync_slave_trace_level = 255;
show variables like "slave_net_timeout";
--echo # Start slave : server_3
--let $rpl_connection_name= server_3
--source include/rpl_connection.inc
--source include/install_semisync_slave.inc
SET GLOBAL rpl_semi_sync_slave_trace_level = 255;
show variables like "slave_net_timeout";
--let $rpl_connection_name= server_1
--source include/rpl_connection.inc
--source include/rpl_sync.inc
--echo # Verify ack_receiver thread is created
--let $assert_text= ack receiver thread is created;
--let $assert_cond= count(*) = 1 FROM performance_schema.threads WHERE name LIKE "%Ack_receiver"
--source include/assert.inc
CREATE TABLE t1(c1 INT);
--source include/rpl_sync.inc
--echo # Verify semisync replication works well
--let $rpl_connection_name= server_1
--source include/rpl_connection.inc
--source include/assert_semisync_master_status_on.inc
--echo #
--echo # Run
--echo #
--echo # shutdown slave : server_2
--let $rpl_connection_name= server_2
--source include/rpl_connection.inc
--source include/stop_slave_io.inc
--source include/wait_for_slave_io_to_stop.inc
--echo # Server2 io_thread is down
--echo # shutdown slave : server_3
--let $rpl_connection_name= server_3
--source include/rpl_connection.inc
--source include/stop_slave_io.inc
--source include/wait_for_slave_io_to_stop.inc
--echo # Server3 io_thread is down
--echo #
--echo # sleep 6s to wait for the master to send a heartbeat
--sleep 6
--echo # shutdown master : server_1
--let $rpl_connection_name= server_1
--source include/rpl_connection.inc
--send insert into t1 values(1)
--sleep 1
--exec echo "wait" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
--exec $MYSQLADMIN -uroot -S $MYSQL_SOCKET -P $MYSQL_PORT shutdown 2>&1
--echo # Server1 is down
--let $rpl_connection_name= server_1
--source include/rpl_connection.inc
--replace_regex /end_pos: [0-9]*\) /end_pos: <pos>) /
--error 2013
--reap
--echo
--echo # restart server 1.
--let $rpl_server_number=1
--source include/rpl_start_server.inc
--let $rpl_connection_name= server_1
--source include/rpl_connection.inc
--echo # restart server 2 io_thread.
--let $rpl_connection_name= server_2
--source include/rpl_connection.inc
--source include/start_slave_io.inc
--echo # restart server 3 io_thread.
--let $rpl_connection_name= server_3
--source include/rpl_connection.inc
--source include/start_slave_io.inc
--echo #
--echo # Cleanup
--echo #
--let $rpl_connection_name= server_1
--source include/rpl_connection.inc
DROP TABLE t1;
--source include/rpl_sync.inc
--let $rpl_connection_name= server_2
--source include/rpl_connection.inc
--source include/uninstall_semisync_slave.inc
--let $rpl_connection_name= server_3
--source include/rpl_connection.inc
--source include/uninstall_semisync_slave.inc
--let $rpl_connection_name= server_1
--source include/rpl_connection.inc
--source include/uninstall_semisync_master.inc
--source include/rpl_end.inc
”
The steps to reproduce are as follows:
1. Start a master with two slaves, with semi-sync enabled. Set global slave_net_timeout=8 on the slaves.
2. Execute "SET GLOBAL rpl_semi_sync_master_timeout= 20000000; SET GLOBAL rpl_semi_sync_master_wait_no_slave= ON; SET GLOBAL rpl_semi_sync_master_wait_for_slave_count = 2; SET GLOBAL rpl_semi_sync_master_trace_level = 255;" on the master.
3. Execute "CREATE TABLE t1(c1 INT);" on the master to check that semi-sync works as expected.
4. Execute "stop slave io_thread;" on both slaves.
5. Execute "insert into t1 values(1)" on the master.
6. Execute "shutdown".
7. After 20 seconds, the master process is still alive.
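The manual steps above can be sketched as one SQL session outline. It assumes the semi-sync master and slave plugins are already installed and enabled, and that replication from the master to both slaves is running. The split into "session A" and "session B" is an assumption of this sketch: the insert is expected to block while waiting for acknowledgements, so the shutdown has to be issued from a second connection (or with mysqladmin, as the test case does).

```sql
-- On each slave:
SET GLOBAL slave_net_timeout = 8;

-- On the master:
SET GLOBAL rpl_semi_sync_master_timeout = 20000000;
SET GLOBAL rpl_semi_sync_master_wait_no_slave = ON;
SET GLOBAL rpl_semi_sync_master_wait_for_slave_count = 2;
SET GLOBAL rpl_semi_sync_master_trace_level = 255;

-- On the master: confirm semi-sync works before breaking it.
CREATE TABLE t1(c1 INT);

-- On each slave: stop receiving events from the master.
STOP SLAVE IO_THREAD;

-- On the master, session A: this statement waits for slave acknowledgements.
INSERT INTO t1 VALUES (1);

-- On the master, session B: request shutdown while session A is still waiting.
SHUTDOWN;
```

The test case additionally sleeps for 6 seconds between stopping the I/O threads and issuing the insert. The manual steps do not mention that pause, so it is left out here.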
Suggested fix:
The bugfix patch is as follows: