Description:
I have a 3 node master slave setup running MySQL 5.7.21 running semi synchronous replication. My master has small load with about 30 thread doing insert operations every second.
I stopped io thread on one of the slaves and start it again after about 60 seconds. But Slave io thread is stuck in 'Connecting to master' state and Master seems to be stuck in 'Waiting to finalize termination'
Hence the replication is not happening to this slave at all.
How to repeat:
3 identical nodes setup with MySQL 5.7.21
1 master and 2 slaves configured for semisynchronous replication with these settings.
rpl_semi_sync_master_timeout=1500000
rpl_semi_sync_master_wait_for_slave_count=1
rpl_semi_sync_master_wait_no_slave=ON
multi threaded slaves are configured on slaves with these settings.
slave_parallel_workers=30
slave_preserve_commit_order=1
slave_parallel_type=LOGICAL_CLOCK
my.cnf is as below
-------------------
[mysqld]
port=3306
skip_name_resolve=1
bind_address=0.0.0.0
user=mysql
pid_file=/var/run/mysqld/mysqld.pid
socket=/var/lib/mysql/mysql.sock
server_id=516
require_secure_transport=OFF
ssl=0
log_bin=/mysql_data/binlogs/mysql-bin
expire_logs_days=21
sync_binlog=1
max_connect_errors=1000000
max_allowed_packet=16777216
max_heap_table_size=33554432
max_connections=500
max_user_connections=400
thread_cache_size=50
open_files_limit=65535
table_open_cache=2048
table_definition_cache=2048
relay_log=/mysql_data/relaylogs/relay-log
gtid_mode=ON
enforce_gtid_consistency=ON
binlog_format=MIXED
log_slave_updates=true
slave_net_timeout=60
master_info_repository=TABLE
relay_log_info_repository=TABLE
sync_master_info=10000
sync_relay_log=10000
relay_log_recovery=1
slave_parallel_workers=30
slave_preserve_commit_order=1
slave_parallel_type=LOGICAL_CLOCK
rpl_semi_sync_master_timeout=1500000
rpl_semi_sync_master_wait_for_slave_count=1
rpl_semi_sync_master_wait_no_slave=ON
relay_log_space_limit=0
max_relay_log_size=0
max_binlog_size=104857600
datadir=/mysql_data/data
general_log=ON
general_log_file=/var/log/mysql/general.log
log_error=/var/log/mysql/mysqld.log
default_storage_engine=innodb
innodb_log_files_in_group=2
innodb_flush_log_at_trx_commit=1
innodb_flush_method=O_DIRECT
innodb_file_per_table=ON
innodb_log_file_size=134217728
innodb_buffer_pool_size=134217728
innodb_io_capacity=200
innodb_adaptive_hash_index=ON
innodb_lock_wait_timeout=50
log_queries_not_using_indexes=OFF
log_slow_admin_statements=OFF
log_throttle_queries_not_using_indexes=0
long_query_time=10
slow_query_log=ON
slow_query_log_file=/var/log/mysql/mysql-slowquery.log
symbolic_links=0
interactive_timeout=28800
div_precision_increment=4
sql_mode="ONLY_FULL_GROUP_BY,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
event_scheduler=OFF
Other details
On master node:
---------------
mysql> show processlist;
+-------+-----------+----------------------+--------+------------------+--------+---------------------------------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-------+-----------+----------------------+--------+------------------+--------+---------------------------------------------------------------+------------------+
| 34528 | repl_user | 172.31.29.18:41790 | NULL | Binlog Dump GTID | 127662 | Master has sent all binlog to slave; waiting for more updates | NULL |
| 36330 | repl_user | 172.31.31.43:36562 | NULL | Binlog Dump GTID | 123733 | Waiting to finalize termination | NULL |
| 74996 | sgroot | 13.228.235.229:63547 | testdb | Sleep | 0 | | NULL |
| 74997 | sgroot | 13.228.235.229:63554 | testdb | Sleep | 0 | | NULL |
| 74998 | sgroot | 13.228.235.229:63548 | testdb | Sleep | 0 | | NULL |
| 74999 | sgroot | 13.228.235.229:63558 | testdb | Sleep | 0 | | NULL |
| 75000 | sgroot | 13.228.235.229:63555 | testdb | Sleep | 0 | | NULL |
| 75001 | sgroot | 13.228.235.229:63559 | testdb | Sleep | 0 | | NULL |
| 75002 | sgroot | 13.228.235.229:63549 | testdb | Sleep | 0 | | NULL |
| 75003 | sgroot | 13.228.235.229:63550 | testdb | Sleep | 0 | | NULL |
| 75004 | sgroot | 13.228.235.229:63551 | testdb | Sleep | 0 | | NULL |
| 75005 | sgroot | 13.228.235.229:63552 | testdb | Sleep | 0 | | NULL |
| 75006 | sgroot | 13.228.235.229:63553 | testdb | Sleep | 0 | | NULL |
| 75007 | sgroot | 13.228.235.229:63561 | testdb | Sleep | 0 | | NULL |
| 75008 | sgroot | 13.228.235.229:63566 | testdb | Sleep | 0 | | NULL |
| 75009 | sgroot | 13.228.235.229:63567 | testdb | Sleep | 0 | | NULL |
| 75010 | sgroot | 13.228.235.229:63556 | testdb | Sleep | 0 | | NULL |
| 75011 | sgroot | 13.228.235.229:63557 | testdb | Sleep | 0 | | NULL |
| 75012 | sgroot | 13.228.235.229:63568 | testdb | Sleep | 0 | | NULL |
| 75013 | sgroot | 13.228.235.229:63563 | testdb | Sleep | 0 | | NULL |
| 75014 | sgroot | 13.228.235.229:63564 | testdb | Sleep | 0 | | NULL |
| 75015 | sgroot | 13.228.235.229:63560 | testdb | Sleep | 0 | | NULL |
| 75016 | sgroot | 13.228.235.229:63571 | testdb | Sleep | 0 | | NULL |
| 75017 | sgroot | 13.228.235.229:63565 | testdb | Sleep | 0 | | NULL |
| 75018 | sgroot
| 13.228.235.229:63562 | testdb | Sleep | 0 | | NULL |
| 75019 | sgroot | 13.228.235.229:63570 | testdb | Sleep | 0 | | NULL |
| 75020 | sgroot | 13.228.235.229:63569 | testdb | Sleep | 0 | | NULL |
| 75021 | sgroot | 13.228.235.229:63575 | testdb | Sleep | 0 | | NULL |
| 75022 | sgroot | 13.228.235.229:63574 | testdb | Sleep | 0 | | NULL |
| 75023 | sgroot | 13.228.235.229:63576 | testdb | Sleep | 0 | | NULL |
| 75024 | sgroot | 13.228.235.229:63572 | testdb | Sleep | 0 | | NULL |
| 75025 | sgroot | 13.228.235.229:63573 | testdb | Sleep | 0 | | NULL |
| 75178 | sgroot | localhost | NULL | Query | 0 | starting | show processlist |
+-------+-----------+----------------------+--------+------------------+--------+---------------------------------------------------------------+------------------+
On slave
----------
mysql> show slave status \G;
*************************** 1. row ***************************
Slave_IO_State: Connecting to master
Master_Host: 172.31.24.93
Master_User: repl_user
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000006
Read_Master_Log_Pos: 6654564
Relay_Log_File: relay-log.000005
Relay_Log_Pos: 6654737
Relay_Master_Log_File: mysql-bin.000006
Slave_IO_Running: Connecting
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 6654564
Relay_Log_Space: 6655025
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 1045
Last_IO_Error: error connecting to master 'repl_user@172.31.24.93:3306' - retry-time: 60 retries: 18
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 516
Master_UUID: 638c305e-37af-11e9-8ccc-02f470dc105a
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp: 190226 16:40:17
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 638c305e-37af-11e9-8ccc-02f470dc105a:1-369439
Executed_Gtid_Set: 638c305e-37af-11e9-8ccc-02f470dc105a:1-369439,
839f2ca6-37af-11e9-8dea-029f7f97298c:1-33206
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
ERROR:
No query specified