Bug #94476 mysql semisync replication stuck with master in Waiting to finalize termination
Submitted: 26 Feb 2019 18:44 Modified: 14 Mar 2019 17:05
Reporter: Shirish Keshava Murthy Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.7.21 OS:Any
Assigned to: MySQL Verification Team CPU Architecture:Any

[26 Feb 2019 18:44] Shirish Keshava Murthy
Description:
I have a 3 node master slave setup running MySQL 5.7.21 running semi synchronous replication. My master has small load with about 30 thread doing insert operations every second.
I stopped io thread on one of the slaves and start it again after about 60 seconds. But Slave io thread is stuck in 'Connecting to master' state and Master seems to be stuck in 'Waiting to finalize termination'
Hence the replication is not happening to this slave at all.

How to repeat:
3 identical nodes setup with MySQL 5.7.21
1 master and 2 slaves configured for semisynchronous replication with these settings.
rpl_semi_sync_master_timeout=1500000
rpl_semi_sync_master_wait_for_slave_count=1
rpl_semi_sync_master_wait_no_slave=ON
multi threaded slaves are configured on slaves with these settings.
slave_parallel_workers=30
slave_preserve_commit_order=1
slave_parallel_type=LOGICAL_CLOCK
my.cnf is as below
-------------------
[mysqld]
port=3306
skip_name_resolve=1
bind_address=0.0.0.0
user=mysql
pid_file=/var/run/mysqld/mysqld.pid
socket=/var/lib/mysql/mysql.sock
server_id=516
require_secure_transport=OFF
ssl=0
log_bin=/mysql_data/binlogs/mysql-bin
expire_logs_days=21
sync_binlog=1
max_connect_errors=1000000
max_allowed_packet=16777216
max_heap_table_size=33554432
max_connections=500
max_user_connections=400
thread_cache_size=50
open_files_limit=65535
table_open_cache=2048
table_definition_cache=2048
relay_log=/mysql_data/relaylogs/relay-log
gtid_mode=ON
enforce_gtid_consistency=ON
binlog_format=MIXED
log_slave_updates=true
slave_net_timeout=60
master_info_repository=TABLE
relay_log_info_repository=TABLE
sync_master_info=10000
sync_relay_log=10000
relay_log_recovery=1
slave_parallel_workers=30
slave_preserve_commit_order=1
slave_parallel_type=LOGICAL_CLOCK
rpl_semi_sync_master_timeout=1500000
rpl_semi_sync_master_wait_for_slave_count=1
rpl_semi_sync_master_wait_no_slave=ON
relay_log_space_limit=0
max_relay_log_size=0
max_binlog_size=104857600
datadir=/mysql_data/data
general_log=ON
general_log_file=/var/log/mysql/general.log
log_error=/var/log/mysql/mysqld.log
default_storage_engine=innodb
innodb_log_files_in_group=2
innodb_flush_log_at_trx_commit=1
innodb_flush_method=O_DIRECT
innodb_file_per_table=ON
innodb_log_file_size=134217728
innodb_buffer_pool_size=134217728
innodb_io_capacity=200
innodb_adaptive_hash_index=ON
innodb_lock_wait_timeout=50
log_queries_not_using_indexes=OFF
log_slow_admin_statements=OFF
log_throttle_queries_not_using_indexes=0
long_query_time=10
slow_query_log=ON
slow_query_log_file=/var/log/mysql/mysql-slowquery.log
symbolic_links=0
interactive_timeout=28800
div_precision_increment=4
sql_mode="ONLY_FULL_GROUP_BY,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
event_scheduler=OFF
Other details
On master node:
---------------
mysql> show processlist;
+-------+-----------+----------------------+--------+------------------+--------+---------------------------------------------------------------+------------------+
| Id    | User      | Host                 | db     | Command          | Time   | State                                                         | Info             |
+-------+-----------+----------------------+--------+------------------+--------+---------------------------------------------------------------+------------------+
| 34528 | repl_user | 172.31.29.18:41790   | NULL   | Binlog Dump GTID | 127662 | Master has sent all binlog to slave; waiting for more updates | NULL             |
| 36330 | repl_user | 172.31.31.43:36562   | NULL   | Binlog Dump GTID | 123733 | Waiting to finalize termination                               | NULL             |
| 74996 | sgroot    | 13.228.235.229:63547 | testdb | Sleep            |      0 |                                                               | NULL             |
| 74997 | sgroot    | 13.228.235.229:63554 | testdb | Sleep            |      0 |                                                               | NULL             |
| 74998 | sgroot    | 13.228.235.229:63548 | testdb | Sleep            |      0 |                                                               | NULL             |
| 74999 | sgroot    | 13.228.235.229:63558 | testdb | Sleep            |      0 |                                                               | NULL             |
| 75000 | sgroot    | 13.228.235.229:63555 | testdb | Sleep            |      0 |                                                               | NULL             |
| 75001 | sgroot    | 13.228.235.229:63559 | testdb | Sleep            |      0 |                                                               | NULL             |
| 75002 | sgroot    | 13.228.235.229:63549 | testdb | Sleep            |      0 |                                                               | NULL             |
| 75003 | sgroot    | 13.228.235.229:63550 | testdb | Sleep            |      0 |                                                               | NULL             |
| 75004 | sgroot    | 13.228.235.229:63551 | testdb | Sleep            |      0 |                                                               | NULL             |
| 75005 | sgroot    | 13.228.235.229:63552 | testdb | Sleep            |      0 |                                                               | NULL             |
| 75006 | sgroot    | 13.228.235.229:63553 | testdb | Sleep            |      0 |                                                               | NULL             |
| 75007 | sgroot    | 13.228.235.229:63561 | testdb | Sleep            |      0 |                                                               | NULL             |
| 75008 | sgroot    | 13.228.235.229:63566 | testdb | Sleep            |      0 |                                                               | NULL             |
| 75009 | sgroot    | 13.228.235.229:63567 | testdb | Sleep            |      0 |                                                               | NULL             |
| 75010 | sgroot    | 13.228.235.229:63556 | testdb | Sleep            |      0 |                                                               | NULL             |
| 75011 | sgroot    | 13.228.235.229:63557 | testdb | Sleep            |      0 |                                                               | NULL             |
| 75012 | sgroot    | 13.228.235.229:63568 | testdb | Sleep            |      0 |                                                               | NULL             |
| 75013 | sgroot    | 13.228.235.229:63563 | testdb | Sleep            |      0 |                                                               | NULL             |
| 75014 | sgroot    | 13.228.235.229:63564 | testdb | Sleep            |      0 |                                                               | NULL             |
| 75015 | sgroot    | 13.228.235.229:63560 | testdb | Sleep            |      0 |                                                               | NULL             |
| 75016 | sgroot    | 13.228.235.229:63571 | testdb | Sleep            |      0 |                                                               | NULL             |
| 75017 | sgroot    | 13.228.235.229:63565 | testdb | Sleep            |      0 |                                                               | NULL             |
| 75018 | sgroot   
 | 13.228.235.229:63562 | testdb | Sleep            |      0 |                                                               | NULL             |
| 75019 | sgroot    | 13.228.235.229:63570 | testdb | Sleep            |      0 |                                                               | NULL             |
| 75020 | sgroot    | 13.228.235.229:63569 | testdb | Sleep            |      0 |                                                               | NULL             |
| 75021 | sgroot    | 13.228.235.229:63575 | testdb | Sleep            |      0 |                                                               | NULL             |
| 75022 | sgroot    | 13.228.235.229:63574 | testdb | Sleep            |      0 |                                                               | NULL             |
| 75023 | sgroot    | 13.228.235.229:63576 | testdb | Sleep            |      0 |                                                               | NULL             |
| 75024 | sgroot    | 13.228.235.229:63572 | testdb | Sleep            |      0 |                                                               | NULL             |
| 75025 | sgroot    | 13.228.235.229:63573 | testdb | Sleep            |      0 |                                                               | NULL             |
| 75178 | sgroot    | localhost            | NULL   | Query            |      0 | starting                                                      | show processlist |
+-------+-----------+----------------------+--------+------------------+--------+---------------------------------------------------------------+------------------+
On slave
----------
mysql> show slave status \G;
*************************** 1. row ***************************
               Slave_IO_State: Connecting to master
                  Master_Host: 172.31.24.93
                  Master_User: repl_user
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000006
          Read_Master_Log_Pos: 6654564
               Relay_Log_File: relay-log.000005
                Relay_Log_Pos: 6654737
        Relay_Master_Log_File: mysql-bin.000006
             Slave_IO_Running: Connecting
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 6654564
              Relay_Log_Space: 6655025
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 1045
                Last_IO_Error: error connecting to master 'repl_user@172.31.24.93:3306' - retry-time: 60  retries: 18
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 516
                  Master_UUID: 638c305e-37af-11e9-8ccc-02f470dc105a
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp: 190226 16:40:17
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 638c305e-37af-11e9-8ccc-02f470dc105a:1-369439
            Executed_Gtid_Set: 638c305e-37af-11e9-8ccc-02f470dc105a:1-369439,
839f2ca6-37af-11e9-8dea-029f7f97298c:1-33206
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)
ERROR:
No query specified
[28 Feb 2019 7:57] Shirish Keshava Murthy
Change in name
[7 Mar 2019 3:47] MySQL Verification Team
Hi,

You set the category to "MySQL Cluster", from what I see you are not running MySQL Cluster ( https://www.mysql.com/products/cluster/ ) but a regular MySQL Server, is that correct? I already changed the category to replication but need confirmation to be on the safe side.

Now I don't see if your 3 servers are using Group Replication ( https://dev.mysql.com/doc/refman/5.7/en/group-replication.html ) / InnoDB Cluster ( https://dev.mysql.com/doc/refman/5.7/en/mysql-innodb-cluster-userguide.html ) or this is just a plain and simple repliation? If that's so please confirm topology is one master with 2 direct semi synchronous slaves.

Continuing assuming my guesses are correct (this is a simple master to two slaves semi sync replication, no mysql cluster, no innodb cluster, no group replication), I cannot reproduce this using identical setup. This looks more like a support ticket then a bug report. Have you tried restarting the slave server?

kind regards
Bogdan
[11 Mar 2019 15:58] Shirish Keshava Murthy
Hello Bogdan,

All your assumptions are correct. We have a simple master with 2 slave semi-sync replication setup. We've also been able to reproduce this consistently and restarting the slave does not help.

Since we are able to reproduce this issue we're trying to understand the root cause and potential fixes for this. That is why we've filed it  as a bug and not a support ticket.

Can you please investigate our setup?
[14 Mar 2019 11:00] MySQL Verification Team
Hi,
Opening a support ticket with the bug report is always advicible :)

Now, I'v bin operating under some assumptions that you did confirm are correct but I was not able to reproduce this behavior.

Since you are able to reproduce this, are you reproducing this on latest 5.7.25 too? Is there a way you can provide us with a reproducible test case? Just running replication with your configuration "works".

Thanks
Bogdan
[14 Mar 2019 17:05] Shirish Keshava Murthy
We've been able to reproduce this with the following steps:
1. Its a Java application with 30 threads inserting into a simple table every 30 seconds.

2. On one of the slaves do stop slave io_thread
3. Do start slave io_thread

Then the slave will enter connecting phase with retries and is not able to receive any more data. The slave starts working fine immediately after the data loading stops on the master.