Bug #90919 SSL replication and SLAVE_IO thread stall
Submitted: 17 May 2018 17:03 Modified: 15 Oct 2018 3:04
Reporter: Bei Xu Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Replication Severity:S1 (Critical)
Version:5.6 OS:Red Hat
Assigned to: MySQL Verification Team CPU Architecture:Any

[17 May 2018 17:03] Bei Xu
Description:
We switched to SSL replication for all our master-slave set up recently and noticed that slave IO stalled constantly. Test case is below:
1. Enabled SSL replication, "show slave status" looks like below:
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: lva1-db52.corp.linkedin.com
                  Master_User: repl_user
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.010795
          Read_Master_Log_Pos: 376269715
               Relay_Log_File: relay-log.000020
                Relay_Log_Pos: 267780355
        Relay_Master_Log_File: mysql-bin.010795
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 376269472
              Relay_Log_Space: 376270208
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: Yes
           Master_SSL_CA_File: /export/home/mysql/.certs/ca.pem
           Master_SSL_CA_Path: /export/home/mysql/.certs
              Master_SSL_Cert: /export/home/mysql/.certs/client-cert.pem
            Master_SSL_Cipher: 
               Master_SSL_Key: /export/home/mysql/.certs/client-key.pem
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 500520
                  Master_UUID: 27d85e2b-2dc6-11e4-b2ad-f40f1b1ebd2a
             Master_Info_File: /mnt/u001/mysql/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
1 row in set (0.00 sec)

Read_Master_Log_Pos is progressing, sometime later, we received below error in the log: 

"2018-05-17 14:01:12 29286 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
2018-05-17 14:01:12 29286 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.010795' at position 898212416
2018-05-17 14:01:12 29286 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2018-05-17 14:03:15 29286 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
2018-05-17 14:03:15 29286 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.010796' at position 490592399
2018-05-17 14:03:15 29286 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2018-05-17 14:44:57 29286 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
2018-05-17 14:44:57 29286 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.010797' at position 996393951
2018-05-17 14:44:57 29286 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2018-05-17 14:49:10 29286 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
2018-05-17 14:49:10 29286 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.010798' at position 704009950
2018-05-17 14:49:10 29286 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2018-05-17 14:54:33 29286 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
2018-05-17 14:54:33 29286 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.010799' at position 426197684
2018-05-17 14:54:33 29286 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2018-05-17 14:56:28 29286 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
2018-05-17 14:56:28 29286 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.010799' at position 924245333
2018-05-17 14:56:28 29286 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2018-05-17 14:58:50 29286 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
2018-05-17 14:58:50 29286 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.010800' at position 527064397
2018-05-17 14:58:50 29286 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information."

The replication works fine(no stall) without SSL. 

How to repeat:
stop/start replication, the issue happen again after sometime
[22 May 2018 14:47] MySQL Verification Team
Hi,

What kind of network connection you have between master and slave?

This type of behavior happens often when there is unstable connection between the hosts or with a broken ethernet card. Without SSL these issues are less visible as you don't see tcp errors and tcp retransmission's but since there's no checksum it can actually create even more problems (wrong data being replicated). With SSL there's additional safety other then encryption itself as packets have additional checksum and when a wrong one hit the target the connection is lost.

Can you please recheck your hardware and your connection quality, as I doubt this is a bug we are looking at here.

all best
Bogdan
[23 Jun 2018 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[10 Oct 2018 19:27] Bei Xu
network team has checked, no network issues between master and slave, please advice
[11 Oct 2018 10:12] MySQL Verification Team
Hi,
I cannot reproduce this on stable network.
What kind of network connection you have between master and slave? They are on LAN or you have dedicated connection between them or ?

thanks
Bogdan
[15 Oct 2018 3:04] Bei Xu
they are on LAN, we have many hosts and they are all connected by LAN. our network team checks, no issues from network perspective, is there another way to verify the network issues?
[23 Jul 2019 5:22] Gillian Gunson
We are seeing something like this, intermittently, on some of our heavier-writes clusters after switching to SSL-enabled replication. 

Replica status will include:

              Master_Log_File: mysql-bin.070724
...
        Relay_Master_Log_File: mysql-bin.070724
...
        Seconds_Behind_Master: 20797

(With the direct master's current binlog being a much higher number). 

A replica will be fine and then suddenly the IO thread will start lagging, while all other identical replicas are fine. In our case, restarting replication doesn't usually work, but setting slave_compressed_protocol ON does.