Bug #106536 io_thread of slave db reports error when a big trx in binlog needs to be skipped
Submitted: 22 Feb 2022 8:30 Modified: 23 Feb 2022 20:21
Reporter: Brian Yue (OCA) Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:8.0.25 OS:Any (rhel-7.4)
Assigned to: MySQL Verification Team CPU Architecture:Any (x86-64)
Tags: replication, slave io_thread reports error

[22 Feb 2022 8:30] Brian Yue
Description:
Hello,
  We are now using MySQL8.0.25, and we find that there is a case slave db will report error: 

```
2022-02-21T21:11:47.760438+08:00 6 [ERROR] [MY-013118] [Repl] Slave I/O for channel '': Unexpected master's heartbeat data: heartbeat is not compatible with local info; the event's data: log_file_name mysql-bin.000006*<8c>¨ì log_pos 3753884391, Error_code: MY-013118
2022-02-21T21:11:47.760530+08:00 6 [ERROR] [MY-013122] [Repl] Slave I/O for channel '': Relay log write failure: could not queue event from master, Error_code: MY-013122.
```

  And then slave io_thread stops working.
  My servers are configured like this:

```
  binlog_format = ROW
  binlog_row_image = FULL
```

How to repeat:
(1) First we prepare 2 MySQL8.0.25 servers(MySQL-1 and MySQL-2), and execute the following SQL statements on each server.

(2) init MySQL-1 and create a database rpl_db1

mysql> select "This is master", @@bind_address, @@port;
+----------------+----------------+--------+
| This is master | @@bind_address | @@port |
+----------------+----------------+--------+
| This is master | 10.229.31.45   |   6301 |
+----------------+----------------+--------+
1 row in set (0.00 sec)

mysql> reset master;
Query OK, 0 rows affected (0.01 sec)

mysql> create database rpl_db1;
Query OK, 1 row affected (0.00 sec)

(2) init MySQL-2 and create a database rpl_db2

mysql>  select "This is slave", @@bind_address, @@port;
+---------------+----------------+--------+
| This is slave | @@bind_address | @@port |
+---------------+----------------+--------+
| This is slave | 192.168.10.42  |   6310 |
+---------------+----------------+--------+
1 row in set (0.00 sec)

mysql> reset master;
Query OK, 0 rows affected (0.00 sec)

mysql> create database rpl_db2;
Query OK, 1 row affected (0.00 sec)

(3) Make MySQL-2 to replicate from MySQL-1.

# Execute on MySQL-2:

mysql> change master to master_host = '10.229.31.45', master_port = 6301, master_user = 'root', master_password = 'db10$ZTE', master_auto_position = 1, MASTER_HEARTBEAT_PERIOD = 1;
Query OK, 0 rows affected, 9 warnings (0.01 sec)

mysql> start slave;
Query OK, 0 rows affected, 1 warning (0.06 sec)

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.229.31.45
                  Master_User: root
                  Master_Port: 6301
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 350
               Relay_Log_File: relay-bin.000002
                Relay_Log_Pos: 565
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 350
              Relay_Log_Space: 768
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 757071272
                  Master_UUID: 938d8052-8f34-11ec-a244-287b09c2c4dc
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 938d8052-8f34-11ec-a244-287b09c2c4dc:1
            Executed_Gtid_Set: 2159c3bc-9329-11ec-87ab-287b09c2c4dc:1,
938d8052-8f34-11ec-a244-287b09c2c4dc:1
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
       Master_public_key_path:
        Get_master_public_key: 0
            Network_Namespace:
1 row in set, 1 warning (0.01 sec)

mysql>

(4) Construct a big table rpl_db1.sbtest1 (10,000,000 records) on MySQL1 (master db) with sysbench:

[yxx_sysbench_1@localhost sysbench]$ ./sysbench --test=./db/oltp.lua --oltp-auto-inc=off --oltp-table-size=10000000 --oltp-tables-count=1 --mysql-table-engine=innodb --mysql-user=root  --mysql-password='db10$ZTE' --mysql-port=6301 --mysql-host=10.229.31.45 --mysql-db=rpl_db1 --max-requests=0 --max-time=600 --num-threads=800 --report-interval=1 prepare
sysbench 0.5:  multi-threaded system evaluation benchmark

Creating table 'sbtest1'...
Inserting 10000000 records into 'sbtest1'

(5) Make a big transaction (about 10 GB) on MySQL-1 (master db):

# Execute on MySQL-1

mysql> use rpl_db1
Database changed
mysql> show tables;
+-------------------+
| Tables_in_rpl_db1 |
+-------------------+
| sbtest1           |
+-------------------+
1 row in set (0.00 sec)

mysql> create table sbtest2 like sbtest1;
Query OK, 0 rows affected (0.01 sec)

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into sbtest2 select * from sbtest1;
Query OK, 10000000 rows affected (2 min 3.06 sec)
Records: 10000000  Duplicates: 0  Warnings: 0

mysql> update sbtest2 set k = k+1 ;
Query OK, 10000000 rows affected (2 min 52.62 sec)
Rows matched: 10000000  Changed: 10000000  Warnings: 0

mysql> update sbtest2 set k = k+1 ;
Query OK, 10000000 rows affected (3 min 5.98 sec)
Rows matched: 10000000  Changed: 10000000  Warnings: 0

mysql> commit;
Query OK, 0 rows affected (21.09 sec)

mysql>

(6) Wait until MySQL-2 has replicated all transactions from MySQL-1:

# Execute on MySQL-1:

mysql> show master status\G
*************************** 1. row ***************************
             File: mysql-bin.000003
         Position: 196
     Binlog_Do_DB:
 Binlog_Ignore_DB:
Executed_Gtid_Set: 938d8052-8f34-11ec-a244-287b09c2c4dc:1-4355
1 row in set (0.00 sec)

# Execute on MySQL-2:

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.229.31.45
                  Master_User: root
                  Master_Port: 6301
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000003
          Read_Master_Log_Pos: 196
               Relay_Log_File: relay-bin.000008
                Relay_Log_Pos: 411
        Relay_Master_Log_File: mysql-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 196
              Relay_Log_Space: 701
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 757071272
                  Master_UUID: 938d8052-8f34-11ec-a244-287b09c2c4dc
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 938d8052-8f34-11ec-a244-287b09c2c4dc:1-4355
            Executed_Gtid_Set: 2159c3bc-9329-11ec-87ab-287b09c2c4dc:1,
938d8052-8f34-11ec-a244-287b09c2c4dc:1-4355
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
       Master_public_key_path:
        Get_master_public_key: 0
            Network_Namespace:
1 row in set, 1 warning (0.00 sec)

mysql> show master status\G
*************************** 1. row ***************************
             File: mysql-bin.000003
         Position: 236
     Binlog_Do_DB:
 Binlog_Ignore_DB:
Executed_Gtid_Set: 2159c3bc-9329-11ec-87ab-287b09c2c4dc:1,
938d8052-8f34-11ec-a244-287b09c2c4dc:1-4355
1 row in set (0.00 sec)

(7) Now we stop the replication from MySQL1 to MySQL2, and construct the replication from MySQL2 to MySQL1, which means MySQL2 becomes master now:

# Execute on MySQL-2:

mysql> stop slave;
Query OK, 0 rows affected, 1 warning (0.02 sec)

mysql> reset slave all;
Query OK, 0 rows affected, 1 warning (0.00 sec)

# Execute on MySQL-1:

mysql> change master to master_host = '192.168.10.42', master_port = 6310, master_user = 'root', master_password = 'db10$ZTE', master_auto_position = 1, MASTER_HEARTBEAT_PERIOD = 1;
Query OK, 0 rows affected, 9 warnings (0.00 sec)

mysql> start slave;
Query OK, 0 rows affected, 1 warning (0.12 sec)

(8) later, let's check the slave status of MySQL-1:

# Execute on MySQL-1:

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: 192.168.10.42
                  Master_User: root
                  Master_Port: 6310
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 3540658345
               Relay_Log_File: relay-bin.000004
                Relay_Log_Pos: 458
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
....
			   Last_IO_Errno: 13122
               Last_IO_Error: Relay log write failure: could not queue event from master
               Last_SQL_Errno: 0
               Last_SQL_Error:
...
           Retrieved_Gtid_Set: 2159c3bc-9329-11ec-87ab-287b09c2c4dc:1
            Executed_Gtid_Set: 2159c3bc-9329-11ec-87ab-287b09c2c4dc:1,
938d8052-8f34-11ec-a244-287b09c2c4dc:1-4355

the following error message is found in error log:

```
2022-02-22T16:07:19.350633-05:00 17 [System] [MY-010562] [Repl] Slave I/O thread for channel '': connected to master 'root@192.168.10.42:6310',replication started in log 'FIRST' at position 4
2022-02-22T16:07:19.473646-05:00 18 [Note] [MY-010581] [Repl] Slave SQL thread for channel '' initialized, starting replication in log 'FIRST' at position 0, relay log '../relaylog/relay-bin.000001' position: 4
2022-02-22T16:07:21.787311-05:00 17 [ERROR] [MY-013118] [Repl] Slave I/O for channel '': Unexpected master's heartbeat data: heartbeat is not compatible with local info; the event's data: log_file_name mysql-bin.000002▒▒▒F log_pos 2879756567, Error_code: MY-013118
2022-02-22T16:07:21.787346-05:00 17 [ERROR] [MY-013122] [Repl] Slave I/O for channel '': Relay log write failure: could not queue event from master, Error_code: MY-013122
2022-02-22T16:07:21.787370-05:00 17 [Note] [MY-010570] [Repl] Slave I/O thread exiting for channel '', read up to log 'mysql-bin.000002', position 3540658345
```
[23 Feb 2022 20:21] MySQL Verification Team
Hi,

This is a duplicate of Bug #55231

Please check Bug #55231 about workarounds until the problem is fixed.