MySQL Bugs: #89375: Parallel replication always fails with specific workload from sysbench

Bug #89375	Parallel replication always fails with specific workload from sysbench
Submitted:	24 Jan 2018 10:25	Modified:	11 Mar 2022 14:37
Reporter:	Frederic Descamps	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	5.7.21	OS:	Linux
Assigned to:		CPU Architecture:	Any

Description:
When I want to illustrate how point-in-time recovery can benefit from parallel replication it always fails with this error (whatever the parallel_worker value is > 1):

Last_SQL_Error: Cannot execute the current event group in the parallel mode. Encountered event Gtid, relay-log name ./imac2-relay-bin.000003, position 190 which prevents execution of this event group in parallel mode. Reason: The master event is logically timestamped incorrectly.. 

The point-in-time recovery technique comes from this blog post: http://lefred.be/content/howto-make-mysql-point-in-time-recovery-faster/

How to repeat:
Data creation:
--------------
sysbench /usr/share/sysbench/oltp_insert.lua --db-driver=mysql 
            --mysql-user=root --mysql-host=localhost --table-size=100000 
            --tables=16 prepare

Backup:
-------
    /opt/mysql/meb-4.1/bin/mysqlbackup --host=127.0.0.1 
                        --backup-dir=mysqlbackup backup
    /opt/mysql/meb-4.1/bin/mysqlbackup --host=127.0.0.1 
                        --backup-dir=mysqlbackup apply-log

Post Backup Workload: (8 threads during 1h)
---------------------
    sysbench /usr/share/sysbench/oltp_insert.lua --db-driver=mysql 
             --mysql-user=root --mysql-host=localhost --table-size=100000 
             --tables=16 --threads=8 --report-interval=10 --time=3600 run

The problem we want to bypass:
-----------------------------
mysql> update sbtest4 set pad="oups";
Query OK, 1066269 rows affected (8.52 sec)
Rows matched: 1066269  Changed: 1066269  Warnings: 0

Finding what we need to avoid after restore:
--------------------------------------------
mysql> show master status;
+------------------+-----------+...+------------------------------------------------+
| File             | Position  |   | Executed_Gtid_Set                              |
+------------------+-----------+...+------------------------------------------------+
| imac2-bin.000005 | 622915384 |   | 25d97ef9-005b-11e8-bf1b-685b359e77d5:1-1072032 |
+------------------+-----------+...+------------------------------------------------+
1 row in set (0.00 sec)

Verification of the event:

mysql> pager grep -A 1 -B 2 'sbtest.sbtest4' | grep -B 4 Update
mysql> show binlog events in 'imac2-bin.000005';
--
| imac2-bin.000005 | 277145452 | Gtid        | ... | 
           SET @@SESSION.GTID_NEXT= '25d97ef9-005b-11e8-bf1b-685b359e77d5:1072032' |
| imac2-bin.000005 | 277145513 | Query       | ... | BEGIN                         |
| imac2-bin.000005 | 277145583 | Table_map   | ... | table_id: 154 (sbtest.sbtest4)|
| imac2-bin.000005 | 277145638 | Update_rows | ... | table_id: 154                 |
mysql> pager sed -n -e '/25d97ef9-005b-11e8-bf1b-685b359e77d5:1072032/,/COMMIT/ p' | grep COMMIT
mysql> show binlog events in 'imac2-bin.000005';
| imac2-bin.000005 | 622915357 | Xid | ... | 622915384 | COMMIT /* xid=1072253 */

Stop MySQL and save the binlogs:
--------------------------------

[root@imac2 ~]# systemctl stop mysqld
[root@imac2 ~]# mkdir /mnt/mysql/binlogs
[root@imac2 ~]# cd /var/lib/mysql
[root@imac2 mysql]# cp imac2-bin.* /mnt/mysql/binlogs/

Restore the backup:
-------------------
[root@imac2 mysql]# rm -rf *
[root@imac2 mysql]# /opt/mysql/meb-4.1/bin/mysqlbackup 
                     --backup-dir=/tmp/mysqlbackup copy-back
...
mysqlbackup completed OK! with 3 warnings
[root@imac2 mysql]# chown -R mysql. *

Find and Set the last executed GTID during backup:
--------------------------------------------------
[root@imac2 mysql]# cat backup_variables.txt | grep gtid
gtid_executed=25d97ef9-005b-11e8-bf1b-685b359e77d5:1-6001

my.cnf modification before restart:
-----------------------------------
replicate-same-server-id=1
skip-slave-start

Tell MySQL to bypass the transaction:
-------------------------------------
mysql> reset master;
mysql> SET @@GLOBAL.GTID_PURGED='25d97ef9-005b-11e8-bf1b-685b359e77d5:1-6001,
                                 25d97ef9-005b-11e8-bf1b-685b359e77d5:1072032';

Generate the relay logs:
------------------------

[root@imac2 mysql]# for i in $(ls /mnt/mysql/binlogs/*.0*) 
 do
   ext=$(echo $i | cut -d'.' -f2) 
   cp $i imac2-relay-bin.$ext
 done

[root@imac2 mysql]# ls ./imac2-relay-bin.0* >imac2-relay-bin.index
[root@imac2 mysql]# chown mysql. *relay*
[root@imac2 mysql]# cat imac2-relay-bin.index 
./imac2-relay-bin.000001
./imac2-relay-bin.000002
./imac2-relay-bin.000003
./imac2-relay-bin.000004
./imac2-relay-bin.000005

Start PITR:
-----------

mysql> SET GLOBAL SLAVE_PARALLEL_TYPE='LOGICAL_CLOCK';
mysql> SET GLOBAL SLAVE_PARALLEL_WORKERS=8;
mysql> CHANGE MASTER TO RELAY_LOG_FILE='imac2-relay-bin.000001', 
       RELAY_LOG_POS=1, MASTER_HOST='dummy';
mysql> START SLAVE SQL_THREAD;

And then the problem:

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: dummy
                  Master_User: 
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: 
          Read_Master_Log_Pos: 4
               Relay_Log_File: imac2-relay-bin.000002
                Relay_Log_Pos: 1074031837
        Relay_Master_Log_File: imac2-bin.000003
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1755
                   Last_Error: Cannot execute the current event group in the parallel mode. Encountered event Gtid, relay-log name ./imac2-relay-bin.000003, position 190 which prevents execution of this event group in parallel mode. Reason: The master event is logically timestamped incorrectly..
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1074031837
              Relay_Log_Space: 2770705356
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1755
               Last_SQL_Error: Cannot execute the current event group in the parallel mode. Encountered event Gtid, relay-log name ./imac2-relay-bin.000003, position 190 which prevents execution of this event group in parallel mode. Reason: The master event is logically timestamped incorrectly..
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 0
                  Master_UUID: 
             Master_Info_File: /mnt/mysql/datadir/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: 
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 180124 11:13:19
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 25d97ef9-005b-11e8-bf1b-685b359e77d5:1-1072033
            Executed_Gtid_Set: 25d97ef9-005b-11e8-bf1b-685b359e77d5:1-6001:1072032
                Auto_Position: 0
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

I've tried to set SLAVE_PARALLEL_WORKERS to 1 and also set the RELAY_LOG_POS to 0, always the same problem.

If I don't set SLAVE_PARALLEL_TYPE='LOGICAL_CLOCK' and SLAVE_PARALLEL_WORKERS >0 it always fails.

mysql-bug-data-89375.zip has been uploaded with Sven's requested files

Workaround found by bill qu : change server_id and don't use replicate-same-server-id=1

for info in the error log we could see:

2018-02-01T08:00:03.350670Z 3 [ERROR] Transaction's sequence number is inconsistent with that of a preceding one: sequence_number (1) <= previous sequence_number (2122)
2018-02-01T08:00:03.350924Z 3 [Warning] Slave SQL for channel '': Coordinator thread of multi-threaded slave is being stopped in the middle of assigning a group of events; deferring to exit until the group completion ... , Error_code: 0

Posted by developer:
 
Changelog entry added for MySQL 8.0.30 and MySQL 5.7.39:

When the --replicate-same-server-id option was used to make the replica not skip events that have its own server ID, if the log file was rotated, replication stopped with an error. The log rotation event now checks and applies the current value of the option.