Bug #89375 Parallel replication always fails with specific workload from sysbench
Submitted: 24 Jan 10:25 Modified: 2 Feb 10:39
Reporter: Frederic Descamps Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.7.21 OS:Linux
Assigned to: CPU Architecture:Any

[24 Jan 10:25] Frederic Descamps
Description:
When I want to illustrate how point-in-time recovery can benefit from parallel replication it always fails with this error (whatever the parallel_worker value is > 1):

Last_SQL_Error: Cannot execute the current event group in the parallel mode. Encountered event Gtid, relay-log name ./imac2-relay-bin.000003, position 190 which prevents execution of this event group in parallel mode. Reason: The master event is logically timestamped incorrectly.. 

The point-in-time recovery technique comes from this blog post: http://lefred.be/content/howto-make-mysql-point-in-time-recovery-faster/

How to repeat:
Data creation:
--------------
sysbench /usr/share/sysbench/oltp_insert.lua --db-driver=mysql 
            --mysql-user=root --mysql-host=localhost --table-size=100000 
            --tables=16 prepare

Backup:
-------
    /opt/mysql/meb-4.1/bin/mysqlbackup --host=127.0.0.1 
                        --backup-dir=mysqlbackup backup
    /opt/mysql/meb-4.1/bin/mysqlbackup --host=127.0.0.1 
                        --backup-dir=mysqlbackup apply-log

Post Backup Workload: (8 threads during 1h)
---------------------
    sysbench /usr/share/sysbench/oltp_insert.lua --db-driver=mysql 
             --mysql-user=root --mysql-host=localhost --table-size=100000 
             --tables=16 --threads=8 --report-interval=10 --time=3600 run

The problem we want to bypass:
-----------------------------
mysql> update sbtest4 set pad="oups";
Query OK, 1066269 rows affected (8.52 sec)
Rows matched: 1066269  Changed: 1066269  Warnings: 0

Finding what we need to avoid after restore:
--------------------------------------------
mysql> show master status;
+------------------+-----------+...+------------------------------------------------+
| File             | Position  |   | Executed_Gtid_Set                              |
+------------------+-----------+...+------------------------------------------------+
| imac2-bin.000005 | 622915384 |   | 25d97ef9-005b-11e8-bf1b-685b359e77d5:1-1072032 |
+------------------+-----------+...+------------------------------------------------+
1 row in set (0.00 sec)

Verification of the event:

mysql> pager grep -A 1 -B 2 'sbtest.sbtest4' | grep -B 4 Update
mysql> show binlog events in 'imac2-bin.000005';
--
| imac2-bin.000005 | 277145452 | Gtid        | ... | 
           SET @@SESSION.GTID_NEXT= '25d97ef9-005b-11e8-bf1b-685b359e77d5:1072032' |
| imac2-bin.000005 | 277145513 | Query       | ... | BEGIN                         |
| imac2-bin.000005 | 277145583 | Table_map   | ... | table_id: 154 (sbtest.sbtest4)|
| imac2-bin.000005 | 277145638 | Update_rows | ... | table_id: 154                 |
mysql> pager sed -n -e '/25d97ef9-005b-11e8-bf1b-685b359e77d5:1072032/,/COMMIT/ p' | grep COMMIT
mysql> show binlog events in 'imac2-bin.000005';
| imac2-bin.000005 | 622915357 | Xid | ... | 622915384 | COMMIT /* xid=1072253 */

Stop MySQL and save the binlogs:
--------------------------------

[root@imac2 ~]# systemctl stop mysqld
[root@imac2 ~]# mkdir /mnt/mysql/binlogs
[root@imac2 ~]# cd /var/lib/mysql
[root@imac2 mysql]# cp imac2-bin.* /mnt/mysql/binlogs/

Restore the backup:
-------------------
[root@imac2 mysql]# rm -rf *
[root@imac2 mysql]# /opt/mysql/meb-4.1/bin/mysqlbackup 
                     --backup-dir=/tmp/mysqlbackup copy-back
...
mysqlbackup completed OK! with 3 warnings
[root@imac2 mysql]# chown -R mysql. *

Find and Set the last executed GTID during backup:
--------------------------------------------------
[root@imac2 mysql]# cat backup_variables.txt | grep gtid
gtid_executed=25d97ef9-005b-11e8-bf1b-685b359e77d5:1-6001

my.cnf modification before restart:
-----------------------------------
replicate-same-server-id=1
skip-slave-start

Tell MySQL to bypass the transaction:
-------------------------------------
mysql> reset master;
mysql> SET @@GLOBAL.GTID_PURGED='25d97ef9-005b-11e8-bf1b-685b359e77d5:1-6001,
                                 25d97ef9-005b-11e8-bf1b-685b359e77d5:1072032';

Generate the relay logs:
------------------------

[root@imac2 mysql]# for i in $(ls /mnt/mysql/binlogs/*.0*) 
 do
   ext=$(echo $i | cut -d'.' -f2) 
   cp $i imac2-relay-bin.$ext
 done

[root@imac2 mysql]# ls ./imac2-relay-bin.0* >imac2-relay-bin.index
[root@imac2 mysql]# chown mysql. *relay*
[root@imac2 mysql]# cat imac2-relay-bin.index 
./imac2-relay-bin.000001
./imac2-relay-bin.000002
./imac2-relay-bin.000003
./imac2-relay-bin.000004
./imac2-relay-bin.000005

Start PITR:
-----------

mysql> SET GLOBAL SLAVE_PARALLEL_TYPE='LOGICAL_CLOCK';
mysql> SET GLOBAL SLAVE_PARALLEL_WORKERS=8;
mysql> CHANGE MASTER TO RELAY_LOG_FILE='imac2-relay-bin.000001', 
       RELAY_LOG_POS=1, MASTER_HOST='dummy';
mysql> START SLAVE SQL_THREAD;

And then the problem:

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: dummy
                  Master_User: 
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: 
          Read_Master_Log_Pos: 4
               Relay_Log_File: imac2-relay-bin.000002
                Relay_Log_Pos: 1074031837
        Relay_Master_Log_File: imac2-bin.000003
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1755
                   Last_Error: Cannot execute the current event group in the parallel mode. Encountered event Gtid, relay-log name ./imac2-relay-bin.000003, position 190 which prevents execution of this event group in parallel mode. Reason: The master event is logically timestamped incorrectly..
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1074031837
              Relay_Log_Space: 2770705356
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1755
               Last_SQL_Error: Cannot execute the current event group in the parallel mode. Encountered event Gtid, relay-log name ./imac2-relay-bin.000003, position 190 which prevents execution of this event group in parallel mode. Reason: The master event is logically timestamped incorrectly..
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 0
                  Master_UUID: 
             Master_Info_File: /mnt/mysql/datadir/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: 
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 180124 11:13:19
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 25d97ef9-005b-11e8-bf1b-685b359e77d5:1-1072033
            Executed_Gtid_Set: 25d97ef9-005b-11e8-bf1b-685b359e77d5:1-6001:1072032
                Auto_Position: 0
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

I've tried to set SLAVE_PARALLEL_WORKERS to 1 and also set the RELAY_LOG_POS to 0, always the same problem.

If I don't set SLAVE_PARALLEL_TYPE='LOGICAL_CLOCK' and SLAVE_PARALLEL_WORKERS >0 it always fails.
[24 Jan 12:00] Frederic Descamps
mysql-bug-data-89375.zip has been uploaded with Sven's requested files
[1 Feb 8:33] Frederic Descamps
Workaround found by bill qu : change server_id and don't use replicate-same-server-id=1

for info in the error log we could see:

2018-02-01T08:00:03.350670Z 3 [ERROR] Transaction's sequence number is inconsistent with that of a preceding one: sequence_number (1) <= previous sequence_number (2122)
2018-02-01T08:00:03.350924Z 3 [Warning] Slave SQL for channel '': Coordinator thread of multi-threaded slave is being stopped in the middle of assigning a group of events; deferring to exit until the group completion ... , Error_code: 0