MySQL Bugs: #72988: Slave thread fails with MTS + GTID + replicate_same_server

Bug #72988	Slave thread fails with MTS + GTID + replicate_same_server_id
Submitted:	12 Jun 2014 15:36	Modified:	11 Dec 2014 21:09
Reporter:	Sven Sandberg
Status:	Closed
Category:	MySQL Server: Replication	Severity:	S2 (Serious)
Version:	5.7	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
When MTS is configured (slave_parallel_workers > 0),
and GTID is configured (gtid_mode = ON),
and replicate_same_server_id is enabled,
the slave thread stops with the following error:

'Cannot execute the current event group in the parallel mode. Encountered event Previous_gtids, relay-log name ./slave-relay-bin.000001, position 121 which prevents execution of this event group in parallel mode. Reason: the event is a part of a group that is unsupported in the parallel execution mode.'

This was introduced by WL#6559. The configuration was not possible before, because replicate_same_server_id requires log_slave_updates=off, and the combination of log_slave_updates=off with gtid_mode=on was not allowed before WL#6559.
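
The option constraints above can be written down as a small truth-table check. This is only a sketch of the reasoning in this report, not server code: the function names and the after_wl6559 flag are hypothetical, and the model covers only the options mentioned here.

```python
def config_allowed(gtid_mode_on, log_slave_updates,
                   replicate_same_server_id, after_wl6559):
    """Simplified model: would the server accept this option combination?"""
    # replicate_same_server_id requires log_slave_updates=off.
    if replicate_same_server_id and log_slave_updates:
        return False
    # Before WL#6559, gtid_mode=on was not allowed with log_slave_updates=off.
    if gtid_mode_on and not log_slave_updates and not after_wl6559:
        return False
    return True


def bug_reachable(slave_parallel_workers, gtid_mode_on, log_slave_updates,
                  replicate_same_server_id, after_wl6559):
    """True if the failing MTS + GTID + replicate_same_server_id setup can exist."""
    return (config_allowed(gtid_mode_on, log_slave_updates,
                           replicate_same_server_id, after_wl6559)
            and slave_parallel_workers > 0
            and gtid_mode_on
            and replicate_same_server_id)


# The configuration from this report (hypothetical flag values):
print(bug_reachable(2, True, False, True, after_wl6559=False))
print(bug_reachable(2, True, False, True, after_wl6559=True))
```

In this model the failing setup is rejected at configuration time before WL#6559 and becomes reachable only afterwards, which matches the explanation given above.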

In addition, this causes the tests rpl_server_uuid and rpl_replicate_same_server_id to fail in the feature tree for WL#7592.

How to repeat:
==== rpl_bug.opt ====
--replicate-same-server-id
--log-slave-updates=off

==== rpl_bug.test ====
--source include/have_gtid.inc

--connect (slave,127.0.0.1,root,,test,$SLAVE_MYPORT,)

--connection slave

eval
CHANGE MASTER TO
  MASTER_HOST = '127.0.0.1',
  MASTER_PORT = $MASTER_MYPORT,
  MASTER_USER = 'root';
SET @@GLOBAL.SLAVE_TRANSACTION_RETRIES = 0;
SET @@GLOBAL.SLAVE_PARALLEL_WORKERS = 2;
START SLAVE;

--sleep 1

query_vertical
SHOW SLAVE STATUS;

Suggested fix:
The problem is in Log_event::get_slave_worker. Before WL#6559, this function could never be invoked for a Previous_gtids_log_event: the only Previous_gtids_log_events in the relay log are those generated by the slave itself, and those are filtered out unless replicate_same_server_id is specified. The fix should therefore make get_slave_worker skip Previous_gtids_log_event without generating the error.
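
The filtering argument can be illustrated with a toy model of the coordinator. It is a hedged sketch, not the server's C++ code: Event, events_after_server_id_filter, dispatch_to_workers and the skip_previous_gtids switch are hypothetical names, and real relay-log handling treats several event types specially that this model ignores.

```python
from dataclasses import dataclass

PREVIOUS_GTIDS = "Previous_gtids"


@dataclass
class Event:
    type: str
    server_id: int


class UnsupportedInParallelMode(Exception):
    """Stands in for the 'Cannot execute ... in the parallel mode' error."""


def events_after_server_id_filter(relay_log, own_server_id,
                                  replicate_same_server_id):
    # Events carrying the slave's own server id are dropped unless
    # replicate_same_server_id is enabled.
    return [ev for ev in relay_log
            if replicate_same_server_id or ev.server_id != own_server_id]


def dispatch_to_workers(events, skip_previous_gtids):
    """Toy stand-in for the worker-assignment step."""
    assigned = []
    for ev in events:
        if ev.type == PREVIOUS_GTIDS:
            if skip_previous_gtids:
                # Suggested fix: skip the event, no worker, no error.
                continue
            raise UnsupportedInParallelMode(ev.type)
        assigned.append(ev)
    return assigned


# Slave has server id 2; it wrote Previous_gtids into its own relay log.
relay_log = [Event(PREVIOUS_GTIDS, 2), Event("Gtid", 1), Event("Query", 1)]

# With replicate_same_server_id the event survives the filter, so only the
# version with the suggested fix dispatches without raising.
survivors = events_after_server_id_filter(relay_log, 2, True)
print([ev.type for ev in dispatch_to_workers(survivors, skip_previous_gtids=True)])
```

Without replicate_same_server_id the Previous_gtids event never reaches the dispatch step in this model, which is why the error path was unreachable before the new option combination became legal.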

The following was added to the 5.7.6 changelog with commit 4800:

When using the multi-threaded slave with GTID-based replication, enabling --replicate-same-server-id caused the slave thread to stop with an error, and replication could not be started. This was caused by Previous_gtids_log_event not being correctly filtered in such a setup and reaching the worker thread. The fix ensures that Previous_gtids_log_event is correctly processed by the coordinator thread.