Bug #120266 Replication delay when SQL thread skips large already-executed transactions
Submitted: 14 Apr 9:12
Reporter: karry zhang (OCA) Email Updates:
Status: Open Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:MySQL 8.4 OS:Any
Assigned to: CPU Architecture:Any

[14 Apr 9:12] karry zhang
Description:
When a transaction is skipped on the replica by injecting its GTID via SET gtid_next = '<uuid>:<seq>'; BEGIN; COMMIT;, the skipped transaction still causes replication delay. 

The reason for this problem is the SQL thread reads and parses every event in the transaction one by one before moving to the next, even though it already knows the GTID is executed. For large transactions, this results in significant unnecessary delay visible through Seconds_Behind_Source.

How to repeat:
-- Source: GTID enabled, binlog_format=ROW
CREATE TABLE t1 (a INT PRIMARY KEY, b LONGTEXT);

-- Replica: stop SQL thread, keep IO thread running
STOP REPLICA SQL_THREAD;

-- Source: commit a ~1GB transaction
BEGIN;
INSERT INTO t1 VALUES (1,  REPEAT('a', 100000000));
INSERT INTO t1 VALUES (2,  REPEAT('b', 100000000));
INSERT INTO t1 VALUES (3,  REPEAT('c', 100000000));
INSERT INTO t1 VALUES (4,  REPEAT('d', 100000000));
INSERT INTO t1 VALUES (5,  REPEAT('e', 100000000));
INSERT INTO t1 VALUES (6,  REPEAT('f', 100000000));
INSERT INTO t1 VALUES (7,  REPEAT('g', 100000000));
INSERT INTO t1 VALUES (8,  REPEAT('h', 100000000));
INSERT INTO t1 VALUES (9,  REPEAT('i', 100000000));
INSERT INTO t1 VALUES (10, REPEAT('j', 100000000));
COMMIT;

-- Source: commit a small transaction immediately after
INSERT INTO t1 VALUES (11, 'small');

-- Wait for IO thread to fetch all events to relay log

-- Replica: skip the large transaction via SET gtid_next
SET sql_log_bin = 0;
SET gtid_next = '<source_uuid>:<large_trx_gtid_seq>';
BEGIN; COMMIT;
SET gtid_next = AUTOMATIC;
SET sql_log_bin = 1;

-- Replica: start SQL thread and observe Seconds_Behind_Source
START REPLICA SQL_THREAD;

-- The SQL thread parses ~1GB of relay log events just to skip
-- the already-executed transaction. Seconds_Behind_Source stays
-- elevated until the skip completes. The small transaction
-- (row 11) is delayed accordingly.

Suggested fix:
When the SQL thread reads a Gtid_log_event and finds its GTID already present in gtid_executed, skip the entire transaction by seeking past it using transaction_length, instead of reading and parsing each event in the transaction one by one.