Bug #119270 Single-statement transactions can cause latency spikes on the replica
Submitted: 30 Oct 9:12
Reporter: karry zhang (OCA) Email Updates:
Status: Open Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:MySQL8.0,5.7 OS:Any
Assigned to: CPU Architecture:Any

[30 Oct 9:12] karry zhang
Description:
The replica uses MTS. The source's binlog_format=row.

Single-statement transactions can cause latency spikes on the replica, even though there is no actual latency.

How to repeat:
add debug_sync, so we can see the spike.

diff --git a/sql/binlog.cc b/sql/binlog.cc
index 2e90d0f481b..9697037adee 100644
--- a/sql/binlog.cc
+++ b/sql/binlog.cc
@@ -8232,6 +8232,10 @@ TC_LOG::enum_result MYSQL_BIN_LOG::commit(THD *thd, bool all) {
     */
     else if (real_trans && xid && trn_ctx->rw_ha_count(trx_scope) > 1 &&
              !trn_ctx->no_2pc(trx_scope)) {
+      DBUG_EXECUTE_IF("block_to_see_sbm", {
+        const char act[] = "now wait_for go_to_commit";
+        assert(!debug_sync_set_action(current_thd, STRING_WITH_LEN(act)));
+      });
       Xid_log_event end_evt(thd, xid);
       if (cache_mngr->trx_cache.finalize(thd, &end_evt)) return RESULT_ABORTED;
     }

run the following test case:

--source include/master-slave.inc
--source include/have_binlog_format_row.inc
--source include/have_debug.inc
 
--echo
--echo #
--echo # 0. Prepare data.
--echo ##################
--echo #
connection master;
CREATE DATABASE tt;
CREATE TABLE tt.t1(id int);
 
--echo
--echo #
--echo # 1. Simulate executing long transaction.
--echo #########################################
--echo #
--echo ## The rli->last_master_timestamp will be set to the first transaction's Gtid_log_event's when field 
--echo ## after sync with master.  
connection slave;
--source include/rpl_sync.inc
SET GLOBAL debug = '+d,block_to_see_sbm';
 
--echo
--echo ## Simulate long transaction.
connection master;
INSERT INTO tt.t1 VALUES(sleep(10));
 
connection slave;
 
--echo ## SBM = 10s (on the source) + x (on the replica) >=10, In fact, this transaction can be executed very quickly on the replica.

let $sbm= query_get_value("SHOW SLAVE STATUS", Seconds_Behind_Master, 1);
let $assert_text= Seconds_Behind_Master will larger than 10;
let $assert_cond= $sbm >= 10;
--source include/assert.inc
 
SET DEBUG_SYNC= "now SIGNAL go_to_commit";
SET GLOBAL debug = '-d,block_to_see_sbm';
 
--echo
--echo #
--echo # 2. Clean up 
--echo #############
--echo #
connection slave;
SET GLOBAL debug = '-d,block_to_see_sbm';
connection master;
DROP DATABASE tt;
--source include/rpl_sync.inc
--source include/rpl_end.inc

Suggested fix:
For MTS, the SBM calculation logic is replica's current time minus the source's transaction commit time. The source's transaction commit time is obtained through the `when` field in the header of `Xid_log_event`/`Gtid_log_event`.

However, for single-statement transactions, the `when` field in the header of `Xid_log_event`/`Gtid_log_event` takes the time when the transaction started executing, not the transaction commit time. This causes an SBM spike on the replica.
[30 Oct 16:03] Jean-François Gagné
> However, for single-statement transactions, the `when` field in the header of `Xid_log_event`/`Gtid_log_event` takes the time when the transaction started executing, not the transaction commit time. This causes an SBM spike on the replica.

Very interesting finding !  It might also be affecting 9.4 and 9.5.

> add debug_sync, so we can see the spike.

It is probably possible to see the spike without a patched mysqld by blocking the insert on the replica with locking the table (or locking that row).

> run the following test case
> INSERT INTO tt.t1 VALUES(sleep(10));

Another way to see this, which might be more realistic in a production environment, could be to do an auto-commit insert on a pk which is locked by another transaction (that other transaction could have inserted the same pk, but will rollback 10 seconds later; of that other trx could be a delete on that pk, which would commit 10 seconda later).