MySQL Bugs: #120281: replica_preserve_commit_order causes infinite deadlock in group commit pipeline, blocking all application writes

Submitted:	17 Apr 9:21	Modified:	17 Apr 11:32
Reporter:	Abhijith p
Status:	Open
Category:	MySQL Server: Replication	Severity:	S2 (Serious)
Version:	8.0.42	OS:	Linux (any)
Assigned to:		CPU Architecture:	Any
Tags:	Group Commit, replication

Description:
When replica_preserve_commit_order = ON with LOGICAL_CLOCK parallel replication, a permanent deadlock can occur in the binary log group commit pipeline. This deadlock blocks not only replication applier workers but ALL application threads attempting to commit on the same server, causing a complete write outage that requires a MySQL restart to resolve.

The root cause is a design interaction between the group commit leader election mechanism and the commit order preservation logic:
(1) Replication worker W2 (executing a later transaction Tx-101) finishes execution before worker W1 (executing an earlier transaction Tx-100).
(2) W2 enters the binary log group commit FLUSH stage queue first and becomes the LEADER.
(3) replica_preserve_commit_order requires Tx-100 to commit before Tx-101, so W2 (the leader) pauses waiting for W1 to commit.
(4) W1 finishes execution, enters the group commit queue, and becomes a FOLLOWER behind W2.
(5) W1 (follower) waits for W2 (leader) to process the commit batch.
(6) W2 (leader) waits for W1 (follower) to commit first.
(7) Circular dependency: permanent deadlock.
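The circular dependency in steps (1) to (7) can be checked mechanically by modelling it as a wait-for graph, where an edge A -> B means "A cannot proceed until B does". The Python sketch below models the scenario as described above, not MySQL's source code. The `find_cycle` helper and the W1/W2 labels are illustrative, and each thread is assumed to wait on at most one other thread.

```python
def find_cycle(wait_for):
    """Return one cycle in a wait-for graph as a list of nodes, or None.

    wait_for maps each thread to the single thread it is waiting on
    (a simplifying assumption that is enough for a leader/follower stall).
    """
    for start in wait_for:
        path = []
        seen = set()
        node = start
        # Follow outgoing edges until we reach a thread that waits on nobody
        # or revisit a thread already on this path.
        while node in wait_for and node not in seen:
            seen.add(node)
            path.append(node)
            node = wait_for[node]
        if node in seen:
            # The cycle is the part of the path starting at the revisited node.
            return path[path.index(node):]
    return None


# Edges from the scenario above (illustrative labels):
wait_for = {
    "W1": "W2",  # step 5: follower W1 (Tx-100) waits for leader W2 to process the batch
    "W2": "W1",  # step 6: leader W2 (Tx-101) waits for Tx-100 on W1 to commit first
}

print(find_cycle(wait_for))
```

Removing either edge (for example, if W2 never became leader) makes `find_cycle` return `None`, which is the point of the reporter's argument: the cycle needs both the ordering wait and the leader/follower wait.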

All subsequent threads (application writes, other replication workers, MySQL events) entering the group commit queue also become followers behind the stalled leader. In our case, 384 threads piled up over 35 minutes, including 344 application threads performing INSERT/UPDATE operations completely unrelated to replication.
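The pile-up follows from the same model. Every thread that joins the FLUSH queue behind the stalled leader waits on it, directly or through another waiter, and so inherits the deadlock. Below is a minimal Python sketch, again assuming each thread waits on at most one other thread. `blocked_threads` and the thread labels are hypothetical, and the handful of threads stands in for the 384 observed.

```python
def blocked_threads(wait_for):
    """Return every thread whose chain of waits ends in a cycle (it can never proceed)."""
    blocked = set()
    for start in wait_for:
        chain = []
        node = start
        # Walk the chain of waits; stop at a thread already known to be stuck,
        # at a revisit (a cycle), or at a thread that is not waiting on anyone.
        while node in wait_for and node not in blocked and node not in chain:
            chain.append(node)
            node = wait_for[node]
        if node in blocked or node in chain:
            # The chain ran into a cycle, so everything on it is stuck too.
            blocked.update(chain)
    return blocked


# W1 and W2 form the cycle; later arrivals queue behind the stalled leader W2.
wait_for = {"W1": "W2", "W2": "W1"}
for i in range(1, 4):
    wait_for[f"app-{i}"] = "W2"  # application commits that joined the stalled queue
wait_for["W3"] = "W2"            # another applier worker behind the same leader

print(sorted(blocked_threads(wait_for)))
```

A thread whose wait chain ends at a thread that is not waiting on anything is not reported, so only the threads stuck behind the cycle are counted.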

This is distinct from previously reported bugs:
(1) Bug #103636 (sequence ticket overflow, fixed in 8.0.28): not applicable, because our server uptime was short.
(2) Bug #95863 (triggered by SET GLOBAL super_read_only or FLUSH TABLES WITH READ LOCK): our deadlock occurs under a pure DML workload with no administrative commands.
(3) Bug #107574 (triggered by changing read_only): not applicable.

The fundamental issue is that the group commit leader election has no awareness of replica_preserve_commit_order constraints. A worker that cannot yet commit (due to ordering) should not be allowed to become the group commit leader.
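To make the proposed rule concrete, here is a hypothetical Python sketch of an ordering-aware leader choice. Among threads in the FLUSH queue, only one that is allowed to commit now may lead. Threads without an ordering constraint (for example, application sessions) are always eligible. An applier worker is eligible only when its transaction is the next one due in source commit order. `QueuedThread`, `next_seq`, `is_eligible`, and `elect_leader` are assumptions for illustration, not names from the MySQL source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueuedThread:
    name: str
    seq: Optional[int] = None  # position in source commit order; None = no ordering constraint


def is_eligible(t: QueuedThread, next_seq: int) -> bool:
    """A thread may commit now if it has no ordering constraint or is next in order."""
    return t.seq is None or t.seq == next_seq


def elect_leader(queue: list[QueuedThread], next_seq: int) -> Optional[QueuedThread]:
    """Return the first eligible thread in arrival order, or None if none can commit yet."""
    for t in queue:
        if is_eligible(t, next_seq):
            return t
    return None


# W2 (Tx-101) arrived first, but Tx-100 has not committed yet, so W2 must not lead.
queue = [QueuedThread("W2", seq=101), QueuedThread("W1", seq=100)]
print(elect_leader(queue, next_seq=100).name)
```

The sketch only covers the choice of leader; a real implementation would also have to keep the batch itself in commit order so that Tx-101 is not written before Tx-100.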

Environment:

MySQL 8.0.42
replica_parallel_workers = 4
replica_parallel_type = LOGICAL_CLOCK
replica_preserve_commit_order = ON
binlog_transaction_dependency_tracking = COMMIT_ORDER
innodb_flush_log_at_trx_commit = 0
sync_binlog = 0

Active-active bi-directional replication between two servers. The server experiencing the deadlock serves as both a primary (receiving application writes) and a replica (applying replicated transactions from the other server).

How to repeat:
Setup:

Two MySQL 8.0.42 servers in bi-directional replication (active-active)
Both with: replica_parallel_workers = 4, replica_parallel_type = LOGICAL_CLOCK, replica_preserve_commit_order = ON
Application performing concurrent INSERT and UPDATE operations on both servers

The deadlock is a race condition that requires:
(1) Two replication transactions assigned to different parallel workers
(2) The worker executing the later transaction (by commit order on source) finishes execution before the earlier one
(3) That later-transaction worker enters the group commit queue and becomes leader before the earlier one finishes
Under sustained write load, this race condition occurs frequently. In our environment, it occurred on average once a day.

Minimal setup to reproduce:
Single primary, single replica (active-passive)
Primary: any workload generating concurrent transactions (e.g., sysbench oltp_read_write with --threads=8)

Replica configured with:
replica_parallel_workers = 4
replica_parallel_type = LOGICAL_CLOCK
replica_preserve_commit_order = ON

No application writes on the replica are needed: the deadlock occurs purely between replication applier workers in the group commit pipeline. Under sustained write load on the primary, the replica will eventually enter a state where all applier workers are stuck indefinitely in the "waiting for handler commit" or "Waiting for dependent transaction to commit" state.

In an active-active setup, the additional impact is that all application threads on the affected server are also blocked behind the same stalled group commit leader, causing a complete write outage.

Suggested fix:
A replication worker should wait outside the group commit queue until its commit-order constraint is satisfied, and only then enter the queue. By the time it joins the queue, it is guaranteed to be eligible to commit, so the group commit pipeline never sees ineligible threads.
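The suggested fix amounts to a turnstile in front of the queue. The Python sketch below illustrates the idea under simplified assumptions and is not a patch. `CommitOrderGate` and `worker` are hypothetical names, a `threading.Condition` stands in for whatever ordering wait the server uses, and a plain list stands in for the group commit queue. Tx-101 finishes execution first, yet it enters the queue only after Tx-100 has.

```python
import threading
import time


class CommitOrderGate:
    """Lets transactions through strictly in source commit order (hypothetical helper)."""

    def __init__(self, first_seq: int):
        self._next = first_seq
        self._cond = threading.Condition()

    def wait_for_turn(self, seq: int) -> None:
        # Block *outside* the commit queue until this transaction is next in order.
        with self._cond:
            self._cond.wait_for(lambda: self._next == seq)

    def done(self) -> None:
        # Advance the turn and wake all waiters so the next one can re-check.
        with self._cond:
            self._next += 1
            self._cond.notify_all()


gate = CommitOrderGate(first_seq=100)
commit_queue = []               # stands in for the group commit (FLUSH stage) queue
queue_lock = threading.Lock()


def worker(seq: int, exec_delay: float) -> None:
    time.sleep(exec_delay)      # simulate execution time
    gate.wait_for_turn(seq)     # ordering constraint satisfied before enqueueing
    with queue_lock:
        commit_queue.append(seq)  # only eligible threads ever join the queue
    gate.done()                 # toy model: let the next transaction enqueue behind us


# Tx-101 finishes executing first (shorter delay), as in the scenario above.
threads = [
    threading.Thread(target=worker, args=(100, 0.2)),
    threading.Thread(target=worker, args=(101, 0.0)),
]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)

print(commit_queue)
```

In this toy model the turn advances as soon as a transaction is enqueued rather than after it commits. That keeps the queue in commit order while still letting several transactions be batched together; advancing only after commit would also preserve order but would serialize commits.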

Workaround:
SET GLOBAL replica_preserve_commit_order = OFF;

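Use this workaround only if your application can tolerate reading inconsistent data on the replica. With replica_preserve_commit_order = OFF, applier workers may commit transactions in a different order than on the source, so on rare occasions (in the same situations where this bug would otherwise trigger) you may read inconsistent data.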