MySQL Bugs: #114934: In Heavy workload, it degenerates to use single thread replication in fact

Bug #114934	In Heavy workload, it degenerates to use single thread replication in fact
Submitted:	9 May 2024 7:33	Modified:	5 Jun 2024 2:06
Reporter:	Cai Minshen	Email Updates:
Status:	Can't repeat	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	5.7.44	OS:	Any
Assigned to:	MySQL Verification Team	CPU Architecture:	Any

Description:
There are several hundreds of active connections on the source server. Each connection runs big transactions. The binary log size of each transaction is about 500 KB. The configuration max_binlog_size is 100M. Set up a read replica on the source server. As the time goes by, the replication lag becomes very large. According to the table performance_schema.replication_applier_status_by_worker, it only uses 1 worker thread in most time.

MySQL supports group commit. For each binary log file, there are a number of transactions which are commit in the 1st group. These transactions could be replicated in parallel. Their last_committed are all 0. 

In read replica, if the last_committed of one transaction is 0, it must replicate the transaction in sequence. Please rerfer to the below:
<<<<<<<<<<<<<<<<<<<<<<<<<<<<
int Mts_submode_logical_clock::schedule_next_event(Relay_log_info* rli, Log_event *ev) {
...
is_new_group=
    (
	...
       undefined parent (e.g the very first trans from the master),
       or old master.
     */
     last_committed == SEQ_UNINIT ||
	 ...
	 
	  if (!is_new_group)
     {
	  ...
	 }
	 else
	 {
	  ...
	  if (-1 == wait_for_workers_to_finish(rli))
	  ...
	 }
	 ...
}
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

In my use case, each transaction is big, and there are lots of active connections. There are only hundred transactions in most of my binary log files. And the last_committed of them are all 0. So MySQL degenerates to use single thread replication.

How to repeat:
There are several hundreds of active connections on the source server. Each connection runs big transactions. Each transaction is a insert with blob. The binary log size of each transaction is about 500 KB. The configuration max_binlog_size is 100M. 
Set up a read replica on the source server. slave_parallel_type is LOGICAL_CLOCK. slave_parallel_workers is 128 As the time goes by, the replication lag becomes very large. According to the table performance_schema.replication_applier_status_by_worker, it only uses 1 worker thread in most time.

Hi,

I was not able to reproduce this. Can you share more data, maybe create a script that will generate traffic that reproduces this. I made it as close as I can to your description and I was not able to recreate the problem.

Another question - what version are you using. I'm testing with 8.4.0 LTS

We use V5.7.44. 

We run 10 sysbench in parallel. Each one is target on a database. The sysbench command is like this:
sysbench /usr/share/sysbench/oltp_read_write.lua --tables=2048  --table-size=2000000000  --mysql-user=<username> --mysql-db=<db_name>--mysql-host=<servername> --mysql_password='xxx' prepare

Hi,

As of October 25, 2023, MySQL 5.7 is covered under Oracle Sustaining Support.

So there is no active development on 5.7 any more. 

Can you reproduce this with 8? I tried and I cannot.

Those parallel sysbenchs all target their own database or they target all single database?

We reproduce this issue on V8.0.36. Each sysbench is for its own database.