MySQL Bugs: #109133: Parallel replication causes massive delays

Bug #109133	Parallel replication causes massive delays
Submitted:	18 Nov 2022 3:45	Modified:	16 Oct 2023 22:52
Reporter:	chengqing hu	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Documentation	Severity:	S3 (Non-critical)
Version:	8.0	OS:	Any
Assigned to:	Jon Stephens	CPU Architecture:	Any

Description:
A small table with 100 rows without any index has a large delay when parallel replication is turned on, and the delay disappears soon after parallel replication is turned off。

How to repeat:
1. Sysbench creates a table with 100 rows;
sysbench oltp_update_index.lua --mysql-host=10.186.xx.xx --mysql-port=xx --mysql-user=xx --mysql-password=xx --mysql-db=test --tables=1 --table-size=100 --report-interval=3 --threads=1 prepare

2. Delete the index of the table on the slave;
set sql_log_bin=0;
alter table sbtest1 modify id int NOT NULL;
alter table sbtest1 drop primary key;
alter table sbtest1 drop index k_1;

3. The master enables write_set to enhance the effect of parallel replication
set global binlog_transaction_dependency_tracking=WRITESET;

4. Start parallel replication
set global replica_parallel_workers=8;
stop slave;
start slave;

5. Run sysbench, oltp_update_index mode, 8 threads
sysbench /usr/local/share/sysbench/oltp_update_index.lua -mysql-host=10.186.xx.xx --mysql-port=xxx-mysql-user=xxx --mysql-password=xxx --mysql-db=test --tables=1 --table-size=100 --report-interval=3 --threads=8 --time=100 run

6. Show slave status, you can see that the delay is getting bigger and bigger,and Executed_Gtid_Set grows very slowly

show slave status\G
sleep 60
show slave status\G

7. With parallel copy turned off, playback is fast and lag disappears quickly
set global replica_parallel_workers=0;
stop slave;
start slave;

Suggested fix:
Adding an index would do the trick, but even a full table scan is fast for such a small table and shouldn't cause large delays.

Hi,

> without any index

This is not a bug. Please add primary key to the table.

Thank you for your interest in MySQL.

1. This is just a small table
2. Turn off parallel replication without any delay
Seems to be related to parallel replication, not primary key

Hello. We have similar behavior on a database with intensive writing. Replication works fine with a single thread, with a multithreaded type, a gradual lag of the replica begins. All tables have PK. 

Additionally, I want to note that in the multi-threaded replication code, for each rpl_rca_wkr, the thread number is always zero, which makes debugging between threads somewhat difficult. I attach flamegraphs for analysis.

Single thread replica

Attachment: flamegraph_mth.svg (image/svg+xml, text), 954.56 KiB.

Single thread replica

Attachment: flamegraph_single.svg (image/svg+xml, text), 1.10 MiB.

pt-pmp profilig on mth replica

Attachment: pt-pmp.svg (image/svg+xml, text), 105.96 KiB.

Hi Max,

What version of MySQL Server are you using?

Hi Max,

Additionally, this "no index" situation I can reproduce but "all tables have PK" situation I cannot reproduce. I see your measurements but I need a way to reproduce the problem, assuming you can reproduce this with 8.0.31

Thanks

Now we use 8.0.30. Upgrade to 31 is not possible now

Hi Max,

I do not think there is anything between .30 and .31 wrt this so .30 is ok, but I cannot reproduce this. Can you provide a reproducible test case

kind regards

I tried to reproduce using
https://github.com/Percona-Lab/sysbench-tpcc

hw: 28 vcpu, 167G ram, nvme raid10, source on host, replica in docker.
0. change mysql vars in contaner: set global sync_binlog=1024;  set global innodb_buffer_pool_size=1342177280*4; set global innodb_flush_log_at_trx_commit=2; set global innodb_io_capacity_max = 400000; set global innodb_io_capacity = 200000; set global innodb_io_capacity_max = 400000; set global innodb_io_capacity = 200000;

1. ./tpcc.lua --mysql-socket=/var/run/mysqld/mysqld.sock --mysql-user=test --mysql-password=еуые --mysql-db=test --time=300 --threads=4 --report-interval=1 --tables=1 --scale=10 --db-driver=mysql prepare

2. setup replica from host to container, do: stop replica; set global replica_parallel_workers=0; start replica;

3. the next step is to determine the number of threads with which the replica will not lag behind in single-threaded mode on your hardware, in my case it is 5 threads.
./tpcc.lua --mysql-socket=/var/run/mysqld/mysqld.sock --mysql-user=test --mysql-password=tezt --mysql-db=test --time=600 --threads=5 --report-interval=1 --tables=1 --scale=10 --db-driver=mysql run

run the script and wait for a minute and make sure that the replica is not more than 1 second behind the source.

4. change threads num: stop replica; set global replica_parallel_workers=8; start replica;

now wait a minute and see that replication is starting to lag a little, there will be a picture in flamegraph as I showed above.

If return the value replica_parallel_workers=0, then the replica will start catching up with a source

Hi Max, Chengqing,

I discussed this with replication team as I'm having issues reproducing this. Even the original report with key-less table should not happen.

The flame graphs are interesting, but would be useful to get a bit more details, and do it slightly more controlled:
- If possible, use just START REPLICA IO_THREAD while the workload is executing on the source. Once everything is received on the replica, run START REPLICA SQL_THREAD, and measure the time to catch up.
- Run the same test in four cases: (1) multi-threaded with index, (2) multi-threaded without index, (3) single-threaded with index, (4) single-threaded without index.
So the result of this would be four flame graphs and four measured time durations.

That may not solve the problem, but give some idea about the area of the code that is to blame, which may direct future testing.

Thanks

Hello! I ran the recommended tests.

Attachment: flamegraphs.zip (application/x-zip-compressed, text), 367.58 KiB.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

Hi Team.

Do you have any updates?

Hi,

No updates yet, the bug is now with replication team.

Thanks

Posted by developer:
 
The replication team analyzed attached flame graphs and also generated some new ones. As stated in the beginning of the discussion thread it is not recommended to remove primary keys or indexes on replica. In such case transactions which are executed on source and block some rows while re-executed on replica may lock other sets of rows. This, in turn, may lead to transactions being rolled back on replica and retired. In the long run leading to performance degradation, which is observed in the bug report. Right now it is limitation of the replication process which will be explained in documentation more clearly.

Posted by developer:
 
The replication team analyzed attached flame graphs and also generated some new ones. As stated in the beginning of the discussion thread it is not recommended to remove primary keys or indexes on replica. In such case transactions which are executed on source and block some rows while re-executed on replica may lock other sets of rows. This, in turn, may lead to transactions being rolled back on replica and retired. In the long run leading to performance degradation, which is observed in the bug report. Right now it is limitation of the replication process which will be explained in documentation more clearly.

Posted by developer:
 
The replication team analyzed attached flame graphs and also generated some new ones. As stated in the beginning of the discussion thread it is not recommended to remove primary keys or indexes on replica. In such case transactions which are executed on source and block some rows while re-executed on replica may lock other sets of rows. This, in turn, may lead to transactions being rolled back on replica and retired. In the long run leading to performance degradation, which is observed in the bug report. Right now it is limitation of the replication process which will be explained in documentation more clearly.

Posted by developer:
 
The replication team analyzed attached flame graphs and also generated some new ones. As stated in the beginning of the discussion thread it is not recommended to remove primary keys or indexes on replica. In such case transactions which are executed on source and block some rows while re-executed on replica may lock other sets of rows. This, in turn, may lead to transactions being rolled back on replica and retired. In the long run leading to performance degradation, which is observed in the bug report. Right now it is limitation of the replication process which will be explained in documentation more clearly.

Posted by developer:

Adding to Sławomir's analysis:

Limitation: Tables without unique keys always have a negative impact on replica performance. It may be even worse when using parallel applier with replica-preserve-commit-order=ON.

Reason: When there is a unique key, the replica applier will use that to find rows to update or delete. Since the key is unique, it will directly locate the correct row. Then there is no problem. But if there are only non-unique keys, there may be multiple rows for the same index entry. The replica then has to iterate over several candidate rows and compare the full row, in order to determine which row to update or delete. While doing so, in order to preserve isolation level requirements, it has to put read locks on rows it has searched. These read locks may conflict with other transactions, even if the actual updates of those other transactions are non-conflicting. This may lead to contention on row locks between workers scheduled in parallel. It is then possible that a transaction trx1 occurring earlier in the stream has to wait for a transaction trx2 occurring later in the stream. Due to replica-preserve-commit-order, trx2 will also wait for trx1 as soon as trx2 reaches commit. Therefore this leads to a deadlock. The server will detect the deadlock, rollback trx2, and retry trx2. This happens only when trx2 has executed up to the commit. So the entire transaction trx2 may execute more than once. While executing, it may block trx1. This can have a high CPU cost.

Hello team.

Good comment from the developer.

I wanted to remind you once again that in the multi-threaded replication code for each thread with the name rpl_rca_wkr, have thread number is always zero, which makes it somewhat difficult to debug between threads. It would be great to fix it all together.

Fixed in all versions of the Manual 8.0+ in mysqldoc rev 76975.

The changes will appear eventually at

https://dev.mysql.com/doc/refman/8.0/en/replication-options-replica.html#sysvar_replica_pa...

Thanks!

j..