Bug #109133 Parallel replication causes massive delays
Submitted: 18 Nov 2022 3:45 Modified: 16 Oct 2023 22:52
Reporter: chengqing hu Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version:8.0 OS:Any
Assigned to: Jon Stephens CPU Architecture:Any

[18 Nov 2022 3:45] chengqing hu
Description:
A small table with 100 rows without any index has a large delay when parallel replication is turned on, and the delay disappears soon after parallel replication is turned off。

How to repeat:
1. Sysbench creates a table with 100 rows;
sysbench oltp_update_index.lua --mysql-host=10.186.xx.xx --mysql-port=xx --mysql-user=xx --mysql-password=xx --mysql-db=test --tables=1 --table-size=100 --report-interval=3 --threads=1 prepare

2. Delete the index of the table on the slave;
set sql_log_bin=0;
alter table sbtest1 modify id int NOT NULL;
alter table sbtest1 drop primary key;
alter table sbtest1 drop index k_1;

3. The master enables write_set to enhance the effect of parallel replication
set global binlog_transaction_dependency_tracking=WRITESET;

4. Start parallel replication
set global replica_parallel_workers=8;
stop slave;
start slave;

5. Run sysbench, oltp_update_index mode, 8 threads
sysbench /usr/local/share/sysbench/oltp_update_index.lua -mysql-host=10.186.xx.xx --mysql-port=xxx-mysql-user=xxx --mysql-password=xxx --mysql-db=test --tables=1 --table-size=100 --report-interval=3 --threads=8 --time=100 run

6. Show slave status, you can see that the delay is getting bigger and bigger,and Executed_Gtid_Set grows very slowly

show slave status\G
sleep 60
show slave status\G

7. With parallel copy turned off, playback is fast and lag disappears quickly
set global replica_parallel_workers=0;
stop slave;
start slave;

Suggested fix:
Adding an index would do the trick, but even a full table scan is fast for such a small table and shouldn't cause large delays.
[18 Nov 2022 15:22] MySQL Verification Team
Hi,

> without any index

This is not a bug. Please add primary key to the table.

Thank you for your interest in MySQL.
[21 Nov 2022 2:12] chengqing hu
1. This is just a small table
2. Turn off parallel replication without any delay
Seems to be related to parallel replication, not primary key
[22 Nov 2022 10:17] Max Irgiznov
Hello. We have similar behavior on a database with intensive writing. Replication works fine with a single thread, with a multithreaded type, a gradual lag of the replica begins. All tables have PK. 

Additionally, I want to note that in the multi-threaded replication code, for each rpl_rca_wkr, the thread number is always zero, which makes debugging between threads somewhat difficult. I attach flamegraphs for analysis.
[22 Nov 2022 10:17] Max Irgiznov
Single thread replica

Attachment: flamegraph_mth.svg (image/svg+xml, text), 954.56 KiB.

[22 Nov 2022 10:18] Max Irgiznov
Single thread replica

Attachment: flamegraph_single.svg (image/svg+xml, text), 1.10 MiB.

[22 Nov 2022 10:20] Max Irgiznov
pt-pmp profilig on mth replica

Attachment: pt-pmp.svg (image/svg+xml, text), 105.96 KiB.

[22 Nov 2022 10:25] MySQL Verification Team
Hi Max,

What version of MySQL Server are you using?
[22 Nov 2022 11:48] MySQL Verification Team
Hi Max,

Additionally, this "no index" situation I can reproduce but "all tables have PK" situation I cannot reproduce. I see your measurements but I need a way to reproduce the problem, assuming you can reproduce this with 8.0.31

Thanks
[22 Nov 2022 16:07] Max Irgiznov
Now we use 8.0.30. Upgrade to 31 is not possible now
[22 Nov 2022 17:59] MySQL Verification Team
Hi Max,

I do not think there is anything between .30 and .31 wrt this so .30 is ok, but I cannot reproduce this. Can you provide a reproducible test case

kind regards
[23 Nov 2022 11:00] Max Irgiznov
I tried to reproduce using
https://github.com/Percona-Lab/sysbench-tpcc

hw: 28 vcpu, 167G ram, nvme raid10, source on host, replica in docker.
0. change mysql vars in contaner: set global sync_binlog=1024;  set global innodb_buffer_pool_size=1342177280*4; set global innodb_flush_log_at_trx_commit=2; set global innodb_io_capacity_max = 400000; set global innodb_io_capacity = 200000; set global innodb_io_capacity_max = 400000; set global innodb_io_capacity = 200000;

1. ./tpcc.lua --mysql-socket=/var/run/mysqld/mysqld.sock --mysql-user=test --mysql-password=еуые --mysql-db=test --time=300 --threads=4 --report-interval=1 --tables=1 --scale=10 --db-driver=mysql prepare

2. setup replica from host to container, do: stop replica; set global replica_parallel_workers=0; start replica;

3. the next step is to determine the number of threads with which the replica will not lag behind in single-threaded mode on your hardware, in my case it is 5 threads.
./tpcc.lua --mysql-socket=/var/run/mysqld/mysqld.sock --mysql-user=test --mysql-password=tezt --mysql-db=test --time=600 --threads=5 --report-interval=1 --tables=1 --scale=10 --db-driver=mysql run

run the script and wait for a minute and make sure that the replica is not more than 1 second behind the source.

4. change threads num: stop replica; set global replica_parallel_workers=8; start replica;

now wait a minute and see that replication is starting to lag a little, there will be a picture in flamegraph as I showed above.

If return the value replica_parallel_workers=0, then the replica will start catching up with a source
[24 Nov 2022 12:18] MySQL Verification Team
Hi Max, Chengqing,

I discussed this with replication team as I'm having issues reproducing this. Even the original report with key-less table should not happen.

The flame graphs are interesting, but would be useful to get a bit more details, and do it slightly more controlled:
- If possible, use just START REPLICA IO_THREAD while the workload is executing on the source. Once everything is received on the replica, run START REPLICA SQL_THREAD, and measure the time to catch up.
- Run the same test in four cases: (1) multi-threaded with index, (2) multi-threaded without index, (3) single-threaded with index, (4) single-threaded without index.
So the result of this would be four flame graphs and four measured time durations.

That may not solve the problem, but give some idea about the area of the code that is to blame, which may direct future testing.

Thanks
[25 Nov 2022 9:59] Max Irgiznov
Hello! I ran the recommended tests.

Attachment: flamegraphs.zip (application/x-zip-compressed, text), 367.58 KiB.

[23 Dec 2022 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[27 Dec 2022 9:40] Max Irgiznov
Hi Team.

Do you have any updates?
[27 Dec 2022 9:46] MySQL Verification Team
Hi,

No updates yet, the bug is now with replication team.

Thanks
[17 Mar 2023 13:11] Slawomir Maludzinski
Posted by developer:
 
The replication team analyzed attached flame graphs and also generated some new ones. As stated in the beginning of the discussion thread it is not recommended to remove primary keys or indexes on replica. In such case transactions which are executed on source and block some rows while re-executed on replica may lock other sets of rows. This, in turn, may lead to transactions being rolled back on replica and retired. In the long run leading to performance degradation, which is observed in the bug report. Right now it is limitation of the replication process which will be explained in documentation more clearly.
[17 Mar 2023 13:12] Slawomir Maludzinski
Posted by developer:
 
The replication team analyzed attached flame graphs and also generated some new ones. As stated in the beginning of the discussion thread it is not recommended to remove primary keys or indexes on replica. In such case transactions which are executed on source and block some rows while re-executed on replica may lock other sets of rows. This, in turn, may lead to transactions being rolled back on replica and retired. In the long run leading to performance degradation, which is observed in the bug report. Right now it is limitation of the replication process which will be explained in documentation more clearly.
[17 Mar 2023 13:12] Slawomir Maludzinski
Posted by developer:
 
The replication team analyzed attached flame graphs and also generated some new ones. As stated in the beginning of the discussion thread it is not recommended to remove primary keys or indexes on replica. In such case transactions which are executed on source and block some rows while re-executed on replica may lock other sets of rows. This, in turn, may lead to transactions being rolled back on replica and retired. In the long run leading to performance degradation, which is observed in the bug report. Right now it is limitation of the replication process which will be explained in documentation more clearly.
[17 Mar 2023 13:12] Slawomir Maludzinski
Posted by developer:
 
The replication team analyzed attached flame graphs and also generated some new ones. As stated in the beginning of the discussion thread it is not recommended to remove primary keys or indexes on replica. In such case transactions which are executed on source and block some rows while re-executed on replica may lock other sets of rows. This, in turn, may lead to transactions being rolled back on replica and retired. In the long run leading to performance degradation, which is observed in the bug report. Right now it is limitation of the replication process which will be explained in documentation more clearly.
[17 Mar 2023 14:03] Sven Sandberg
Posted by developer:
 
Adding to Sławomir's analysis:

Limitation: Tables without unique keys always have a negative impact on replica performance. It may be even worse when using parallel applier with replica-preserve-commit-order=ON.

Reason: When there is a unique key, the replica applier will use that to find rows to update or delete. Since the key is unique, it will directly locate the correct row. Then there is no problem. But if there are only non-unique keys, there may be multiple rows for the same index entry. The replica then has to iterate over several candidate rows and compare the full row, in order to determine which row to update or delete. While doing so, in order to preserve isolation level requirements, it has to put read locks on rows it has searched. These read locks may conflict with other transactions, even if the actual updates of those other transactions are non-conflicting. This may lead to contention on row locks between workers scheduled in parallel. It is then possible that a transaction trx1 occurring earlier in the stream has to wait for a transaction trx2 occurring later in the stream. Due to replica-preserve-commit-order, trx2 will also wait for trx1 as soon as trx2 reaches commit. Therefore this leads to a deadlock. The server will detect the deadlock, rollback trx2, and retry trx2. This happens only when trx2 has executed up to the commit. So the entire transaction trx2 may execute more than once. While executing, it may block trx1. This can have a high CPU cost.
[14 Apr 2023 9:05] Max Irgiznov
Hello team.

Good comment from the developer.

I wanted to remind you once again that in the multi-threaded replication code for each thread with the name rpl_rca_wkr, have thread number is always zero, which makes it somewhat difficult to debug between threads. It would be great to fix it all together.
[16 Oct 2023 22:52] Jon Stephens
Fixed in all versions of the Manual 8.0+ in mysqldoc rev 76975.

The changes will appear eventually at

https://dev.mysql.com/doc/refman/8.0/en/replication-options-replica.html#sysvar_replica_pa...

Thanks!

j..