Bug #114072 | Group replication slows down existing asynchronous replication | ||
---|---|---|---|
Submitted: | 20 Feb 2024 21:08 | Modified: | 8 Mar 2024 16:19 |
Reporter: | Evgeny Gelfand | Email Updates: | |
Status: | Unsupported | Impact on me: | |
Category: | MySQL Server: Group Replication | Severity: | S3 (Non-critical) |
Version: | 8.0.35-27 | OS: | Linux |
Assigned to: | MySQL Verification Team | CPU Architecture: | Any |
Tags: | group replication, performance, replication |
[20 Feb 2024 21:08]
Evgeny Gelfand
[20 Feb 2024 21:20]
Evgeny Gelfand
The lag starts to accumulate once network latency increases. However, the latency does not affect the replication when no secondary members are up. So, to reproduce, you may need to add some latency to the network between the group replication members; an RTT of perhaps 32 ms should be enough.
[21 Feb 2024 16:11]
Evgeny Gelfand
I've uploaded the output of ps_trace_thread for the replication worker thread. It was run as follows: call ps_trace_thread(<thread>,'<output file>',NULL,NULL,TRUE,TRUE,TRUE); grep COND_count_down_latch 23225.dot produces 1359 lines, with an average wait of 38.30 ms, i.e. over the trace the thread spent about 52 seconds waiting on COND_count_down_latch.
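For reference, a minimal sketch of how such a trace can be produced with the sys schema (thread id 23225 and the output path are placeholders; the thread instrument names are an assumption, 8.0.26+ uses replica_worker, older releases slave_worker):

```sql
-- Find the thread id of the replication applier/worker thread to trace.
SELECT THREAD_ID, NAME, PROCESSLIST_STATE
FROM performance_schema.threads
WHERE NAME LIKE 'thread/sql/replica_worker%'
   OR NAME LIKE 'thread/sql/slave_worker%';

-- Trace that thread and write a Graphviz .dot file; 23225 and the output
-- path are placeholders. NULL,NULL = default runtime and polling interval;
-- TRUE,TRUE,TRUE = start fresh, auto-enable instrumentation, include debug info.
CALL sys.ps_trace_thread(23225, '/tmp/23225.dot', NULL, NULL, TRUE, TRUE, TRUE);
```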
[21 Feb 2024 16:45]
Evgeny Gelfand
Actually I was wrong. The replication worker thread spent 58.8257 seconds on COND_count_down_latch. Thank you very much for your time and efforts.
[27 Feb 2024 17:05]
Evgeny Gelfand
Hello, I've set up the following configuration: GALERA 5.7 --async--> MySQL 8.0 (anonymous GTID) --async--> MySQL 8.0 primary of a Group Replication cluster with 4 nodes (including two servers located far away, FS1 and FS2). FS1 has a round trip time (RTT) of 38.2 ms, while FS2 has an RTT of 62.6 ms. I measured the RTT using mtr on the server that acts as primary for GR and runs the replica from MySQL 8.0 (anonymous GTID).

The lag immediately occurs in the MySQL 8.0 (anonymous GTID) --async--> MySQL 8.0 Primary Group Replication part, but it doesn't seem to depend on the RTT: the lag accumulates when FS1 is added, but not FS2. Although the lag is more likely to occur when running multi-master, this isn't always the case. There seems to be another reason for the lag, possibly related to the network, but not just the latency.

Additionally, updating slave_worker_info with the master log information may be somehow related. Does MySQL need a short metadata lock before updating the table (as is the case in Sybase, for example), causing a bottleneck? Or is the table so hot that the memory page containing the table's data becomes a bottleneck? (This table isn't replicated as part of Group Replication; it's local to each host.)

Here's an example of the statistics for the 8.0 -> async -> Group Replication channel from mysqld.log (of the MySQL 8.0 anonymous GTID server):

Multi-threaded replica statistics for channel '57_to_8':
Seconds elapsed = 120
Events assigned = 3512321
Worker queues filled over overrun level = 0
Waited due to a Worker queue full = 0
Waited due to the total size = 0
Waited at clock conflicts = 635826316600
waited (count) when Workers occupied = 0
waited when Workers occupied = 0

I'm running with 10 replica parallel workers. Thanks.
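A sketch of one way to check the metadata-lock hypothesis above from performance_schema (table names as in 8.0; assumes the wait/lock/metadata/sql/mdl instrument is enabled, which it is by default in 8.0):

```sql
-- Enable the MDL instrument if it is not already on (on by default in 8.0).
UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES', TIMED = 'YES'
WHERE NAME = 'wait/lock/metadata/sql/mdl';

-- While the applier is running, look for metadata locks on the
-- worker/relay-log repository tables.
SELECT OBJECT_TYPE, OBJECT_SCHEMA, OBJECT_NAME,
       LOCK_TYPE, LOCK_STATUS, OWNER_THREAD_ID
FROM performance_schema.metadata_locks
WHERE OBJECT_SCHEMA = 'mysql'
  AND OBJECT_NAME IN ('slave_worker_info', 'slave_relay_log_info');
```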
[29 Feb 2024 17:34]
MySQL Verification Team
Hi, I spent a few days on this but I am unable to reproduce it. I am testing 8.0.36 only, as 5.7 had its EOL last year. I used a Server -N1-> (GR -N2-> GR+GR) configuration and was inserting delays, garbage and noise into both the N1 and N2 networks, and I was not able to reproduce anything close to what you are describing. I see you are testing GALERA (not our product); are those 8.0 servers in GR our binaries, or are you using some forks? Can you reproduce the problem using MySQL 8.0.36 binaries made by Oracle? Thanks
[29 Feb 2024 17:41]
MySQL Verification Team
If you manage to reproduce this with 8.0.36 I would appreciate:
- the full config of both the external master and the GR setup
- how exactly you introduced the delay and by how much
- what type of traffic you tested it with (I assume some sysbench is used; this is what I tested with)
Thanks
[29 Feb 2024 23:07]
Evgeny Gelfand
I found something that may support the latency theory. I've set binlog_transaction_dependency_tracking to WRITESET on 8.0.35 (the channel that assigns anonymous GTIDs from 5.7) and increased replica_parallel_workers to 256. The outcome is as follows:

1. Once all "far away" servers stop group_replication, the lag disappears immediately (whether all members are primary, or one is primary and the rest are secondary). I see over 100 replica threads actively working in parallel. The waits are mostly "Waiting for preceding transaction to commit".

2. Once I start the "far away" servers, the lag starts to accumulate again, but slowly. I see far fewer replica threads actively working in parallel. The waits are mostly "waiting for handler commit". The applier threads are idle almost all the time.

Here is an example of the statistics on one of the GR nodes:

Multi-threaded replica statistics for channel 'group_replication_applier':
seconds elapsed = 121; events assigned = 67585; worker queues filled over overrun level = 0; waited due a Worker queue full = 0; waited due the total size = 0; waited at clock conflicts = 160508000; waited (count) when Workers occupied = 96; waited when Workers occupied = 6357000

And here is an example of the statistics on the "main" GR node.

Before the "far away" server joins:

Multi-threaded replica statistics for channel 's57':
seconds elapsed = 121; events assigned = 3826689; worker queues filled over overrun level = 0; waited due a Worker queue full = 0; waited due the total size = 0; waited at clock conflicts = 3023924959100; waited (count) when Workers occupied = 628; waited when Workers occupied = 2051035200

After:

Multi-threaded replica statistics for channel 's57':
seconds elapsed = 120; events assigned = 5338113; worker queues filled over overrun level = 0; waited due a Worker queue full = 0; waited due the total size = 0; waited at clock conflicts = 3419144531800; waited (count) when Workers occupied = 793; waited when Workers occupied = 2388235100

You can see that the statistics are almost identical. What would prevent the replica workers from working in parallel? Latency? Then I would expect to see more "busy" threads, not more "idle" ones. Thanks
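A minimal sketch of the settings used above and of one way to count how many applier workers are actually busy (the channel name 's57' is taken from the statistics above; thread instrument names assume 8.0.26+ naming):

```sql
-- Settings used for the test above; replica_parallel_workers only takes
-- effect once the applier is restarted.
SET PERSIST binlog_transaction_dependency_tracking = 'WRITESET';
STOP REPLICA SQL_THREAD FOR CHANNEL 's57';
SET PERSIST replica_parallel_workers = 256;
START REPLICA SQL_THREAD FOR CHANNEL 's57';

-- Count applier worker threads per state to see how many run in parallel.
SELECT PROCESSLIST_STATE, COUNT(*) AS workers
FROM performance_schema.threads
WHERE NAME LIKE 'thread/sql/replica_worker%'
GROUP BY PROCESSLIST_STATE;
```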
[4 Mar 2024 11:07]
MySQL Verification Team
These locks are there mostly for DDL; for DML they are shared locks, so there should be no bottleneck. I cannot reproduce this behavior on my system, but some changes in 8.0.37 should make this issue go away for you if you are able to test it. Please test and get back with the results.
[5 Mar 2024 22:14]
Evgeny Gelfand
Hello, I will try to arrange to use MySQL version 8.0.37. Meanwhile, I've checked the process on version 8.0.36-28. The waits have changed. According to the trace, one thread executed 792 transactions (GTID: XXXX) (over 10 trx per second, good). The most significant waits:

sql/Waiting for an event from Coordinator: 7.2795 sec
sql/Worker_info::jobs_cond-wait in rpl_rli_pdb.cc:2367: 7.5108 sec
sql/waiting for handler commit: 40.3549 sec
group_rpl/COND_count_down_latch-wait in plugin_utils.h:438: 45.6882 sec

I believe that the wait for "COND_count_down_latch" includes the wait for "waiting for handler commit". However, as we know, the wait is not bounded, but the work is. So maybe not. I am not sure how to improve the situation above. The above waits are after making the following changes:

sync_source_info = 0
sync_relay_log = 0
sync_relay_log_info = 0
replica_parallel_workers = 80

And it seems that the growth of the lag has slowed down significantly. I would be thankful if you could direct me on how to improve/tune it, as well as how I can monitor it more efficiently. Best Regards Evgeni
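A sketch of one way to keep an eye on these waits without a full trace, assuming the relevant wait/stage instruments and the summary consumers are enabled (most stage instruments are disabled by default):

```sql
-- Top group replication synchronization waits, e.g. the count-down latch.
SELECT EVENT_NAME, COUNT_STAR,
       SUM_TIMER_WAIT/1e12 AS total_wait_sec
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE EVENT_NAME LIKE 'wait/synch/cond/group_rpl/%'
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;

-- Time spent in the 'waiting for handler commit' stage.
SELECT EVENT_NAME, COUNT_STAR,
       SUM_TIMER_WAIT/1e12 AS total_stage_sec
FROM performance_schema.events_stages_summary_global_by_event_name
WHERE EVENT_NAME LIKE 'stage/sql/waiting for handler commit%'
ORDER BY SUM_TIMER_WAIT DESC;
```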
[6 Mar 2024 13:08]
MySQL Verification Team
Hi, We are not responsible for the changes MySQL forks make, so since we cannot reproduce this we consider it unsupported. If you can reproduce it with the Oracle 8.0.37 binaries, please let us know. As for the tuning and monitoring, your best option is to contact our support team, as that is not something that belongs in the bugs system. Thank you for using MySQL Server
[6 Mar 2024 15:46]
Evgeny Gelfand
Understood, thanks. Can you at least explain what "COND_count_down_latch-wait" waits for? I can't find any documentation about it. Best Regards Evgeni
[8 Mar 2024 16:19]
Evgeny Gelfand
Please tell me, when you tried to reproduce this, what were the replica statistics? In my case, according to the statistics, it is about 4k tps. That may make all the difference. Thanks
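For reference, a minimal sketch of one way to estimate the applied transaction rate on a GR member from performance_schema (the 10-second window is arbitrary):

```sql
-- Sample the Group Replication applied-transaction counter for the local
-- member twice, 10 seconds apart, and report applied transactions per second.
SELECT COUNT_TRANSACTIONS_REMOTE_APPLIED INTO @t0
FROM performance_schema.replication_group_member_stats
WHERE MEMBER_ID = @@server_uuid;

DO SLEEP(10);

SELECT (COUNT_TRANSACTIONS_REMOTE_APPLIED - @t0) / 10 AS applied_tps
FROM performance_schema.replication_group_member_stats
WHERE MEMBER_ID = @@server_uuid;
```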