Bug #104646 InnoDB group replication stalled on large transaction and high network bandwidth
Submitted: 17 Aug 2021 15:24 Modified: 30 Sep 2021 8:40
Reporter: Armin P Email Updates:
Status: No Feedback Impact on me: None
Category: MySQL Server: Group Replication Severity: S2 (Serious)
Version: 8.0.26 OS: Debian (10.10)
Assigned to: MySQL Verification Team CPU Architecture:x86

[17 Aug 2021 15:24] Armin P
Description:
When running a large transaction (a single INSERT > 250 MB) on a three-node InnoDB group replication cluster (virtual machines on KVM), replication to the secondary nodes stalls once the network bandwidth limit (here 1 Gbit/s) is hit.

The large transaction waits indefinitely for completion, while the primary and secondary nodes show consistently high network bandwidth usage.
It seems the secondary nodes are trying to replicate the problematic transaction over and over again.
Subsequent transactions wait for the pending large transaction to complete.
The large transaction and the high bandwidth usage can only be cancelled by stopping the primary node.

While the replication problem persists, the group replication plugin sometimes reports connection problems between the nodes in /var/log/mysql/error.log:

2021-08-17T10:04:07.271338Z 0 [Warning] [MY-011493] [Repl] Plugin group_replication reported: 'Member with address server-1:3306 has become unreachable.'
2021-08-17T10:04:09.627989Z 0 [Warning] [MY-011494] [Repl] Plugin group_replication reported: 'Member with address server-1:3306 is reachable again.'
2021-08-17T10:04:10.118516Z 0 [Warning] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Shutting down an outgoing connection. This happens because something might be wrong on a bi-directional connection to node server-1:33061. Please check the connection status to this member'
2021-08-17T10:04:10.119153Z 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Failure reading from fd=-1 n=18446744073709551615'
2021-08-17T10:04:47.375698Z 0 [Warning] [MY-011493] [Repl] Plugin group_replication reported: 'Member with address server-1:3306 has become unreachable.'
2021-08-17T10:04:48.257415Z 0 [Warning] [MY-011494] [Repl] Plugin group_replication reported: 'Member with address server-1:3306 is reachable again.'
2021-08-17T10:04:55.623104Z 0 [Warning] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Shutting down an outgoing connection. This happens because something might be wrong on a bi-directional connection to node server-1:33061. Please check the connection status to this member'

All nodes remain in the cluster, as they are only unreachable for a short time (based on the state in performance_schema.replication_group_members).
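For reference, the member state mentioned above can be checked on any node like this (same credentials as the examples below; the table and columns are standard performance_schema):

```shell
# Query group membership state; during the stall all members still show
# as present despite the transient 'unreachable' warnings in the log.
mysql -uroot -proot -e \
  'select member_host, member_port, member_state
     from performance_schema.replication_group_members'
```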

The following adjustments did not solve the problem:
* Increasing group_replication_set_write_concurrency to maximum
* Increasing group_replication_poll_spin_loops to maximum
* Disabling compression by group_replication_compression_threshold=0
* Reducing group_replication_communication_max_message_size to 1 MB
* Increasing group_replication_member_expel_timeout to 300 seconds
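For clarity, the variable-based adjustments above correspond to a my.cnf fragment like the following (values copied from the list above, not recommendations). Note that write concurrency is not a system variable but is set at runtime via the group_replication_set_write_concurrency() UDF:

```ini
[mysqld]
# Disable compression of replication messages
group_replication_compression_threshold          = 0
# Fragment large replication messages at 1 MB
group_replication_communication_max_message_size = 1048576
# Wait 300 seconds before expelling a suspected member
group_replication_member_expel_timeout           = 300
```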

Connecting the mysql nodes with a 10 Gbit/s connection, or running the virtual machines with mysql on the same KVM host, resolved the replication problem.
However, adding software-based network bandwidth limits for the mysql nodes leads to the same problem, so I expect the problem could also occur on a 10 Gbit/s network if bandwidth bottlenecks exist there.

Based on https://dev.mysql.com/doc/refman/8.0/en/group-replication-limitations.html: is there a requirement that a complete transaction has to be replicated within the noted 5-second window, or does this only apply to single message fragments?
"If an individual transaction results in message contents which are large enough that the message cannot be copied between group members over the network within a 5-second window, members can be suspected of having failed, and then expelled, just because they are busy processing the transaction."

Is there an option to ensure stable replication in case of bandwidth limitation issues?

How to repeat:
Set up a new InnoDB group replication cluster with 2 or 3 nodes.

Create a database and table for testing:

$ mysql -uroot -proot -e 'create database grtest'
$ mysql -uroot -proot grtest -e 'create table test (id bigint auto_increment, data longblob, primary key (id))'

Generate a small transaction and verify inserted data:

$ mysql -uroot -proot grtest < <(echo "insert into test (data) values ('$(dd if=/dev/urandom iflag=fullblock bs=1M count=1 | base64 | tr -d '\n')')")
$ mysql -B -uroot -proot grtest -e 'select id, length(data) from test'
id      length(data)
1       1398104
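As a sanity check on the reported length: 1398104 is exactly the base64 4-for-3 expansion of 1 MiB of raw data (the newlines are already stripped by tr in the command above):

```shell
# base64 encodes each 3-byte group as 4 ASCII characters.
raw_bytes=1048576                              # 1 MiB from dd bs=1M count=1
encoded_bytes=$(( (raw_bytes + 2) / 3 * 4 ))   # ceil(raw/3) * 4
echo "$encoded_bytes"                          # matches length(data) above
```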

Connect the mysql nodes with a 1 Gbit/s connection, or alternatively add software-based bandwidth limitations, for example:
* using OS tools on mysql nodes: tc qdisc add dev $nic root tbf rate 100mbit limit 10mb burst 20kb
* using virsh on a KVM host: virsh domiftune $domain $domain_interface --live --outbound 10000,10000,10000
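To undo these limits later (the last step below requires removing them), the matching teardown commands would be roughly the following; the `--outbound 0` form to clear the libvirt cap is my assumption based on libvirt's convention that a value of 0 disables QoS:

```shell
# Remove the tbf shaper added with 'tc qdisc add' above
tc qdisc del dev "$nic" root
# Clear the outbound cap set with domiftune above
virsh domiftune "$domain" "$domain_interface" --live --outbound 0
```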

Create a large transaction of ~400 MB (300 MB of random data expands to ~400 MB after base64 encoding):

$ echo "insert into test (data) values ('$(dd if=/dev/urandom iflag=fullblock bs=1M count=300 | base64 | tr -d '\n')');" > sql
$ mysql -uroot -proot grtest < sql

The import will never complete unless the bandwidth limitations/issues are removed.

Remove the network bandwidth limitation and wait for the pending transaction to complete, or insert again:

$ echo "insert into test (data) values ('$(dd if=/dev/urandom iflag=fullblock bs=1M count=300 | base64 | tr -d '\n')');" > sql
$ mysql -uroot -proot grtest < sql

$ mysql -B -uroot -proot grtest -e 'select id, length(data) from test'
id      length(data)
1       1398104
2       419430400
[24 Aug 2021 4:49] MySQL Verification Team
Hi Armin,

I'm not able to reproduce this, but I'll discuss it today with the Group Replication team to see if we need more data from you and whether there's something I'm not seeing here.
[31 Aug 2021 8:40] MySQL Verification Team
Hi Armin,

Can you reproduce this without compression?

Specifically, with binlog_transaction_compression disabled.

Thanks
[1 Oct 2021 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".