| Bug #99133 | Group Replication Performance Degradation with partial network outage | | |
|---|---|---|---|
| Submitted: | 31 Mar 2020 15:08 | Modified: | 7 Apr 2020 13:38 |
| Reporter: | Tibor Korocz | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: Group Replication | Severity: | S3 (Non-critical) |
| Version: | 8.0.19 | OS: | Any |
| Assigned to: | | CPU Architecture: | Any |
[7 Apr 2020 13:38]
MySQL Verification Team
Hi Tibor,

Thanks for the report. I can reproduce this, so I am verifying the report. I was weighing whether I would agree that this is a bug, and I tend to agree with you that it is. I am verifying it, but we will see what our GR team has to say about it. Again, thanks for reporting this and for providing an excellent test case.

Good health,
Bogdan
[30 Apr 2020 14:06]
Boris R
Any update on this? Does this also affect 8.0.20? This is a serious issue.
[11 Jun 2020 11:45]
MySQL Verification Team
Bug #99830 is marked as duplicate of this one.
[31 Aug 2022 17:45]
Matthew Boehm
Updates? This is still a HUGE issue in 8.0.28 when the network partitions.
[15 Sep 2022 16:29]
Kenny Gryp
Please try this scenario with group_replication_paxos_single_leader=ON (https://dev.mysql.com/doc/refman/8.0/en/group-replication-single-consensus-leader.html). With that setting, the cluster.status() output should no longer report inconsistent status depending on where you poll.
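For anyone trying this suggestion, a minimal sketch of enabling and verifying the setting, assuming MySQL 8.0.27 or later (where the variable and the performance_schema table below were introduced); the hostnames are the ones from this report and the root credentials are illustrative:

```bash
# Persist the setting on every member; it only takes effect after the
# whole group is restarted (e.g. via dba.rebootClusterFromCompleteOutage()).
for host in mysql1 mysql2 mysql3; do
  mysql -uroot -p -h"$host" -e \
    "SET PERSIST group_replication_paxos_single_leader = ON;"
done

# After the group restart, confirm the running communication protocol is
# actually capable of operating with a single consensus leader.
mysql -uroot -p -hmysql2 -e \
  "SELECT WRITE_CONSENSUS_SINGLE_LEADER_CAPABLE
     FROM performance_schema.replication_group_communication_information;"
```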
[30 Jan 2024 17:45]
Matthew Boehm
MySQL 8.0.35 - group_replication_paxos_single_leader=ON does not help. The above issue is still observed: node1 sees all nodes online; node2 only sees node1; node3 only sees node2. Quorum cannot be agreed upon. Nodes are not evicted, and transactions are not certified on the partially blocked node.
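One way to observe the asymmetric membership views described above is to poll performance_schema.replication_group_members on every node; a minimal sketch, reusing the hostnames from this report (credentials illustrative):

```bash
# Each member reports its own view of the group membership; under a
# partial partition these views can disagree, which is what the
# inconsistent cluster.status() output reflects.
for host in mysql1 mysql2 mysql3; do
  echo "== membership as seen by $host =="
  mysql -BN -uroot -p -h"$host" -e \
    "SELECT MEMBER_HOST, MEMBER_STATE
       FROM performance_schema.replication_group_members;"
done
```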
[30 Jan 2024 17:50]
Matthew Boehm
group_replication_unreachable_majority_timeout also does not help, because the current primary still sees a majority.
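For context, a minimal sketch of how that timeout is configured; as noted above, it only fires on a member that has lost contact with a majority, so it never triggers in this scenario:

```bash
# A member that cannot reach a majority of the group for this many
# seconds moves to ERROR state and follows
# group_replication_exit_state_action. It never fires on a member that
# still sees a majority through the unaffected node, which is the
# situation described above.
mysql -uroot -p -hmysql3 -e \
  "SET GLOBAL group_replication_unreachable_majority_timeout = 30;"
```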

Description:

Hi,

I have a three node InnoDB Cluster: mysql1, mysql2, mysql3. mysql2 is the primary; mysql1 and mysql3 are the readers.

If we simulate a partial network outage, for example with iptables, by running this on mysql3:

```bash
mysql3# iptables -A INPUT -s mysql2 -j DROP; iptables -A OUTPUT -d mysql2 -j DROP
```

mysql3 will still get all the changes made on mysql2, because mysql1 is going to act like a relay node and send all the changes to mysql3. You can confirm this even with tcpdump.

However, it has a huge performance impact. Before I cut the network I was able to insert 60-80 rows per second; after that, only 1-3 rows per second, which is a huge degradation.

Also, cluster.status() on mysql2 reports that mysql3 is not reachable, while mysql1 reports that all the nodes are Online, which is also interesting: in a cluster I would expect all the nodes to report the same cluster status, except if a node is totally isolated.

How to repeat:

Create a 3 node InnoDB Cluster.

Create a table on the primary:

```sql
CREATE TABLE `lab` (
  `id` int NOT NULL AUTO_INCREMENT,
  `hostname` varchar(20) DEFAULT NULL,
  `created_at` datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `idx_created` (`created_at`)
) ENGINE=InnoDB;
```

Insert some data in a loop on mysql2:

```bash
while true; do mysql -usbtest -pxxxxx -P3306 -h127.0.0.1 -e "INSERT INTO sysbench.lab (hostname) VALUES (@@hostname)"; done 2>/dev/null
```

On mysql2, also start another loop that shows roughly how many rows are inserted per second:

```bash
while true; do mysql -BN -usbtest -pxxxxx -P3306 -hmysql2 -e "select 'mysql2', count(*), now() from sysbench.lab where created_at BETWEEN now() - INTERVAL 1 second AND now()"; sleep 1; done 2>/dev/null
```

Cut the network between a reader and the primary:

```bash
mysql3# iptables -A INPUT -s mysql2 -j DROP; iptables -A OUTPUT -d mysql2 -j DROP
```

You will see the impact immediately:

```
mysql2  48  2020-03-31 12:27:15
mysql2  50  2020-03-31 12:27:16
mysql2  51  2020-03-31 12:27:17
mysql2  51  2020-03-31 12:27:18
mysql2  52  2020-03-31 12:27:19
mysql2  53  2020-03-31 12:27:20
mysql2  54  2020-03-31 12:27:21
mysql2  55  2020-03-31 12:27:22
mysql2  56  2020-03-31 12:27:23
mysql2  56  2020-03-31 12:27:24
mysql2  26  2020-03-31 12:27:25
mysql2   8  2020-03-31 12:27:26
mysql2   7  2020-03-31 12:27:27
mysql2   8  2020-03-31 12:27:28
mysql2   4  2020-03-31 12:27:29
mysql2   2  2020-03-31 12:27:30
mysql2   2  2020-03-31 12:27:31
mysql2   2  2020-03-31 12:27:32
mysql2   2  2020-03-31 12:27:33
```

Suggested fix:

I am not sure what is causing this degradation, but a partial network failure should not impact performance this badly.
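To confirm the relay behavior mentioned in the description, one can watch group communication traffic arriving on mysql3; a minimal sketch, assuming the interface name eth0 and the common default XCom port 33061 from group_replication_local_address (both are assumptions, check your own configuration):

```bash
# With traffic to/from mysql2 blocked, writes should still reach mysql3,
# but the group communication packets now come from mysql1 acting as a relay.
mysql3# tcpdump -i eth0 -nn host mysql1 and port 33061
```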