Description:
A new MGR (MySQL Group Replication) cluster was set up using MySQL Shell on MySQL 8.0.43, consisting of three nodes: mysql1, mysql2, and mysql3.
With mysql1 as the primary, executing `iptables -A OUTPUT -d mysql2 -j DROP` on the mysql1 server to block its outgoing traffic to mysql2 results in mysql1 being expelled from the group, leaving a new group composed of mysql2 and mysql3.
However, executing `iptables -A OUTPUT -d mysql3 -j DROP` on the mysql1 server to block its outgoing traffic to mysql3 produces the following anomalies:
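For reference, the rule can be applied before a round and removed afterwards with standard iptables commands (run as root on the primary; hostnames as above):

# Block outgoing traffic from this host to mysql2, simulating a one-way network failure
iptables -A OUTPUT -d mysql2 -j DROP
# Remove the rule again to restore connectivity after the round
iptables -D OUTPUT -d mysql2 -j DROP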
From mysql1's perspective, its own status and mysql3's status are ONLINE, while mysql2's status is UNREACHABLE.
From mysql2's perspective, its own status and mysql3's status are ONLINE, while mysql1's status is UNREACHABLE.
From mysql3's perspective, all nodes are ONLINE.
Additionally, the error log of mysql1 continuously reports the following error:
[Repl] Plugin group_replication reported: 'Failed to establish MySQL client connection in Group Replication. Error establishing connection. Please refer to the manual to make sure that you configured Group Replication properly to work with MySQL Protocol connections.'
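This message refers to MySQL Protocol connections, which are used when the group runs on the MYSQL communication stack. On 8.0.27 and later, the stack in use can be checked as follows (a hedged check; the variable does not exist in the older versions tested, such as 8.0.23):

# Show which communication stack the group uses (variable available from 8.0.27)
mysql -h mysql1 -e "SHOW VARIABLES LIKE 'group_replication_communication_stack'"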
If the anomaly does not appear in a given run, mysql2 or mysql3 can be promoted to primary and the test repeated; with three possible primaries and two peers to block, there are at most 6 rounds, the only variable being which host is targeted by `iptables -A OUTPUT -d xx -j DROP` on the current primary. The phenomenon is therefore not tied to mysql1 being the primary, as the sketch below illustrates.
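A sketch of how the six rounds can be enumerated, assuming passwordless ssh to each host and that the group primary is switched to $primary before each round:

# Enumerate all six primary/target combinations (3 primaries x 2 blockable peers)
for primary in mysql1 mysql2 mysql3; do
  for target in mysql1 mysql2 mysql3; do
    [ "$primary" = "$target" ] && continue
    echo "round: primary=$primary, blocking traffic to $target"
    ssh "$primary" "iptables -A OUTPUT -d $target -j DROP"
    sleep 60   # observe member states and error logs during this window
    ssh "$primary" "iptables -D OUTPUT -d $target -j DROP"
  done
done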
In total, there are three possible outcomes:
1. The group remains in the UNREACHABLE state for an extended period without any member being expelled; replication between the nodes continues to work normally during this time.
2. The primary is expelled, triggering a primary failover.
3. A secondary is expelled, without triggering a failover.
When a member is expelled, the error log first shows 'Error pushing message into group communication engine.', followed by a message indicating that the member is set to ERROR due to network issues:
[Repl] Plugin group_replication reported: 'Member was expelled from the group due to network failures, changing member status to ERROR.'
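How long an UNREACHABLE member may stay in the group before being expelled is governed by group_replication_member_expel_timeout (and, for a member that has lost the majority, group_replication_unreachable_majority_timeout); checking their values on the test cluster may help narrow this down:

# Inspect the timeouts that govern expulsion of unreachable members
mysql -h mysql1 -e "SHOW VARIABLES LIKE 'group_replication_member_expel_timeout'"
mysql -h mysql1 -e "SHOW VARIABLES LIKE 'group_replication_unreachable_majority_timeout'"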
The versions tested include 8.0.23, 8.0.28, 8.0.32, and 8.0.43.
How to repeat:
An MGR (MySQL Group Replication) cluster is already running; it was set up using MySQL Shell with default parameters.
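For context, a minimal sketch of such a setup with MySQL Shell defaults (the account, cluster name, and recovery method below are illustrative, not taken from this report):

# Each command connects with mysqlsh and runs one JS statement
mysqlsh root@mysql1 -e "dba.createCluster('testCluster')"
mysqlsh root@mysql1 -e "dba.getCluster().addInstance('root@mysql2', {recoveryMethod: 'clone'})"
mysqlsh root@mysql1 -e "dba.getCluster().addInstance('root@mysql3', {recoveryMethod: 'clone'})"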
mysql> select MEMBER_ID,MEMBER_HOST,MEMBER_STATE,MEMBER_ROLE from performance_schema.replication_group_members;
+--------------------------------------+-------------+--------------+-------------+
| MEMBER_ID                            | MEMBER_HOST | MEMBER_STATE | MEMBER_ROLE |
+--------------------------------------+-------------+--------------+-------------+
| 409821a9-6789-11f0-a939-0050568ba84a | mysql2      | ONLINE       | SECONDARY   |
| 60515c68-6789-11f0-93ca-0050568b194e | mysql1      | ONLINE       | PRIMARY     |
| d89ade68-678b-11f0-94f9-000c29d5f729 | mysql3      | ONLINE       | SECONDARY   |
+--------------------------------------+-------------+--------------+-------------+
3 rows in set (0.00 sec)
Execute `iptables -A OUTPUT -d mysql3 -j DROP` on the primary node's server.
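To confirm the rule is active before checking the member states (standard iptables options):

# -C returns success if the rule exists; -L lists the OUTPUT chain
iptables -C OUTPUT -d mysql3 -j DROP && echo "DROP rule present"
iptables -L OUTPUT -n --line-numbers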
mysql1:
mysql> select MEMBER_ID,MEMBER_HOST,MEMBER_STATE,MEMBER_ROLE from performance_schema.replication_group_members;
+--------------------------------------+-------------+--------------+-------------+
| MEMBER_ID                            | MEMBER_HOST | MEMBER_STATE | MEMBER_ROLE |
+--------------------------------------+-------------+--------------+-------------+
| 409821a9-6789-11f0-a939-0050568ba84a | mysql2      | ONLINE       | SECONDARY   |
| 60515c68-6789-11f0-93ca-0050568b194e | mysql1      | ONLINE       | PRIMARY     |
| d89ade68-678b-11f0-94f9-000c29d5f729 | mysql3      | UNREACHABLE  | SECONDARY   |
+--------------------------------------+-------------+--------------+-------------+
3 rows in set (0.00 sec)
mysql2:
mysql> select MEMBER_ID,MEMBER_HOST,MEMBER_STATE,MEMBER_ROLE from performance_schema.replication_group_members;
+--------------------------------------+-------------+--------------+-------------+
| MEMBER_ID                            | MEMBER_HOST | MEMBER_STATE | MEMBER_ROLE |
+--------------------------------------+-------------+--------------+-------------+
| 409821a9-6789-11f0-a939-0050568ba84a | mysql2      | ONLINE       | SECONDARY   |
| 60515c68-6789-11f0-93ca-0050568b194e | mysql1      | ONLINE       | PRIMARY     |
| d89ade68-678b-11f0-94f9-000c29d5f729 | mysql3      | ONLINE       | SECONDARY   |
+--------------------------------------+-------------+--------------+-------------+
3 rows in set (0.00 sec)
mysql3:
mysql> select MEMBER_ID,MEMBER_HOST,MEMBER_STATE,MEMBER_ROLE from performance_schema.replication_group_members;
+--------------------------------------+-------------+--------------+-------------+
| MEMBER_ID                            | MEMBER_HOST | MEMBER_STATE | MEMBER_ROLE |
+--------------------------------------+-------------+--------------+-------------+
| 409821a9-6789-11f0-a939-0050568ba84a | mysql2      | ONLINE       | SECONDARY   |
| 60515c68-6789-11f0-93ca-0050568b194e | mysql1      | UNREACHABLE  | PRIMARY     |
| d89ade68-678b-11f0-94f9-000c29d5f729 | mysql3      | ONLINE       | SECONDARY   |
+--------------------------------------+-------------+--------------+-------------+
3 rows in set (0.00 sec)
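The three divergent views above can be collected in one pass with a loop such as the following (assuming the mysql client on the test machine can reach all three hosts):

# Compare each member's view of the group; credentials/host access are assumptions
for host in mysql1 mysql2 mysql3; do
  echo "=== view from $host ==="
  mysql -h "$host" -e "select MEMBER_HOST,MEMBER_STATE,MEMBER_ROLE from performance_schema.replication_group_members"
done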
The other two outcomes can be reproduced reliably by varying which secondary is isolated under which primary.
Suggested fix:
In this test, a network failure from the primary to a single secondary is simulated with iptables. For example, with nodes A, B, and C: if A blocks its outgoing traffic to C, two candidate majorities (AB and BC) exist in theory. Under normal circumstances, the system should determine whether A or C is at fault and converge on a valid majority such as AB or BC; that is the expected behavior.
However, something appears to prevent the system from deciding whether A or C is faulty, and in that case no member is expelled from the group.
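Until the cause is identified, a possible manual workaround (an assumption on my part, not verified in this report) is to make the stuck member leave the group explicitly and rejoin it once the network is restored:

# On the isolated node (A in the example above): leave the group manually,
# since the automatic expulsion never happens
mysql -e "STOP GROUP_REPLICATION"
# After removing the iptables DROP rule, rejoin the group:
mysql -e "START GROUP_REPLICATION"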