Bug #90485 Ignore group_replication_group_seeds nodes if they are not primary/active
Submitted: 18 Apr 2018 2:35
Modified: 22 May 2018 18:15
Reporter: Kenny Gryp
Status: Verified
Category: MySQL Server: Group Replication
Severity: S3 (Non-critical)
Version: 8.0.4
OS: Any
Assigned to:
CPU Architecture: Any

[18 Apr 2018 2:35] Kenny Gryp
Description:
During certain network partitions, the contents of group_replication_group_seeds become very important, and the setting might need to be adjusted to match the specific partition.

How to repeat:
1. Set up a 3-node cluster
2. Network-partition node1 from node2 and node3, and make sure node1 has been removed from the group as seen by node2 and node3
3. Stop group replication on node3
4. Resume connectivity between node1 and node3, but not between node1 and node2!
5. Ensure you have the following state:

node1:

mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 08c8108a-41b6-11e8-a423-08002789cd2e | node1       |        3306 | ONLINE       | PRIMARY     | 8.0.4          |
| group_replication_applier | 0bbd72b3-41b6-11e8-b262-08002789cd2e | node2       |        3306 | UNREACHABLE  | PRIMARY     | 8.0.4          |
| group_replication_applier | 0e998065-41b6-11e8-ae83-08002789cd2e | node3       |        3306 | UNREACHABLE  | PRIMARY     | 8.0.4          |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+

node2:

mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 0bbd72b3-41b6-11e8-b262-08002789cd2e | node2       |        3306 | ONLINE       | PRIMARY     | 8.0.4          |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
1 row in set (0.00 sec)

node3:

mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 0e998065-41b6-11e8-ae83-08002789cd2e | node3       |        3306 | OFFLINE      |             |                |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
1 row in set (0.00 sec)

mysql> show global variables like 'group_replication_group_seeds';
+-----------------------------------------------------+---------------------------------------+
| Variable_name                                       | Value                                 |
+-----------------------------------------------------+---------------------------------------+
| group_replication_group_seeds                       | 192.168.58.2:33061,192.168.58.3:33061 |
+-----------------------------------------------------+---------------------------------------+

6. Start group replication on node3

The statement will hang and eventually error out:
ERROR 3092 (HY000): The server is not configured properly to be an active member of the group. Please see more details on error log.

No node comes online: node3 fails to join, and node2 remains alone!

Suggested fix:

Making this work today requires a good understanding of how the network is partitioned, and group_replication_group_seeds has to be adjusted by hand to work around the partition. Under pressure, and with many different possible network partitioning scenarios, it becomes very difficult to get the cluster back up and running.
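As a rough illustration of the manual adjustment described above, the following could be run on node3 so that it only contacts node2. This is a sketch under assumptions: 'node2:33061' is a hypothetical endpoint (the report does not say which seed address belongs to which node), so substitute whatever node2 has configured in its group_replication_local_address.

```sql
-- Run on node3, which is currently OFFLINE.
-- ASSUMPTION: 'node2:33061' is a placeholder for node2's group
-- communication endpoint; it is not taken from the report.
SET GLOBAL group_replication_group_seeds = 'node2:33061';

-- Retry joining; node3 now contacts node2 only, not node1.
START GROUP_REPLICATION;
```

Once the partition is healed, the seed list would presumably need to be set back to the full list of members.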

When node3 connects to node1 first, it should figure out that node1 is not 'primary/active', disconnect, and then connect to the next seed (in this case node2) to find out whether node2 is 'primary/active'.

In that case, node2 and node3 would form a cluster again.
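
Until something like this is done automatically, a rough manual check of whether a given node is in a 'primary/active' partition can be run on that node. This is only a sketch: it treats ONLINE and RECOVERING members as reachable and compares that count against the node's own membership view, which is an assumption about what 'primary/active' means here and not how the plugin itself decides.

```sql
-- Run on the candidate seed (for example node1 or node2).
-- has_majority is 1 when this node sees more than half of the
-- members in its own view as ONLINE or RECOVERING, else 0.
SELECT
  SUM(MEMBER_STATE IN ('ONLINE', 'RECOVERING')) AS reachable_members,
  COUNT(*) AS total_members,
  SUM(MEMBER_STATE IN ('ONLINE', 'RECOVERING')) > COUNT(*) / 2 AS has_majority
FROM performance_schema.replication_group_members;
```

With the member tables shown in this report, node1 sees 1 reachable member out of 3, so has_majority is 0, while node2 sees 1 out of 1, so has_majority is 1.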

[22 May 2018 18:15] MySQL Verification Team
Hi Kenny,

Thanks for the report, I can verify the behavior. As for the solution, I'll let the Group Replication team check it out :)

kind regards
Bogdan