Description:
When we try to manually rejoin an expelled instance to the group replication cluster, we get this error:
SystemError: RuntimeError: Cluster.rejoin_instance: The instance 'db-1001:3306' does not belong to the ReplicaSet: 'default'
We can, however, rejoin the expelled member by:
STOP GROUP_REPLICATION;
START GROUP_REPLICATION;
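For anyone scripting this workaround, a minimal sketch follows. It assumes the two statements are run on the expelled member through a DB-API style cursor; the function name restart_group_replication and the cursor object are hypothetical and not part of MySQL Shell.

```python
def restart_group_replication(cursor):
    """Run the STOP/START workaround on the expelled member.

    `cursor` is assumed to be any DB-API style cursor connected to the
    expelled instance with sufficient privileges (hypothetical setup).
    Returns the statements that were sent, in order.
    """
    statements = ("STOP GROUP_REPLICATION", "START GROUP_REPLICATION")
    for statement in statements:
        cursor.execute(statement)
    return list(statements)
```

This only automates the manual steps above; it does not touch the cluster metadata used by MySQL Shell.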
How to repeat:
We have a group replication cluster of 5 nodes:
root@db-1001 [(none)]> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 238fea26-ff16-11e9-b2ea-d06726c86300 | db-2001 | 3306 | ONLINE | SECONDARY | 8.0.18 |
| group_replication_applier | 71cba15b-ff15-11e9-9047-8030e005f300 | db-1002 | 3306 | ONLINE | PRIMARY | 8.0.18 |
| group_replication_applier | 92fc2903-ff14-11e9-86a4-8030e006a5a0 | db-1001 | 3306 | ONLINE | SECONDARY | 8.0.18 |
| group_replication_applier | d7354d4f-ff16-11e9-9b25-8030e0060690 | db-6001 | 3306 | ONLINE | SECONDARY | 8.0.18 |
| group_replication_applier | edca9e98-ff16-11e9-afb4-d06726c87550 | db-2002 | 3306 | ONLINE | SECONDARY | 8.0.18 |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
5 rows in set (0.00 sec)
Now we block connections on db-1001 to simulate a network glitch. Shortly after the network comes back, the instance goes into the ERROR state, which is expected:
2019-11-14T13:20:29.478612Z 0 [ERROR] [MY-011505] [Repl] Plugin group_replication reported: 'Member was expelled from the group due to network failures, changing member status to ERROR.'
2019-11-14T13:20:29.478729Z 0 [ERROR] [MY-011712] [Repl] Plugin group_replication reported: 'The server was automatically set into read only mode after an error was detected.'
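To spot the expulsion in the error log without reading it by hand, something like the sketch below could be used. The regular expression is an assumption based only on the layout of the two log lines above, and find_expulsions is a hypothetical helper name.

```python
import re

# Pattern inferred from the two log lines above (an assumption, not an
# official description of the error log format).
LOG_LINE = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<thread>\d+)\s+\[(?P<level>\w+)\]\s+"
    r"\[(?P<code>MY-\d+)\]\s+\[(?P<subsystem>\w+)\]\s+(?P<message>.*)$"
)

# Error code carried by the "Member was expelled" line in this report.
EXPELLED_CODE = "MY-011505"


def find_expulsions(lines):
    """Return (timestamp, message) for each line carrying EXPELLED_CODE."""
    events = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("code") == EXPELLED_CODE:
            events.append((match.group("timestamp"), match.group("message")))
    return events


sample_log = [
    "2019-11-14T13:20:29.478612Z 0 [ERROR] [MY-011505] [Repl] Plugin "
    "group_replication reported: 'Member was expelled from the group due "
    "to network failures, changing member status to ERROR.'",
    "2019-11-14T13:20:29.478729Z 0 [ERROR] [MY-011712] [Repl] Plugin "
    "group_replication reported: 'The server was automatically set into "
    "read only mode after an error was detected.'",
]

for timestamp, message in find_expulsions(sample_log):
    print(timestamp, message)
```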
Since we have autoRejoin disabled, we need to manually rejoin the expelled member via MySQL Shell:
MySQL Shell 8.0.18
Copyright (c) 2016, 2019, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.
Type '\help' or '\?' for help; '\quit' to exit.
WARNING: Using a password on the command line interface can be insecure.
Creating a session to 'gr_admin@db-1002'
Your MySQL connection id is 879636 (X protocol)
Server version: 8.0.18 MySQL Community Server - GPL
No default schema selected; type \use <schema> to set one.
MySQL db-1002:33060+ ssl Py > cluster = dba.get_cluster()
MySQL db-1002:33060+ ssl Py > cluster.describe()
{
"clusterName": "gr__bk_eu__test__3",
"defaultReplicaSet": {
"name": "default",
"topology": [
{
"address": "db-1002:3306",
"label": "db-1002:3306",
"role": "HA"
},
{
"address": "db-2001:3306",
"label": "db-2001:3306",
"role": "HA"
},
{
"address": "db-2002:3306",
"label": "db-2002:3306",
"role": "HA"
},
{
"address": "db-6001:3306",
"label": "db-6001:3306",
"role": "HA"
}
],
"topologyMode": "Single-Primary"
}
}
MySQL db-1002:33060+ ssl Py > cluster.rejoin_instance("gr_admin@db-1001:3306")
Rejoining the instance to the InnoDB cluster. Depending on the original
problem that made the instance unavailable, the rejoin operation might not be
successful and further manual steps will be needed to fix the underlying
problem.
Please monitor the output of the rejoin operation and take necessary action if
the instance cannot rejoin.
Rejoining instance to the cluster ...
ERROR: Failed to erase the password: Unknown or unsupported command: erase
Please provide the password for 'gr_admin@db-1001:3306': *************************
Save password for 'gr_admin@db-1001:3306'? [Y]es/[N]o/Ne[v]er (default No):
Traceback (most recent call last):
File "<string>", line 1, in <module>
SystemError: RuntimeError: Cluster.rejoin_instance: The instance 'db-1001:3306' does not belong to the ReplicaSet: 'default'.
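One detail visible in the outputs above: replication_group_members listed five members, while cluster.describe() lists only four instances, and db-1001:3306 is the one absent from the topology. Whether this is the cause of the error is an assumption, but the comparison itself can be sketched as follows, using the values from this report. The helper name missing_from_metadata is hypothetical.

```python
def missing_from_metadata(group_members, describe_output):
    """Return group members (as 'host:port') absent from the describe() topology."""
    topology = describe_output["defaultReplicaSet"]["topology"]
    known = {entry["address"] for entry in topology}
    return sorted(
        f"{host}:{port}"
        for host, port in group_members
        if f"{host}:{port}" not in known
    )


# MEMBER_HOST / MEMBER_PORT values from replication_group_members above.
group_members = [
    ("db-2001", 3306),
    ("db-1002", 3306),
    ("db-1001", 3306),
    ("db-6001", 3306),
    ("db-2002", 3306),
]

# Relevant part of the cluster.describe() output above.
describe_output = {
    "clusterName": "gr__bk_eu__test__3",
    "defaultReplicaSet": {
        "name": "default",
        "topology": [
            {"address": "db-1002:3306", "label": "db-1002:3306", "role": "HA"},
            {"address": "db-2001:3306", "label": "db-2001:3306", "role": "HA"},
            {"address": "db-2002:3306", "label": "db-2002:3306", "role": "HA"},
            {"address": "db-6001:3306", "label": "db-6001:3306", "role": "HA"},
        ],
        "topologyMode": "Single-Primary",
    },
}

print(missing_from_metadata(group_members, describe_output))
# prints ['db-1001:3306']
```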
Suggested fix:
It should be possible to use Cluster.rejoinInstance() to rejoin the expelled instance without manually restarting group replication.