Description:
Given the following state:
------------------------
(root@localhost) [(none)]>SHOW VARIABLES LIKE 'group_replication_single_primary_mode';
+---------------------------------------+-------+
| Variable_name | Value |
+---------------------------------------+-------+
| group_replication_single_primary_mode | ON |
+---------------------------------------+-------+
(root@localhost) [(none)]>select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 9ee52ca9-81c3-11f0-95a1-08002757f75b | gr2n | 3306 | ONLINE | PRIMARY | 8.4.6 | XCom |
| group_replication_applier | a11b7ea6-81be-11f0-a64c-08002757f75b | gr3n | 3306 | ONLINE | PRIMARY | 8.4.6 | XCom |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
2 rows in set (0.00 sec)
Of course, trying to promote one of the two nodes as the single primary does not work:
(root@localhost) [(none)]>SELECT group_replication_set_as_primary('a11b7ea6-81be-11f0-a64c-08002757f75b');
+--------------------------------------------------------------------------+
| group_replication_set_as_primary('a11b7ea6-81be-11f0-a64c-08002757f75b') |
+--------------------------------------------------------------------------+
| The requested member is already the current group primary. |
+--------------------------------------------------------------------------+
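For the record, a quick consistency check shows the contradiction: with group_replication_single_primary_mode = ON there should never be more than one PRIMARY, yet in the state above this query (run on any member) returns two rows:
```
SELECT MEMBER_HOST, MEMBER_ROLE
FROM performance_schema.replication_group_members
WHERE MEMBER_ROLE = 'PRIMARY';
```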
How did I end up in this situation?
1) Start a new cluster, say on node gr3n (a bootstrap sketch follows the output):
(root@localhost) [(none)]>select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | a11b7ea6-81be-11f0-a64c-08002757f75b | gr3n | 3306 | ONLINE | PRIMARY | 8.4.6 | XCom |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
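For reference, gr3n was started along the lines of the usual bootstrap sequence (sketch; it assumes group_replication_group_name, group_replication_local_address and the rest of the GR settings are already in the configuration):
```
-- bootstrap a brand new group on this node only
SET GLOBAL group_replication_bootstrap_group = ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group = OFF;
```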
2) Then add a second node, gr2n (a join sketch follows the output):
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 9ee52ca9-81c3-11f0-95a1-08002757f75b | gr2n | 3306 | ONLINE | SECONDARY | 8.4.6 | XCom |
| group_replication_applier | a11b7ea6-81be-11f0-a64c-08002757f75b | gr3n | 3306 | ONLINE | PRIMARY | 8.4.6 | XCom |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
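The join itself is the standard one (sketch; 'rpl_user' is a placeholder for whatever recovery account is configured, and group_replication_group_seeds is assumed to point at gr3n):
```
-- credentials for distributed recovery; 'rpl_user' is a placeholder
CHANGE REPLICATION SOURCE TO SOURCE_USER = 'rpl_user', SOURCE_PASSWORD = '...'
  FOR CHANNEL 'group_replication_recovery';
-- join the existing group through the configured seeds
START GROUP_REPLICATION;
```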
Now let us assume I am recovering from a backup (for example a snapshot taken from an existing member) and I am adding a new node, gr1n, and (here is the issue) I do not remove the data/auto.cnf file, so the restored server keeps the original member's server_uuid.
3) I start the new node:
(root@localhost) [(none)]>select @@server_uuid;
+--------------------------------------+
| @@server_uuid |
+--------------------------------------+
| a11b7ea6-81be-11f0-a64c-08002757f75b |
+--------------------------------------+
gr1n and gr3n now have the same UUID. If I start Group Replication on gr1n, gr1n itself gets an error (which is correct):
ERROR 3092 (HY000): The server is not configured properly to be an active member of the group. Please see more details on error log
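The duplicate UUID can also be confirmed from any member already in the group; a hedged check (the literal is the server_uuid shown above for gr1n):
```
-- any row returned means the UUID is already taken by an existing member
SELECT MEMBER_ID, MEMBER_HOST
FROM performance_schema.replication_group_members
WHERE MEMBER_ID = 'a11b7ea6-81be-11f0-a64c-08002757f75b';
```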
But the two existing nodes both end up as PRIMARY:
(root@localhost) [(none)]>select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 9ee52ca9-81c3-11f0-95a1-08002757f75b | gr2n | 3306 | ONLINE | PRIMARY | 8.4.6 | XCom |
| group_replication_applier | a11b7ea6-81be-11f0-a64c-08002757f75b | gr3n | 3306 | ONLINE | PRIMARY | 8.4.6 | XCom |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
How to repeat:
On gr3n there is no error:
2025-08-26T12:16:36.167677Z 0 [System] [MY-011503] [Repl] Plugin group_replication reported: 'Group membership changed to gr2n:3306, gr3n:3306 on view 17561322338507168:16.'
2025-08-26T12:16:36.167845Z 12 [System] [MY-015046] [Repl] Plugin group_replication reported: 'This member gr3n:3306 will be the one sending the recovery metadata message.'
2025-08-26T12:16:36.421449Z 0 [System] [MY-011492] [Repl] Plugin group_replication reported: 'The member with address gr2n:3306 was declared online within the replication group.'
2025-08-26T12:20:47.107363Z 0 [System] [MY-011503] [Repl] Plugin group_replication reported: 'Group membership changed to gr2n:3306, gr3n:3306 on view 17561322338507168:17.'
2025-08-26T12:20:47.107524Z 12 [System] [MY-015045] [Repl] Plugin group_replication reported: 'The member gr2n:3306 will be the one sending the recovery metadata message.'
2025-08-26T12:20:48.255760Z 0 [Warning] [MY-011499] [Repl] Plugin group_replication reported: 'Members removed from the group: '
2025-08-26T12:20:48.256037Z 0 [System] [MY-011503] [Repl] Plugin group_replication reported: 'Group membership changed to gr2n:3306, gr3n:3306 on view 17561322338507168:18.'
While gr2n will report:
2025-08-26T12:16:33.552046Z 60 [System] [MY-013587] [Repl] Plugin group_replication reported: 'Plugin 'group_replication' is starting.'
2025-08-26T12:16:33.553985Z 60 [System] [MY-011565] [Repl] Plugin group_replication reported: 'Setting super_read_only=ON.'
2025-08-26T12:16:33.559943Z 62 [System] [MY-010597] [Repl] 'CHANGE REPLICATION SOURCE TO FOR CHANNEL 'group_replication_applier' executed'. Previous state source_host='<NULL>', source_port= 0, source_log_file='', source_log_pos= 4, source_bind=''. New state source_host='<NULL>', source_port= 0, source_log_file='', source_log_pos= 4, source_bind=''.
2025-08-26T12:16:33.642826Z 63 [System] [MY-014081] [Repl] Plugin group_replication reported: 'The Group Replication certifier broadcast thread (THD_certifier_broadcast) started.'
2025-08-26T12:16:34.149807Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to peer node 192.168.4.81:33061 when joining a group. My local port is: 33061.'
2025-08-26T12:16:36.168677Z 60 [System] [MY-011511] [Repl] Plugin group_replication reported: 'This server is working as secondary member with primary member address gr3n:3306.'
2025-08-26T12:16:36.168716Z 0 [System] [MY-011565] [Repl] Plugin group_replication reported: 'Setting super_read_only=ON.'
2025-08-26T12:16:36.168976Z 0 [System] [MY-013471] [Repl] Plugin group_replication reported: 'Distributed recovery will transfer data using: Incremental recovery from a group donor'
2025-08-26T12:16:36.169893Z 0 [System] [MY-011503] [Repl] Plugin group_replication reported: 'Group membership changed to gr2n:3306, gr3n:3306 on view 17561322338507168:16.'
2025-08-26T12:16:36.268364Z 74 [System] [MY-010597] [Repl] 'CHANGE REPLICATION SOURCE TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state source_host='<NULL>', source_port= 0, source_log_file='', source_log_pos= 4, source_bind=''. New state source_host='gr3n', source_port= 3306, source_log_file='', source_log_pos= 4, source_bind=''.
2025-08-26T12:16:36.366376Z 74 [System] [MY-010597] [Repl] 'CHANGE REPLICATION SOURCE TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state source_host='gr3n', source_port= 3306, source_log_file='', source_log_pos= 4, source_bind=''. New state source_host='<NULL>', source_port= 0, source_log_file='', source_log_pos= 4, source_bind=''.
2025-08-26T12:16:36.420260Z 0 [System] [MY-011490] [Repl] Plugin group_replication reported: 'This server was declared online within the replication group.'
2025-08-26T12:16:37.171799Z 60 [System] [MY-014010] [Repl] Plugin group_replication reported: 'Plugin 'group_replication' has been started.'
2025-08-26T12:20:47.106794Z 62 [System] [MY-015046] [Repl] Plugin group_replication reported: 'This member gr2n:3306 will be the one sending the recovery metadata message.'
2025-08-26T12:20:47.107071Z 0 [System] [MY-011507] [Repl] Plugin group_replication reported: 'A new primary with address gr2n:3306 was elected. The new primary will execute all previous group transactions before allowing writes.'
2025-08-26T12:20:47.107949Z 0 [System] [MY-011503] [Repl] Plugin group_replication reported: 'Group membership changed to gr2n:3306, gr1n:3306 on view 17561322338507168:17.'
2025-08-26T12:20:47.384494Z 78 [System] [MY-011565] [Repl] Plugin group_replication reported: 'Setting super_read_only=ON.'
2025-08-26T12:20:47.385461Z 70 [System] [MY-013731] [Repl] Plugin group_replication reported: 'The member action "mysql_disable_super_read_only_if_primary" for event "AFTER_PRIMARY_ELECTION" with priority "1" will be run.'
2025-08-26T12:20:47.385696Z 70 [System] [MY-011566] [Repl] Plugin group_replication reported: 'Setting super_read_only=OFF.'
2025-08-26T12:20:47.386028Z 70 [System] [MY-013731] [Repl] Plugin group_replication reported: 'The member action "mysql_start_failover_channels_if_primary" for event "AFTER_PRIMARY_ELECTION" with priority "10" will be run.'
2025-08-26T12:20:47.386923Z 78 [System] [MY-013820] [Repl] Plugin group_replication reported: 'The member gr2n:3306, with UUID: 9ee52ca9-81c3-11f0-95a1-08002757f75b, was set as the single preferred consensus leader.'
2025-08-26T12:20:47.389200Z 78 [System] [MY-011510] [Repl] Plugin group_replication reported: 'This server is working as primary member.'
2025-08-26T12:20:47.893099Z 0 [Warning] [MY-014069] [Repl] Plugin group_replication reported: 'Member identified by the Gcs_member_identifier: '192.168.4.83:33061' does not exist on Group Replication membership during REACHABLE/UNREACHABLE notification from group communication engine.'
2025-08-26T12:20:47.893275Z 0 [Warning] [MY-011493] [Repl] Plugin group_replication reported: 'Member with address gr1n:3306 has become unreachable.'
2025-08-26T12:20:48.256090Z 0 [Warning] [MY-011499] [Repl] Plugin group_replication reported: 'Members removed from the group: gr1n:3306'
2025-08-26T12:20:48.256307Z 0 [System] [MY-011503] [Repl] Plugin group_replication reported: 'Group membership changed to gr2n:3306, gr3n:3306 on view 17561322338507168:18.'
Here we can see that gr2n first joins as SECONDARY; then, when gr1n tries to join the group with the wrong UUID, gr2n promotes itself to PRIMARY because it thinks:
```'Member identified by the Gcs_member_identifier: '192.168.4.83:33061' does not exist on Group Replication membership during REACHABLE/UNREACHABLE notification from group communication engine.'```
But gr3n is actually there and healthy; it is gr1n, with the wrong UUID, that failed.
To recover, just stopping and starting Group Replication will do.
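A minimal sketch (assuming it is run on the member that wrongly promoted itself, gr2n in this run):
```
-- leave and rejoin the group; the node comes back as SECONDARY under the existing primary
STOP GROUP_REPLICATION;
START GROUP_REPLICATION;
```
after which the membership is back to a single primary: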
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 9ee52ca9-81c3-11f0-95a1-08002757f75b | gr2n | 3306 | ONLINE | SECONDARY | 8.4.6 | XCom |
| group_replication_applier | a11b7ea6-81be-11f0-a64c-08002757f75b | gr3n | 3306 | ONLINE | PRIMARY | 8.4.6 | XCom |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
Suggested fix:
What should happen is that the existing members should be able to handle a node trying to join with a duplicate UUID without ending up in a multi-primary state.