Bug #99689 member cannot add to group_replication cluster after failover
Submitted: 26 May 4:02
Modified: 3 Jun 2:17
Reporter: phoenix Zhang (OCA)
Status: Verified
Category: MySQL Server: Group Replication
Severity: S3 (Non-critical)
Version: 8.0.18
OS: Any
Assigned to:
CPU Architecture: Any
Tags: group_replication

[26 May 4:02] phoenix Zhang
Description:
In a 3-node Group Replication cluster, if 2 nodes are shut down, they cannot rejoin the cluster after being restarted.

How to repeat:
First, build the Group Replication cluster.

1. connect 13000:
mysql> CHANGE MASTER TO MASTER_USER='root', MASTER_PASSWORD='' FOR CHANNEL 'group_replication_recovery';
Query OK, 0 rows affected, 1 warning (0.05 sec)

mysql> RESET MASTER;
Query OK, 0 rows affected (0.05 sec)

mysql> SET GLOBAL group_replication_bootstrap_group=on;
Query OK, 0 rows affected (0.00 sec)

mysql> START GROUP_REPLICATION;
Query OK, 0 rows affected (3.12 sec)

mysql> SET GLOBAL group_replication_bootstrap_group=off;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 33893ad2-5c1c-11ea-a6e3-ec5c6826bca3 | 127.0.0.1   |       13000 | ONLINE       | PRIMARY     | 8.0.18         |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
1 row in set (0.01 sec)

2. connect 13001:

mysql> CHANGE MASTER TO MASTER_USER='root', MASTER_PASSWORD='' FOR CHANNEL 'group_replication_recovery';
Query OK, 0 rows affected, 1 warning (0.01 sec)

mysql> RESET MASTER;
Query OK, 0 rows affected (0.02 sec)

mysql> START GROUP_REPLICATION;
Query OK, 0 rows affected (3.86 sec)

mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 33893ad2-5c1c-11ea-a6e3-ec5c6826bca3 | 127.0.0.1   |       13000 | ONLINE       | PRIMARY     | 8.0.18         |
| group_replication_applier | f24690f4-5c1c-11ea-837b-ec5c6826bca3 | 127.0.0.1   |       13001 | ONLINE       | SECONDARY   | 8.0.18         |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
2 rows in set (0.01 sec)

3. connect 13002:

mysql> CHANGE MASTER TO MASTER_USER='root', MASTER_PASSWORD='' FOR CHANNEL 'group_replication_recovery';
Query OK, 0 rows affected, 1 warning (0.06 sec)

mysql> RESET MASTER;
Query OK, 0 rows affected (0.01 sec)

mysql> START GROUP_REPLICATION;
Query OK, 0 rows affected (4.58 sec)

mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 33893ad2-5c1c-11ea-a6e3-ec5c6826bca3 | 127.0.0.1   |       13000 | ONLINE       | PRIMARY     | 8.0.18         |
| group_replication_applier | 39bdb418-5c1d-11ea-b124-ec5c6826bca3 | 127.0.0.1   |       13002 | ONLINE       | SECONDARY   | 8.0.18         |
| group_replication_applier | f24690f4-5c1c-11ea-837b-ec5c6826bca3 | 127.0.0.1   |       13001 | ONLINE       | SECONDARY   | 8.0.18         |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
3 rows in set (0.01 sec)

4. shut down 13002, and connect to 13000:

mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 33893ad2-5c1c-11ea-a6e3-ec5c6826bca3 | 127.0.0.1   |       13000 | ONLINE       | PRIMARY     | 8.0.18         |
| group_replication_applier | f24690f4-5c1c-11ea-837b-ec5c6826bca3 | 127.0.0.1   |       13001 | ONLINE       | SECONDARY   | 8.0.18         |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
2 rows in set (0.00 sec)

5. shut down 13001, and connect to 13000:

mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 33893ad2-5c1c-11ea-a6e3-ec5c6826bca3 | 127.0.0.1   |       13000 | ONLINE       | PRIMARY     | 8.0.18         |
| group_replication_applier | f24690f4-5c1c-11ea-837b-ec5c6826bca3 | 127.0.0.1   |       13001 | UNREACHABLE  | SECONDARY   | 8.0.18         |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
2 rows in set (0.00 sec)

6. restart 13001, connect to 13001, and try to start Group Replication (MGR):

mysql> select * from performance_schema.replication_group_members;
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier |           |             |        NULL | OFFLINE      |             |                |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+
1 row in set (0.01 sec)

mysql> start group_replication;
ERROR 3092 (HY000): The server is not configured properly to be an active member of the group. Please see more details on error log.

7. check the error log of 13001:

2020-05-26T03:28:32.928166Z 10 [System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2020-05-26T03:28:33.070739Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to 127.0.0.1:33063 on local port: 33062.'
2020-05-26T03:28:33.071057Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to 127.0.0.1:33063 on local port: 33062.'
2020-05-26T03:28:33.071185Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to 127.0.0.1:33063 on local port: 33062.'
2020-05-26T03:28:33.071297Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to 127.0.0.1:33063 on local port: 33062.'
2020-05-26T03:28:33.071404Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to 127.0.0.1:33063 on local port: 33062.'
2020-05-26T03:28:33.071509Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to 127.0.0.1:33063 on local port: 33062.'
2020-05-26T03:28:33.071625Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to 127.0.0.1:33063 on local port: 33062.'
2020-05-26T03:28:33.071733Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to 127.0.0.1:33063 on local port: 33062.'
2020-05-26T03:28:33.071849Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to 127.0.0.1:33063 on local port: 33062.'
2020-05-26T03:28:33.071955Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to 127.0.0.1:33063 on local port: 33062.'
2020-05-26T03:28:33.071963Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error connecting to all peers. Member join failed. Local port: 33062'
2020-05-26T03:28:33.106686Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member was unable to join the group. Local port: 33062'
2020-05-26T03:29:32.989751Z 8 [ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group'
2020-05-26T03:29:32.989893Z 8 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is leaving a group without being on one.'

8. restart 13002, connect to 13002, and try to start Group Replication (MGR):

mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 39bdb418-5c1d-11ea-b124-ec5c6826bca3 | 127.0.0.1   |       13002 | OFFLINE      |             |                |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
1 row in set (0.01 sec)

mysql> start group_replication;
ERROR 3092 (HY000): The server is not configured properly to be an active member of the group. Please see more details on error log.

9. check the error log of 13002:

2020-05-26T03:54:05.112327Z 16 [System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2020-05-26T03:54:35.254048Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Timeout while waiting for the group communication engine to be ready!'
2020-05-26T03:54:35.254160Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The group communication engine is not ready for the member to join. Local port: 33063'
2020-05-26T03:54:35.327849Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member was unable to join the group. Local port: 33063'
2020-05-26T03:55:05.170743Z 8 [ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group'
2020-05-26T03:55:05.170884Z 8 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is leaving a group without being on one.'
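
One possible reading of the outputs above, not confirmed anywhere in this report: Group Replication needs a majority of the members in the current view to agree before the membership can change. The sketch below only does the arithmetic for the views that 13000 reports in steps 3 to 5. The function name `has_majority` and the `views` table are illustrative and are not part of MySQL.

```python
def has_majority(reachable: int, group_size: int) -> bool:
    """Return True if `reachable` members form a strict majority of `group_size`."""
    return reachable > group_size // 2


# (reachable members, members in the view) as seen from 13000.
# The numbers are read off the replication_group_members outputs above.
views = {
    "step 3, all three ONLINE": (3, 3),
    "step 4, 13002 no longer in the view": (2, 2),
    "step 5, 13001 UNREACHABLE": (1, 2),
}

for label, (reachable, size) in views.items():
    status = "majority" if has_majority(reachable, size) else "no majority"
    print(f"{label}: {reachable}/{size} reachable -> {status}")
```

Under this assumption, the view in step 5 has only 1 of 2 members reachable, which is not a strict majority. That would be consistent with 13001 and 13002 being unable to join afterwards, but the report itself does not state the cause.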
[26 May 4:04] phoenix Zhang
The configuration files of the 3 server instances.

$ cat my13000.cnf 
[mysqld]
basedir = /usr/local/mysql-8.0.18-linux-glibc2.12-x86_64
datadir = /usr/local/mysql-8.0.18-linux-glibc2.12-x86_64/data13000
socket=/usr/local/mysql-8.0.18-linux-glibc2.12-x86_64/data13000/mysql.sock
port = 13000
log-bin=                    server-binary-log
relay-log=                  server-relay-log

binlog-checksum=            NONE
enforce-gtid-consistency
gtid-mode=                  on  

report-host=                127.0.0.1
report-user=                root

master-retry-count=         10  
skip-slave-start

## mgr config
loose-group_replication_start_on_boot= OFF 
loose-group_replication_single_primary_mode= ON
loose-group_replication_enforce_update_everywhere_checks= FALSE
loose-group_replication_recovery_get_public_key= TRUE
loose-group_replication_exit_state_action= READ_ONLY
loose-group_replication_consistency= BEFORE_AND_AFTER
loose-group_replication_local_address=127.0.0.1:33061
loose-group_replication_group_seeds=127.0.0.1:33061,127.0.0.1:33062,127.0.0.1:33063
loose-group_replication_group_name=aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa

$ cat my13001.cnf 
[mysqld]
basedir = /usr/local/mysql-8.0.18-linux-glibc2.12-x86_64
datadir = /usr/local/mysql-8.0.18-linux-glibc2.12-x86_64/data13001
socket=/usr/local/mysql-8.0.18-linux-glibc2.12-x86_64/data13001/mysql.sock
port = 13001
log-bin=                    server-binary-log
relay-log=                  server-relay-log

binlog-checksum=            NONE
enforce-gtid-consistency
gtid-mode=                  on  

report-host=                127.0.0.1
report-user=                root

master-retry-count=         10  
skip-slave-start

## mgr config
loose-group_replication_start_on_boot= OFF 
loose-group_replication_single_primary_mode= ON
loose-group_replication_enforce_update_everywhere_checks= FALSE
loose-group_replication_recovery_get_public_key= TRUE
loose-group_replication_exit_state_action= READ_ONLY
loose-group_replication_consistency= BEFORE_AND_AFTER
loose-group_replication_local_address=127.0.0.1:33062
loose-group_replication_group_seeds=127.0.0.1:33061,127.0.0.1:33062,127.0.0.1:33063
loose-group_replication_group_name=aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa

$ cat my13002.cnf 
[mysqld]
basedir = /usr/local/mysql-8.0.18-linux-glibc2.12-x86_64
datadir = /usr/local/mysql-8.0.18-linux-glibc2.12-x86_64/data13002
socket=/usr/local/mysql-8.0.18-linux-glibc2.12-x86_64/data13002/mysql.sock
port = 13002
log-bin=                    server-binary-log
relay-log=                  server-relay-log

binlog-checksum=            NONE
enforce-gtid-consistency
gtid-mode=                  on  

report-host=                127.0.0.1
report-user=                root

master-retry-count=         10  
skip-slave-start

## mgr config
loose-group_replication_start_on_boot= OFF 
loose-group_replication_single_primary_mode= ON
loose-group_replication_enforce_update_everywhere_checks= FALSE
loose-group_replication_recovery_get_public_key= TRUE
loose-group_replication_exit_state_action= READ_ONLY
loose-group_replication_consistency= BEFORE_AND_AFTER
loose-group_replication_local_address=127.0.0.1:33063
loose-group_replication_group_seeds=127.0.0.1:33061,127.0.0.1:33062,127.0.0.1:33063
loose-group_replication_group_name=aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
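
The error logs refer to members by their group communication port ("Local port: 33062"), while the reproduction steps use the MySQL port (13001). As a reading aid, the following minimal sketch inverts the `group_replication_local_address` values copied from the three files above. The names `local_address` and `gcs_port_to_mysql_port` are made up for this illustration.

```python
from typing import Dict

# MySQL port -> group_replication_local_address, copied from the config files above.
local_address: Dict[int, str] = {
    13000: "127.0.0.1:33061",
    13001: "127.0.0.1:33062",
    13002: "127.0.0.1:33063",
}


def gcs_port_to_mysql_port(addresses: Dict[int, str]) -> Dict[int, int]:
    """Invert the mapping: group communication port -> MySQL port."""
    return {int(addr.rsplit(":", 1)[1]): port for port, addr in addresses.items()}


by_gcs_port = gcs_port_to_mysql_port(local_address)
for gcs_port in sorted(by_gcs_port):
    print(f"communication port {gcs_port} belongs to the instance on MySQL port {by_gcs_port[gcs_port]}")
```

With this mapping, the log of 13001 (local port 33062) shows failed connections to 33063, which is the communication port of 13002, and the log of 13002 reports its own local port 33063.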
[26 May 5:45] phoenix Zhang
This is very unfriendly for users.
[3 Jun 2:17] MySQL Verification Team
Hi,

Thank you for your report. Verified as described.