Bug #90534 InnoDB Cluster members stop by failed start of Group Replication
Submitted: 20 Apr 13:58 Modified: 20 Jun 14:12
Reporter: Ryusuke Kajiyama Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Group Replication Severity:S1 (Critical)
Version:8.0.11, 8.0.12 OS:Mac OS X (10.12.6)
Assigned to: CPU Architecture:x86

[20 Apr 13:58] Ryusuke Kajiyama
Description:
On macOS Sierra version 10.12.6, MySQL InnoDB Cluster members stop due to failed to start Group Replication.
This does NOT happen on macOS High Sierra 10.13.x

[Software packages from mysql.com]
mysql-8.0.11-macos10.13-x86_64
mysql-shell-8.0.11-macos10.13-x86-64bit
mysql-router-8.0.11-macos10.13-x86-64bit

On adding instances into Cluster, new instances out puts errors
====
[ERROR] [MY-011685] [Repl] Plugin group_replication reported: 'The group name option is mandatory'
[ERROR] [MY-011660] [Repl] Plugin group_replication reported: 'Unable to start Group Replication on boot'
[Warning] [MY-010604] [Repl] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=sakila-relay-bin' to avoid this problem.
[System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
[Warning] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Automatically adding IPv4 localhost address to the whitelist. It is mandatory that it is added.'
[System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
[System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='sakila.local', master_port= 3310, master_log_file='', master_log_pos= 4, master_bind=''.
[Warning] [MY-010897] [Repl] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
[ERROR] [MY-010584] [Repl] Slave I/O for channel 'group_replication_recovery': error connecting to master 'mysql_innodb_cluster_r2418734901@sakila.local:3310' - retry-time: 60  retries: 1, Error_code: MY-002005
[ERROR] [MY-011582] [Repl] Plugin group_replication reported: 'There was an error when connecting to the donor server. Please check that group_replication_recovery channel credentials and all MEMBER_HOST column values of performance_schema.replication_group_members table are correct and DNS resolvable.'
[ERROR] [MY-011583] [Repl] Plugin group_replication reported: 'For details please check performance_schema.replication_connection_status table and error log messages of Slave I/O for channel group_replication_recovery.'
====

Then it retries to connect master 10 times and all failed, then finally server stops.
====
[ERROR] [MY-013173] [Repl] Plugin group_replication reported: 'The plugin encountered a critical error and will abort: Fatal error during execution of Group Replication'
mysqld got signal 6 ;
====

How to repeat:
set PATH of bin of MySQL Sever, Shell and Router

in MySQL Shell
====
dba.deploySandboxInstance(3310)
dba.deploySandboxInstance(3320)

\connect root@localhost:3310

cluster = dba.createCluster("myCluster")

cluster.addInstance("root@localhost:3320")
====

Then wait until all retry failed (10 min+)
[20 Apr 13:59] Ryusuke Kajiyama
Error log os slave

Attachment: 3330_error.log (application/octet-stream, text), 21.46 KiB.

[20 Apr 14:00] Ryusuke Kajiyama
performance_schema.replication_connection_status

Attachment: 3330_replication_connection_status.txt (text/plain), 2.99 KiB.

[22 May 14:27] Bogdan Kecman
Hi,

this looks like macos bug and not mysql bug, since upgrading macos solves the problem.

all best
Bogdan
[27 May 6:25] Ryusuke Kajiyama
This issue was observed on non-macOS evn. Gathering env details.
That issue was caused by network config. Modifying /etc/hosts file solved the issue in that case.

There is potential difference how to refer Master node on cluster.addInstance().
At the first step, Slave could could find Master node to gather Group info.
However, at the second step to join that Group, it fails to solve host name of Master or some other issues.
[28 May 1:08] Lu Yang
I have a similar case as it is. The MySQL version is 8.0.11 commercial, the OS version is oracle linux 6.

When I tried the standard procedure to create Innodb Cluster by use "dba.configureInstance()".

But it always failed to start group replication with errors. 

The error log for the first GR node:

2018-05-17T10:03:46.127982Z 9 [System] [MY-011086] [Server] Received RESTART from user dbusr.  Restarting mysqld (Version: 8.0.11-commercial).
2018-05-17T10:03:50.027182Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.11-commercial)  MySQL Enterprise Server - Commercial.
2018-05-17T10:03:51.089559Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.11-commercial) starting as process 3972
2018-05-17T10:03:54.147295Z 0 [ERROR] [MY-011685] [Repl] Plugin group_replication reported: 'The group name option is mandatory'
2018-05-17T10:03:54.147506Z 0 [ERROR] [MY-011660] [Repl] Plugin group_replication reported: 'Unable to start Group Replication on boot'
2018-05-17T10:03:54.575377Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2018-05-17T10:03:54.822402Z 0 [Warning] [MY-010604] [Repl] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=mysql8_2-relay-bin' to avoid this problem.
2018-05-17T10:03:54.971222Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.11-commercial'  socket: '/var/lib/mysql/mysql.sock' port: 3306  MySQL Enterprise Server - Commercial.

The err log for the second added instance with "dba.addInstance()":

2018-05-18T01:21:13.377088Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.11-commercial'  socket: '/var/lib/mysql/mysql.sock' port: 3306  MySQL Enterprise Server - Commercial.
2018-05-18T01:21:40.678977Z 10 [System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2018-05-18T01:21:40.967606Z 10 [Warning] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Automatically adding IPv4 localhost address to the whitelist. It is mandatory that it is added.'
2018-05-18T01:21:41.131759Z 12 [System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2018-05-18T01:21:47.361416Z 18 [System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='mysql8_2', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
2018-05-18T01:21:47.523838Z 21 [Warning] [MY-010897] [Repl] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2018-05-18T01:21:47.547983Z 21 [ERROR] [MY-010584] [Repl] Slave I/O for channel 'group_replication_recovery': error connecting to master 'mysql_innodb_cluster_r0435278642@mysql8_2:3306' - retry-time: 60  retries: 1, Error_code: MY-002005
2018-05-18T01:21:47.631842Z 18 [ERROR] [MY-011582] [Repl] Plugin group_replication reported: 'There was an error when connecting to the donor server. Please check that group_replication_recovery channel credentials and all MEMBER_HOST column values of performance_schema.replication_group_members table are correct and DNS resolvable.'
2018-05-18T01:21:47.631965Z 18 [ERROR] [MY-011583] [Repl] Plugin group_replication reported: 'For details please check performance_schema.replication_connection_status table and error log messages of Slave I/O for channel group_replication_recovery.'
2018-05-18T01:22:48.137519Z 18 [System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='mysql8_2', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='mysql8_2', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
2018-05-18T01:22:48.271591Z 23 [Warning] [MY-010897] [Repl] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2018-05-18T01:22:48.272448Z 23 [ERROR] [MY-010584] [Repl] Slave I/O for channel 'group_replication_recovery': error connecting to master 'mysql_innodb_cluster_r0435278642@mysql8_2:3306' - retry-time: 60  retries: 1, Error_code: MY-002005
2018-05-18T01:22:48.347957Z 18 [ERROR] [MY-011582] [Repl] Plugin group_replication reported: 'There was an error when connecting to the donor server. Please check that group_replication_recovery channel credentials and all MEMBER_HOST column values of performance_schema.replication_group_members table are correct and DNS resolvable.'
2018-05-18T01:22:48.348067Z 18 [ERROR] [MY-011583] [Repl] Plugin group_replication reported: 'For details please check performance_schema.replication_connection_status table and error log messages of Slave I/O for channel group_replication_recovery.'

The problem was resolved by changed the following hosts configuration.

Before, with errors:

[root@mysql8_2 mysql]# cat /etc/sysconfig/network
NETWORKING=yes
*HOSTNAME=mysql8_2*

[root@mysql8_2 mysql]# cat /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost localhost.localdomain
::1         localhost localhost.localdomain localhost localhost.localdomain
192.168.56.102  ic1 ic1.localdomain
*192.168.56.103  ic2 ic2.localdomain*
192.168.56.104  ic3 ic3.localdomain

After:

[root@mysql8_2 mysql]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=mysql8_2
[root@mysql8_2 mysql]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost localhost.localdomain
::1         localhost localhost.localdomain localhost localhost.localdomain
192.168.56.102  mysql8_1 mysql8_1.localdomain
192.168.56.103  mysql8_2 mysql8_2.localdomain
192.168.56.104  mysql8_3 mysql8_3.localdomain
[20 Jun 14:12] Umesh Shastry
Thank you for the report and feedback.
This can be triggered even on CentOS6 after making changes to network configuration as suggested by Lu and Kajiyama-San.

regards,
Umesh
[25 Jul 14:44] John Doe
I can confirm this bug on Windows Server 2012 R2 with the x64 kit.
I used two servers on the IPs 192.168.9.8 (aka Server1) & 192.168.9.9 (aka Server2) both configured from scratch on port 3308 with the user root_cluster. For clustering I intended to add another one but the problem appeared. The cluster is successfully created on .9.9 on port 3308 and when I try to add the instance .9.8 on port 3308 errors pop up regarding the plugin group_replication.

===Software packages from mysql.com
mysql-installer-community-8.0.11.0.msi

===To reproduce:
++Install the MySQL Installer 8.0.11 RC
++From the installer deploy MySQL Server 8.0.11 on port 3308

++Create a local user for cluster operations 'root_cluster'
	CREATE USER 'root_cluster'@'%' IDENTIFIED BY '1';
	GRANT ALL PRIVILEGES ON mysql_innodb_cluster_metadata.* TO root_cluster@'%' WITH GRANT OPTION;
	GRANT ALL PRIVILEGES ON mysql.* TO root_cluster@'%' WITH GRANT OPTION;
	GRANT RELOAD, SHUTDOWN, PROCESS, FILE, SUPER, REPLICATION SLAVE, REPLICATION CLIENT, CREATE USER ON *.* TO root_cluster@'%' WITH GRANT OPTION;
	GRANT SELECT, SYSTEM_VARIABLES_ADMIN, PERSIST_RO_VARIABLES_ADMIN ON *.* TO root_cluster@'%' WITH GRANT OPTION;

MySQL JS> \c root_cluster@192.168.9.9:3308
MySQL [192.168.9.9 ssl] JS> dba.configureInstance('root_cluster@192.168.9.8:3308')
	Configuring MySQL instance at 192.168.9.8:3308 for use in an InnoDB cluster...

	This instance reports its own address as Server1
	Clients and other cluster members will communicate with it through this address by default. If this is not correct, the report_host MySQL system variable should be changed.

	Some configuration options need to be fixed:
	+--------------------------+---------------+----------------+--------------------------------------------------+
	| Variable                 | Current Value | Required Value | Note                               			   |
	+--------------------------+---------------+----------------+--------------------------------------------------+
	| binlog_checksum          | CRC32         | NONE           | Update the servervariable                        |
	| enforce_gtid_consistency | OFF           | ON             | Update read-only variable and restart the server |
	| gtid_mode                | OFF           | ON             | Update read-only variable and restart the server |
	+--------------------------+---------------+----------------+--------------------------------------------------+

	Do you want to perform the required configuration changes? [y/n]: y
	Do you want to restart the instance after configuring it? [y/n]: y
	Configuring instance...
	The instance '192.168.9.8:3308' was configured for cluster usage.
	Restarting MySQL...
	MySQL server at 192.168.9.8:3308 was restarted.

MySQL [192.168.9.9 ssl] JS> cluster.addInstance('root_cluster@192.168.9.8:3308')
	A new instance will be added to the InnoDB cluster. Depending on the amount of data on the cluster this might take from a few seconds to several hours.

	Adding instance to the cluster ...

	Validating instance at 192.168.9.8:3308...

	This instance reports its own address as Server1

	Instance configuration is suitable.
	Cluster.addInstance: WARNING: The given '192.168.9.8:3308' and the peer 'Server2:3308' have duplicated server_id 1
	ERROR: Error joining instance to cluster: The operation could not continue due to the following requirements not being met:
	Some active options on server '192.168.9.8:3308' are incompatible with Group Replication.
	A new server_id with value 1816645590 was persisted.
	Please restart the server '192.168.9.8:3308' to load the newly persisted configuration options and try again.

	The server_id 1 is already used by peer 'Server2:3308'
	The server_id must be different from the ones in use by the members of the GR group.
	Option name                      Required Value   Current Value    Result
	-------------------------------  ---------------  ---------------  -----
	server_id                        <unique ID>      1                FAIL  (RuntimeError)
[25 Jul 14:45] John Doe
++Restart the MySQL80 service on .9.8

MySQL [192.168.9.9 ssl] JS> cluster.addInstance('root_cluster@192.168.9.8:3308')
	A new instance will be added to the InnoDB cluster. Depending on the amount of data on the cluster this might take from a few seconds to several hours.

	Adding instance to the cluster ...

	Validating instance at 192.168.9.8:3308...

	This instance reports its own address as Server1

	Instance configuration is suitable.
	Cluster.addInstance: WARNING: Not running locally on the server and can not access its error log.
	ERROR:
	Group Replication join failed.
	ERROR: Error joining instance to cluster: '192.168.9.8:3308' - Query failed. MySQL Error (3092): ClassicSession.query: The server is not configured properly to be an active member of the group. Please see more details on error log.. Query: START group_replication: MySQL Error (3092): ClassicSession.query: The server is not configured properly to be an active member of the group. Please see more details on error log. (RuntimeError)

MySQL [192.168.9.9 ssl] JS> cluster.checkInstanceState('root_cluster@192.168.9.8:3308')
	Analyzing the instance replication state...

	The instance 'root_cluster@192.168.9.8:3308' is valid for the cluster.
	The instance is new to Group Replication.
	
	{
		"reason": "new",
		"state": "ok"
	}

MySQL [192.168.9.9 ssl] JS> dba.checkInstanceConfiguration('root_cluster@192.168.9.8:3308')
	Validating MySQL instance at 192.168.9.8:3308 for use in an InnoDB cluster...
	
	This instance reports its own address as Server1
	Clients and other cluster members will communicate with it through this address by default. If this is not correct, the report_host MySQL system variable should be changed.
	
	Checking whether existing tables comply with Group Replication requirements...
	No incompatible tables detected
	
	Checking instance configuration...
	Instance configuration is compatible with InnoDB cluster
	
	The instance '192.168.9.8:3308' is valid for InnoDB cluster usage.
	
	{
		"status": "ok"
	}

===ERRORs in the log file on .9.8
	ERROR: Error joining instance to cluster: '192.168.9.241:3308' - Query failed. MySQL Error (3092): ClassicSession.query: The server is not configured properly to be an active member of the group. Please see more details on error log.. Query: START group_replication: MySQL Error (3092): ClassicSession.query: The server is not configured properly to be an active member of the group. Please see more details on error log. (RuntimeError)

	2018-07-25T14:07:52.622071Z 10 [ERROR] [MY-011685] [Repl] Plugin group_replication reported: 'The group name option is mandatory'
	2018-07-25T14:07:52.622609Z 10 [ERROR] [MY-011660] [Repl] Plugin group_replication reported: 'Unable to start Group Replication on boot'
	2018-07-25T14:07:57.824576Z 10 [Warning] [MY-010604] [Repl] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=Server1-relay-bin' to avoid this problem.
	2018-07-25T14:07:57.916316Z 10 [System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
	2018-07-25T14:07:58.039676Z 10 [Warning] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Automatically adding IPv4 localhost address to the whitelist. It is mandatory that it is added.'
	2018-07-25T14:07:58.182367Z 12 [System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
	2018-07-25T14:08:09.115451Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to 192.168.9.9:33081 on local port: 33081.'
	2018-07-25T14:08:19.115379Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to 192.168.9.9:33081 on local port: 33081.'
	2018-07-25T14:08:29.115338Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to 192.168.9.9:33081 on local port: 33081.'
	2018-07-25T14:08:39.115321Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to 192.168.9.9:33081 on local port: 33081.'
	2018-07-25T14:08:49.119464Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to 192.168.9.9:33081 on local port: 33081.'
	2018-07-25T14:08:58.384944Z 10 [ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group'
	2018-07-25T14:08:58.385580Z 10 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.'
	2018-07-25T14:08:59.119427Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to 192.168.9.9:33081 on local port: 33081.'
	2018-07-25T14:09:09.119380Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to 192.168.9.9:33081 on local port: 33081.'
	2018-07-25T14:09:19.119322Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to 192.168.9.9:33081 on local port: 33081.'
	2018-07-25T14:09:29.119270Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to 192.168.9.9:33081 on local port: 33081.'
	2018-07-25T14:09:39.119251Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to 192.168.9.9:33081 on local port: 33081.'
	2018-07-25T14:09:39.119949Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error connecting to all peers. Member join failed. Local port: 33081'
	2018-07-25T14:09:39.120831Z 0 [Warning] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] read failed'
	2018-07-25T14:09:39.136720Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member was unable to join the group. Local port: 33081'
[25 Jul 14:50] John Doe
Ignore this error line from the above (pasted by mistake):
ERROR: Error joining instance to cluster: '192.168.9.241:3308' - Query failed. MySQL Error (3092): ClassicSession.query: The server is not configured properly to be an active member of the group. Please see more details on error log.. Query: START group_replication: MySQL Error (3092): ClassicSession.query: The server is not configured properly to be an active member of the group. Please see more details on error log. (RuntimeError)
[11 Aug 7:06] Ryusuke Kajiyama
This issue happens with MySQL 8.0.12 Community too.