Bug #83128 Group Replication depends on resolvable hostnames without domain
Submitted: 23 Sep 2016 11:04
Modified: 24 Jan 2017 14:30
Reporter: Thomas Lobker
Status: Closed
Category: MySQL Server: Documentation
Severity: S3 (Non-critical)
Version: 0.9.0-labs
OS: Linux
Assigned to: David Moss
CPU Architecture: Any
Tags: bind-address, dns, domain, host, hostname, master_host

[23 Sep 2016 11:04] Thomas Lobker
Description:
Servers in a Group Replication group cannot find each other if a peer's local hostname is not resolvable by the other peers. By default this hostname (the 'hostname' system variable) is not a fully qualified domain name (FQDN). There is currently no way to change this hostname in MySQL to something resolvable, because 'hostname' is a read-only system variable.

How to repeat:
1. Install three fresh servers with Ubuntu

2. Use the default hostname 'ubuntu' for each server with a unique domain:

- ubuntu.domain1.local
- ubuntu.domain2.local
- ubuntu.domain3.local

3. Install MySQL 5.7.15 with Group Replication

4. Configure Group Replication, bootstrap the first server, and try to add the second server to the group.

== Expected result ==

The second server (slave) should connect to the master on a hostname or IP address on which the master is listening, and start replication. The member should be added to the group. Its member state should be RECOVERING and change to ONLINE after the initial replication is done.

== Actual result ==

The second server (slave) resolves the hostname of the master and tries to connect to that hostname. This fails because the hostname 'ubuntu' resolves to the local server. The member is still added to the group, because joining the group uses the IP addresses from the group_replication_group_seeds variable. The member state stays RECOVERING for a while, and then the member disappears from the group.

Of course, my reproduction steps are not a real-world example, but we have many cases where it is not possible to make the masters' hostnames resolvable on a slave.

Suggested fix:
1. Make the replication address configurable

2. Use the system FQDN (hostname -f) as the value of the 'hostname' global variable

== Workaround ==

Fool the operating system by using the FQDN as the hostname:

- Edit /etc/hosts and change the hostname to the FQDN
- Edit /etc/hostname and change the hostname to the FQDN
- Run hostname <fqdn>

This workaround could be scripted in the MySQL systemd startup script.
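A minimal shell sketch of the file edits in this workaround, for example to call from a startup script before mysqld starts. It assumes a Debian/Ubuntu-style /etc/hosts in which the machine's own name is listed on a 127.0.1.1 line; the function name set_fqdn_hostname is hypothetical, and the file paths are parameters so it can be tried on copies first. Changing the running hostname (hostname <fqdn>) still has to be done separately, as root.

```shell
#!/bin/sh
# Hypothetical helper for the "use the FQDN as the hostname" workaround.
# Assumption: Debian/Ubuntu layout, where the machine's own name is listed
# on a "127.0.1.1" line of the hosts file. If no such line exists, the
# hosts file is left unchanged.
set_fqdn_hostname() {
    fqdn="$1"                          # e.g. ubuntu.domain1.local
    hosts_file="${2:-/etc/hosts}"
    hostname_file="${3:-/etc/hostname}"
    short="${fqdn%%.*}"                # label before the first dot, e.g. ubuntu

    # Rewrite the 127.0.1.1 entry as "<fqdn> <short>" so the FQDN is the
    # canonical name and the short name still resolves as an alias.
    tmp=$(mktemp) || return 1
    awk -v fqdn="$fqdn" -v short="$short" '
        $1 == "127.0.1.1" { print "127.0.1.1\t" fqdn "\t" short; next }
        { print }
    ' "$hosts_file" > "$tmp" || { rm -f "$tmp"; return 1; }
    cat "$tmp" > "$hosts_file"         # overwrite in place, keeping permissions
    rm -f "$tmp"

    # Persist the FQDN as the static hostname.
    printf '%s\n' "$fqdn" > "$hostname_file"
}
```

Usage as root on the first server would be: set_fqdn_hostname ubuntu.domain1.local && hostname ubuntu.domain1.local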
[24 Sep 2016 18:40] Nuno Carvalho
Hi Thomas,

Thank you for reporting the bug.

It is highly recommended that operating systems running MySQL, or any other network-connected application or server, have properly configured hostnames, either in DNS or in local settings, as you mentioned.

Having said that, you can configure which hostname the server externalizes for both Group Replication and asynchronous replication.
To do that, set the --report-host option:
http://dev.mysql.com/doc/refman/5.7/en/replication-options-slave.html#option_mysqld_report...
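A minimal sketch of setting report_host persistently through an option-file fragment. The drop-in path (Ubuntu's /etc/mysql/mysql.conf.d/ is assumed here) and the function name write_report_host_cnf are illustrative assumptions, not part of this thread. report_host is not a dynamic variable, so mysqld has to be restarted for a new value to take effect, and each member needs its own unique name that the other members can resolve.

```shell
#!/bin/sh
# Hypothetical helper: write a [mysqld] option-file fragment that sets
# report_host, the hostname this server externalizes to other members.
# Falls back to the system FQDN (hostname -f) when no name is given.
write_report_host_cnf() {
    cnf_file="$1"                      # e.g. /etc/mysql/mysql.conf.d/report-host.cnf (assumed path)
    report_host="${2:-$(hostname -f)}" # must be resolvable by every other member
    [ -n "$report_host" ] || return 1  # refuse to write an empty value
    printf '[mysqld]\nreport_host=%s\n' "$report_host" > "$cnf_file"
}
```

For the member data1.domain3.local, for example: write_report_host_cnf /etc/mysql/mysql.conf.d/report-host.cnf data1.domain3.local, then restart the MySQL service and check the result with SHOW VARIABLES LIKE 'report_host'.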

Please let me know if you still have issues while using that option.

Best regards,
Nuno Carvalho
[25 Sep 2016 8:55] Thomas Lobker
Hi Nuno,

Thanks for your quick response. I can confirm that your solution works; thank you so much. I'm not sure whether it qualifies as a bug or just needs some additional documentation. I will leave that up to you.

Result with suggested solution:

mysql> SHOW VARIABLES LIKE 'hostname';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| hostname      | data1 |
+---------------+-------+
1 row in set (0.00 sec)

mysql> SHOW VARIABLES LIKE 'report_host';
+---------------+---------------------+
| Variable_name | Value               |
+---------------+---------------------+
| report_host   | data1.domain3.local |
+---------------+---------------------+
1 row in set (0.00 sec)

mysql> CHANGE MASTER TO MASTER_USER='replication', MASTER_PASSWORD='password' FOR CHANNEL 'group_replication_recovery';
Query OK, 0 rows affected, 2 warnings (0.02 sec)

mysql> START GROUP_REPLICATION;
Query OK, 0 rows affected (2.19 sec)

mysql> SELECT * FROM `performance_schema`.`replication_group_members`\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
   MEMBER_ID: ****
 MEMBER_HOST: data1.domain3.local
 MEMBER_PORT: 3306
MEMBER_STATE: ONLINE
*************************** 2. row ***************************
CHANNEL_NAME: group_replication_applier
   MEMBER_ID: ****
 MEMBER_HOST: data1.domain2.local
 MEMBER_PORT: 3306
MEMBER_STATE: ONLINE
*************************** 3. row ***************************
CHANNEL_NAME: group_replication_applier
   MEMBER_ID: ****
 MEMBER_HOST: data1.domain1.local
 MEMBER_PORT: 3306
MEMBER_STATE: ONLINE
3 rows in set (0.00 sec)
[25 Sep 2016 17:45] Nuno Carvalho
Hi Thomas,

I will classify this as a documentation bug to ensure that this information goes into the manual.

Best regards,
Nuno Carvalho
[24 Jan 2017 14:35] David Moss
Posted by developer:
 
Thanks for your feedback. The following page was updated:
https://dev.mysql.com/doc/refman/5.7/en/group-replication-user-credentials.html

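and this note was added to the GR documentation:
Similarly, if the member cannot correctly identify the other members via the server's hostname, the recovery process can fail. It is recommended that operating systems running MySQL have a properly configured unique hostname, either using DNS or local settings. This hostname can be verified in the Member_host column of the performance_schema.replication_group_members table. If multiple group members externalize a default hostname set by the operating system, there is a chance that the member does not resolve the correct member address and cannot join the group. In such a situation, use report_host to configure a unique hostname to be externalized by each of the servers.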
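The check described in the note can be sketched as a small shell function that reads MEMBER_HOST values (one per line) and flags names that are duplicated or that the local machine cannot resolve with getent. The function name check_member_hosts is hypothetical. Because resolution is only tested from the machine running the script, it would need to be run on every member.

```shell
#!/bin/sh
# Hypothetical helper: read one MEMBER_HOST value per line on stdin and
# flag names that are duplicated or not resolvable from this machine.
# Returns 0 only if every name is unique and resolvable.
check_member_hosts() {
    status=0
    seen=""
    while IFS= read -r host; do
        [ -n "$host" ] || continue
        # A repeated name, e.g. several members reporting 'ubuntu'.
        case " $seen " in
            *" $host "*)
                echo "duplicate: $host" >&2
                status=1 ;;
        esac
        seen="$seen $host"
        # Resolution via the local resolver (DNS, /etc/hosts, ...).
        if ! getent hosts "$host" > /dev/null; then
            echo "unresolvable: $host" >&2
            status=1
        fi
    done
    return "$status"
}
```

It could be fed from the table mentioned in the note, for example: mysql -N -e "SELECT MEMBER_HOST FROM performance_schema.replication_group_members" | check_member_hosts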
Similarly, if the member cannot correctly identify the other members via the server's hostname the recovery process can fail. It is recommended that operating systems running MySQL have a properly configured unique hostname, either using DNS or local settings. This hostname can be verified in the Member_host column of the performance_schema.replication_group_members table. If multiple group members externalize a default hostname set by the operating system, there is a chance of the member not resolving to the correct member address and not being able to join the group. In such a situation use report_host to configure a unique hostname to be externalized by each of the servers.