Bug #91189 InnoDB cluster reboot ignores group_replication_local_address from conf file
Submitted: 8 Jun 2018 16:33 Modified: 22 Jun 2018 18:20
Reporter: Shubhra Prakash Nandi
Status: Can't repeat
Category: Shell AdminAPI InnoDB Cluster / ReplicaSet  Severity: S2 (Serious)
Version: mysql-5.7.22, mysql-shell-8.0.11  OS: Debian (9)
Assigned to: MySQL Verification Team  CPU Architecture: x86 (amd64)

[8 Jun 2018 16:33] Shubhra Prakash Nandi
Description:
Hello,

This issue was not present up to MySQL 5.7.20, but since upgrading to MySQL 5.7.22 I am facing it: when I reboot the InnoDB cluster after a complete outage from mysqlsh using dba.rebootClusterFromCompleteOutage, MySQL ignores group_replication_local_address=host:port from my.cnf and uses a random port for group communication. This can be a serious issue on a busy server, because the other instances are then unable to connect to the new primary: the seed host definitions in their my.cnf still refer to the old host:port.

This is a serious impediment to quickly restoring the whole cluster in case of an emergency reboot. I need a speedy workaround or resolution. Thanks.

How to repeat:
1. Setup an InnoDB cluster with 3 nodes.

2. Save the InnoDB cluster configuration using dba.configureLocalInstance for each instance.

3. Shut down each instance.

4. Start the last RW instance and use dba.rebootClusterFromCompleteOutage to bring up the cluster.

5. Start other instances.

The RW instance starts with a Group Replication local address different from the one saved in my.cnf. The other instances cannot join the cluster.
[18 Jun 2018 10:48] MySQL Verification Team
Hi,

I'm missing some information here, as I can't reproduce this with 5.7.22.

all best
Bogdan
[18 Jun 2018 16:10] Shubhra Prakash Nandi
OK, I think I missed one step. Once the cluster is up, shut down all the instances. On every instance, change the port in group_replication_local_address from 33061 to something else, such as 13306, and update group_replication_group_seeds accordingly. Now start all instances and use the MySQL Shell command dba.rebootClusterFromCompleteOutage to bring up the cluster again. Below are the relevant config and the session in which I brought up the cluster; afterwards the local address had been reset to the default port (33061). On a live cluster I have seen it take a random port instead, maybe because I am not using seed nodes on the primary instance here.

group_replication_group_seeds
group_replication_gtid_assignment_block_size = 1000000
group_replication_ip_whitelist = 192.168.1.10,192.168.1.20,192.168.1.30
group_replication_local_address = node1:13306
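To make the comparison in this report repeatable, the option lines above can be read with a short script. This is a minimal Python sketch, not part of MySQL or MySQL Shell: parse_options and split_host_port are hypothetical helpers, and the parser assumes simple "key = value" lines. It skips comments and [section] headers and maps a bare key such as group_replication_group_seeds to None. It does not handle !include directives, loose- prefixes, or the dash/underscore equivalence that real MySQL option files allow.

```python
def parse_options(text):
    """Hypothetical helper: parse my.cnf-style 'key = value' lines into a dict.

    Bare keys (no '=') map to None; blank lines, comments and [section]
    headers are skipped. Not a full MySQL option-file parser.
    """
    opts = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";", "[")):
            continue
        key, sep, value = line.partition("=")
        opts[key.strip()] = value.strip() if sep else None
    return opts


def split_host_port(address):
    """Hypothetical helper: split 'host:port' on the last colon into (host, port)."""
    host, _, port = address.rpartition(":")
    return host, int(port)


# The option lines quoted in this report (seeds intentionally left without a value).
config = """\
group_replication_group_seeds
group_replication_gtid_assignment_block_size = 1000000
group_replication_ip_whitelist = 192.168.1.10,192.168.1.20,192.168.1.30
group_replication_local_address = node1:13306
"""

opts = parse_options(config)
print(split_host_port(opts["group_replication_local_address"]))  # ('node1', 13306)
print(opts["group_replication_group_seeds"])                      # None
```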

root@node1:~# mysqlsh --log-level=8
MySQL Shell 8.0.11

Copyright (c) 2016, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type '\help' or '\?' for help; '\quit' to exit.

 MySQL  JS > \connect ca@node1:3306
Creating a session to 'ca@node1:3306'
Enter password: ************
Fetching schema names for autocompletion... Press ^C to stop.
Your MySQL connection id is 12
Server version: 5.7.22-log MySQL Community Server (GPL)
No default schema selected; type \use <schema> to set one.

 MySQL  node1:3306 ssl  JS > var cluster = dba.rebootClusterFromCompleteOutage('my_cluster')
Reconfiguring the cluster 'my_cluster' from complete outage...

The instance 'node2:3306' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y/N]: y

The instance 'node3:3306' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y/N]: y

WARNING: On instance 'node1:3306' membership change cannot be persisted since MySQL version 5.7.22 does not support the SET PERSIST command (MySQL version >= 8.0.5 required). Please use the <Dba>.configureLocalInstance command locally to persist the changes.

The cluster was successfully rebooted.

 MySQL  node1:3306 ssl  JS > cluster.status()
{
    "clusterName": "my_cluster",
    "defaultReplicaSet": {
        "name": "default",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "node1:3306": {
                "address": "node1:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "node2:3306": {
                "address": "node2:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "node3:3306": {
                "address": "node3:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    },
    "groupInformationSourceMember": "mysql://ca@node1:3306"
}

 MySQL  node1:3306 ssl  JS > \q
Bye!
root@node1:~# netstat -lnt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:33061           0.0.0.0:*               LISTEN
tcp6       0      0 :::22                   :::*                    LISTEN
tcp6       0      0 :::3306                 :::*                    LISTEN
root@node1:~#
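The netstat check above can also be scripted. The Python sketch below assumes the Linux net-tools "netstat -lnt" column layout shown in this report (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); listening_ports is a hypothetical helper. It collects the listening TCP ports and tests whether the port configured in group_replication_local_address (13306 here) is among them.

```python
def listening_ports(netstat_output):
    """Hypothetical helper: return the set of TCP ports in LISTEN state
    from 'netstat -lnt' output (net-tools column layout assumed)."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows: proto recv-q send-q local-address foreign-address state
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[5] == "LISTEN":
            local = fields[3]  # e.g. "0.0.0.0:33061" or ":::3306"
            ports.add(int(local.rsplit(":", 1)[1]))
    return ports


# Output captured on node1 after the reboot, as quoted in this report.
NETSTAT = """\
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:33061           0.0.0.0:*               LISTEN
tcp6       0      0 :::22                   :::*                    LISTEN
tcp6       0      0 :::3306                 :::*                    LISTEN
"""

configured_gr_port = 13306  # from group_replication_local_address = node1:13306
ports = listening_ports(NETSTAT)
print(sorted(ports))                # [22, 3306, 33061]
print(configured_gr_port in ports)  # False: nothing listens on the configured GR port
```

With the output captured above, port 13306 is absent and 33061 (which the reporter identifies as the default) is listening, matching the behavior described in this report.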
[22 Jun 2018 17:08] MySQL Verification Team
Hi,

I don't see a problem here; you got the cluster up as expected? Which part of this do you consider a bug? I don't see any random ports here.

all best
Bogdan
[22 Jun 2018 17:24] Shubhra Prakash Nandi
When I set up the cluster, I specified the GR local address port as 13306 (please see my earlier comment). When I rebooted the cluster after migrating to MySQL 5.7.22, the GR local address port changed to 33061 instead of 13306 (see the netstat output in my earlier comment). When I reboot the cluster, other nodes may be down and unable to join at that point, but their seed addresses use port 13306, so when I restart them they fail to join. What I consider a bug is that InnoDB cluster overrides the GR local address port when the cluster is rebooted from MySQL Shell. It should keep the configured port, which would make recovery much faster.
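The join failure described above comes down to whether a restarting node's seed list still contains the address the rebooted primary actually listens on. The Python sketch below illustrates that check. seeds_reach is a hypothetical helper, the seed list value is an assumption (the report does not show the seeds configured on node2 and node3), and host:port strings are compared literally, without resolving host names to IP addresses.

```python
def seeds_reach(seeds, actual_address):
    """Hypothetical helper: True if actual_address ('host:port') appears
    literally in a comma-separated group_replication_group_seeds value."""
    entries = {s.strip() for s in seeds.split(",") if s.strip()}
    return actual_address in entries


# Assumed seed list on node2/node3 after the reporter moved GR to port 13306.
seeds_on_node2 = "node1:13306,node2:13306,node3:13306"

print(seeds_reach(seeds_on_node2, "node1:13306"))  # True: the address in node1's my.cnf
print(seeds_reach(seeds_on_node2, "node1:33061"))  # False: what node1 listens on after reboot
```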

I am not able to reproduce the random port here, though the port definitely changes, as you can see from my earlier comment. If your cluster is multi-master and you are using SSL for GR and for GR recovery, you should be able to reproduce it. I have hit this 3-4 times in a production environment.
[22 Jun 2018 17:28] MySQL Verification Team
Sorry, I see it now, but I can't reproduce this. Do you get a different port every time?
[22 Jun 2018 18:20] Shubhra Prakash Nandi
In this case I do not get a different port every time, but I do get a port different from the GR local address in the config file. I have not tried this on a fresh install of MySQL 5.7.22 with the cluster created after installation, but when the cluster was created on MySQL 5.7.20 and MySQL was then upgraded to 5.7.22, I can reproduce this issue consistently. On the live server I am running, it always takes a random port when I reboot the cluster.