Bug #93004 MGR failed to boot in VPC(ECS) when using public ip
Submitted: 30 Oct 2018 7:24 Modified: 2 Jul 2019 1:09
Reporter: Zhenghu Wen (OCA) Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Group Replication Severity:S4 (Feature request)
Version:5.7.20 OS:Linux
Assigned to: MySQL Verification Team CPU Architecture:Any

[30 Oct 2018 7:24] Zhenghu Wen
Description:
MGR xcom will:
1. check the group_replication_local_address parameter in 
bool
is_parameters_syntax_correct(const Gcs_interface_parameters &interface_params)
by using 
get_ipv4_local_addresses(std::map<std::string, int>& addr_to_cidr_bits,
                         bool filter_out_inactive)

2. set site_def->nodeno in 
void site_install_action(site_def *site, cargo_type operation) by using
node_no xcom_find_node_index(node_list *nodes)

3. check whether a server is the local node in 
static server *
mksrv(char *srv, xcom_port port)
by using 
node_no xcom_mynode_match(char *name, xcom_port port)

All three obtain the server's IP addresses using 
static int init_sock_probe(sock_probe *s)

But in the cloud, on a VPC ECS instance, init_sock_probe() can only see the loopback IP and the private IP; it cannot see the public IP.

So if we use the public IP when setting up MGR nodes across VPCs, the setup will fail.

How to repeat:
1. Create a VPC ECS instance on Aliyun (or another cloud) with a public IP (the default option).

mg-node0
Public IP: 59.111.148.50  Private IP: 192.168.129.143

root@mg-node0:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether fa:16:3e:2e:50:70 brd ff:ff:ff:ff:ff:ff
    inet 192.168.129.143/17 brd 192.168.255.255 scope global eth0
    inet6 fe80::f816:3eff:fe2e:5070/64 scope link 
       valid_lft forever preferred_lft forever

but the instance does in fact have a public IP address.

2. Boot MGR with group_replication_local_address set to the public IP:
loose-group_replication_local_address= "59.111.148.50:3307"

When group_replication is started, it fails with a log entry like:
[ERROR] Plugin group_replication reported: '[GCS] There is no local IP address matching the one configured for the local node (59.111.148.50:3307).'
[5 Nov 2018 1:10] Zhenghu Wen
Hi Bogdan,
Could this be verified in your environment?
[5 Nov 2018 1:43] MySQL Verification Team
Hi,

I do not have access to aliyun and on a normal setup I can't reproduce this.

Now, if you look at your ip addr output, there is no public IP there, so I'm not sure I can call this a bug. If you are running GR inside a cloud where your boxes have private IPs and are seen through 1:1 NAT from the outside world, I believe they would have to be configured to use the private IPs, not the public ones, since they don't see the public ones properly.

all best
Bogdan
[5 Nov 2018 3:33] Zhenghu Wen
Maybe this could be treated as a feature request.
[5 Nov 2018 14:56] MySQL Verification Team
Hi,

I'm not sure FR makes sense either.

You have a system where N MySQL servers have private IPs and see each other through those private IPs. Those private IPs are then 1:1 NATted to the outside world. You want to configure your InnoDB Cluster to use the private IPs, and that works. I don't see where the public IP plays a role here, since it is not visible to any of the servers.
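The recommendation above can be sketched as a my.cnf fragment, reusing the report's private address; the port echoes the report's example, while the seed list and group UUID are illustrative placeholders, not values from the report:

```ini
[mysqld]
# Use the PRIVATE IP that the interface scan can actually see on this node.
loose-group_replication_local_address = "192.168.129.143:3307"
# The seed list would name the other members' PRIVATE addresses
# (placeholder: only this node shown).
loose-group_replication_group_seeds = "192.168.129.143:3307"
# Hypothetical group UUID, not from the report.
loose-group_replication_group_name = "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
```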

all best
Bogdan
[2 Jul 2019 1:09] Zhenghu Wen
same as https://bugs.mysql.com/bug.php?id=92665
[9 Feb 2023 14:33] MySQL Verification Team
Solution from Bug#109996 should fix this bug too