Bug #97266 GR fails to start, conflicts with k8s CNI (flannel)
Submitted: 17 Oct 2019 2:39
Modified: 25 Oct 2019 9:23
Reporter: weston lee
Status: Verified
Category: MySQL Server: Group Replication
Severity: S1 (Critical)
Version: 5.7.28
OS: CentOS (aliyun ECS CentOS 7.4)
Assigned to:
CPU Architecture: x86

[17 Oct 2019 2:39] weston lee
Description:
ifconfig
--------------------------------------------------------------------------------
cni0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 10.222.30.1  netmask 255.255.255.0  broadcast 0.0.0.0
        ether 5e:31:9d:66:bd:f8  txqueuelen 1000  (Ethernet)
        RX packets 6776  bytes 12565625 (11.9 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 7206  bytes 4767256 (4.5 MiB)
        TX errors 0
		
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.24.0.61  netmask 255.255.255.0  broadcast 172.24.0.255
        ether 00:16:3e:11:9c:10  txqueuelen 1000  (Ethernet)
        RX packets 362096646  bytes 482421352506 (449.2 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 67049446  bytes 11734902488 (10.9 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 1475791  bytes 3737124512 (3.4 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1475791  bytes 3737124512 (3.4 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
--------------------------------------------------------------------------------
MySQL server crashes when starting GR.

error log
--------------------------------------------------------------------------------
*** Error in `/usr/sbin/mysqld': double free or corruption (!prev): 0x00007f24e451e2f0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81499)[0x7f29abb1a499]
/usr/lib64/mysql/plugin/group_replication.so(_Z24get_ipv4_local_addressesRSt3mapISsiSt4lessISsESaISt4pairIKSsiEEEb+0xb16)[0x7f2565b64fc6]
/usr/lib64/mysql/plugin/group_replication.so(_Z32get_ipv4_local_private_addressesRSt3mapISsiSt4lessISsESaISt4pairIKSsiEEEb+0x5d)[0x7f2565b662bd]
/usr/lib64/mysql/plugin/group_replication.so(_Z21fix_parameters_syntaxR24Gcs_interface_parameters+0x117b)[0x7f2565b7715b]
/usr/lib64/mysql/plugin/group_replication.so(_ZN18Gcs_xcom_interface10initializeERK24Gcs_interface_parameters+0x2dc)[0x7f2565b905cc]
/usr/lib64/mysql/plugin/group_replication.so(_ZN14Gcs_operations9configureERK24Gcs_interface_parameters+0x9b)[0x7f2565ba6c3b]
/usr/lib64/mysql/plugin/group_replication.so(_Z29configure_group_communicationP23st_server_ssl_variables+0xccb)[0x7f2565bb7e1b]
/usr/lib64/mysql/plugin/group_replication.so(_Z26initialize_plugin_and_join25enum_plugin_con_isolationP29Delayed_initialization_thread+0x227)[0x7f2565bb8da7]
/usr/lib64/mysql/plugin/group_replication.so(_Z30plugin_group_replication_startv+0x5b1)[0x7f2565bb9501]
/usr/sbin/mysqld(_Z23group_replication_startv+0x93)[0xe35b93]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THDb+0x2643)[0xccf753]
/usr/sbin/mysqld(_Z11mysql_parseP3THDP12Parser_state+0x3ad)[0xcd39bd]
/usr/sbin/mysqld(_Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command+0xa7d)[0xcd451d]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x19f)[0xcd5f1f]
/usr/sbin/mysqld(handle_connection+0x290)[0xd97dc0]
/usr/sbin/mysqld(pfs_spawn_thread+0x1b4)[0x127fae4]
/lib64/libpthread.so.0(+0x7e25)[0x7f29ad0dde25]
/lib64/libc.so.6(clone+0x6d)[0x7f29abb97bad]

======= Memory map: ========
……

04:38:23 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=33554432
read_buffer_size=16777216
max_used_connections=81
max_threads=4000
thread_count=81
connection_count=81
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 131158299 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f24e41248a0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f2563fcfe30 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x3b)[0xf0768b]
/usr/sbin/mysqld(handle_fatal_signal+0x461)[0x7b9311]
/lib64/libpthread.so.0(+0xf6d0)[0x7f29ad0e56d0]
/lib64/libc.so.6(gsignal+0x37)[0x7f29abacf277]
/lib64/libc.so.6(abort+0x148)[0x7f29abad0968]
/lib64/libc.so.6(+0x78d37)[0x7f29abb11d37]
/lib64/libc.so.6(+0x81499)[0x7f29abb1a499]
/usr/lib64/mysql/plugin/group_replication.so(_Z24get_ipv4_local_addressesRSt3mapISsiSt4lessISsESaISt4pairIKSsiEEEb+0xb16)[0x7f2565b64fc6]
/usr/lib64/mysql/plugin/group_replication.so(_Z32get_ipv4_local_private_addressesRSt3mapISsiSt4lessISsESaISt4pairIKSsiEEEb+0x5d)[0x7f2565b662bd]
/usr/lib64/mysql/plugin/group_replication.so(_Z21fix_parameters_syntaxR24Gcs_interface_parameters+0x117b)[0x7f2565b7715b]
/usr/lib64/mysql/plugin/group_replication.so(_ZN18Gcs_xcom_interface10initializeERK24Gcs_interface_parameters+0x2dc)[0x7f2565b905cc]
/usr/lib64/mysql/plugin/group_replication.so(_ZN14Gcs_operations9configureERK24Gcs_interface_parameters+0x9b)[0x7f2565ba6c3b]
/usr/lib64/mysql/plugin/group_replication.so(_Z29configure_group_communicationP23st_server_ssl_variables+0xccb)[0x7f2565bb7e1b]
/usr/lib64/mysql/plugin/group_replication.so(_Z26initialize_plugin_and_join25enum_plugin_con_isolationP29Delayed_initialization_thread+0x227)[0x7f2565bb8da7]
/usr/lib64/mysql/plugin/group_replication.so(_Z30plugin_group_replication_startv+0x5b1)[0x7f2565bb9501]
/usr/sbin/mysqld(_Z23group_replication_startv+0x93)[0xe35b93]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THDb+0x2643)[0xccf753]
/usr/sbin/mysqld(_Z11mysql_parseP3THDP12Parser_state+0x3ad)[0xcd39bd]
/usr/sbin/mysqld(_Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command+0xa7d)[0xcd451d]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x19f)[0xcd5f1f]
/usr/sbin/mysqld(handle_connection+0x290)[0xd97dc0]
/usr/sbin/mysqld(pfs_spawn_thread+0x1b4)[0x127fae4]
/lib64/libpthread.so.0(+0x7e25)[0x7f29ad0dde25]
/lib64/libc.so.6(clone+0x6d)[0x7f29abb97bad]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f24e4435b50): start group_replication
Connection ID (thread ID): 323
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
--------------------------------------------------------------------------------

The problem went away after uninstalling k8s and flannel.

ifconfig
--------------------------------------------------------------------------------
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.24.0.61  netmask 255.255.255.0  broadcast 172.24.0.255
        ether 00:16:3e:11:9c:10  txqueuelen 1000  (Ethernet)
        RX packets 362096646  bytes 482421352506 (449.2 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 67049446  bytes 11734902488 (10.9 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 1475791  bytes 3737124512 (3.4 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1475791  bytes 3737124512 (3.4 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
--------------------------------------------------------------------------------

How to repeat:
1. Install k8s and flannel on aliyun ECS CentOS 7.4.

2. Deploy cni0 (Container Network Interface) with k8s and flannel.

3. Initialize and start group_replication.

Suggested fix:
The problem may be in get_ipv4_local_addresses() in group_replication.so, which is the last plugin frame in the backtrace before the libc abort.
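
For reference, a minimal sketch of how local IPv4 addresses can be enumerated on Linux with getifaddrs(), which allocates the interface list itself so the caller never sizes a buffer. This is not the plugin's code and is not a confirmed fix; list_ipv4_addresses() and is_private_ipv4() are hypothetical names used only for illustration.

```cpp
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <cstdint>
#include <map>
#include <string>

// Hypothetical helper (not part of MySQL): true if `ip` is a dotted-quad
// IPv4 address inside one of the RFC 1918 private ranges.
bool is_private_ipv4(const std::string &ip) {
  in_addr addr{};
  if (inet_pton(AF_INET, ip.c_str(), &addr) != 1) return false;
  const std::uint32_t host = ntohl(addr.s_addr);
  return (host >> 24) == 10 ||     // 10.0.0.0/8
         (host >> 20) == 0xAC1 ||  // 172.16.0.0/12
         (host >> 16) == 0xC0A8;   // 192.168.0.0/16
}

// Hypothetical helper (not part of MySQL): map every local IPv4 address to
// the name of the interface that carries it (e.g. "10.222.30.1" -> "cni0").
std::map<std::string, std::string> list_ipv4_addresses() {
  std::map<std::string, std::string> out;
  ifaddrs *head = nullptr;
  if (getifaddrs(&head) != 0) return out;  // on failure, return an empty map

  for (const ifaddrs *it = head; it != nullptr; it = it->ifa_next) {
    // Interfaces without an address, or with a non-IPv4 one, are skipped.
    if (it->ifa_addr == nullptr || it->ifa_addr->sa_family != AF_INET) continue;

    char buf[INET_ADDRSTRLEN] = {0};
    const auto *sin = reinterpret_cast<const sockaddr_in *>(it->ifa_addr);
    if (inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof(buf)) != nullptr)
      out[buf] = it->ifa_name;
  }

  freeifaddrs(head);  // release the list exactly once
  return out;
}
```

On the host above, is_private_ipv4() would accept both the eth0 address (172.24.0.61) and the cni0 address (10.222.30.1) and reject the loopback address, so a check of this kind sees the flannel bridge as one more private interface.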
[25 Oct 2019 9:23] MySQL Verification Team
Hi,

Thanks for the report. Verified. 

kind regards
[12 Dec 2019 12:31] Tiago Jorge
Thank you for your bug report.

Can you please provide the configuration (JSON or any other) of your overlay flannel network so that we can try and reproduce the problem in our environment?
[13 Sep 2021 10:19] oracle wang
I've encountered a similar bug; MySQL GR works fine after stopping and resetting flannel.

Attachment: info.txt (text/plain), 47.49 KiB.

[15 Sep 2021 9:08] oracle wang
Starting program: /usr/sbin/mysqld-debug --defaults-file=/data/mysql-3306/my.cnf --pid-file=/var/run/mysqld/mysqld-3306.pid --gdb
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7fffed75b700 (LWP 467930)]
[New Thread 0x7ffe43c49700 (LWP 468336)]
[New Thread 0x7ffe43448700 (LWP 468337)]
[New Thread 0x7ffe42c47700 (LWP 468338)]
[New Thread 0x7ffe42446700 (LWP 468339)]
[New Thread 0x7ffe41c45700 (LWP 468340)]
[New Thread 0x7ffe41444700 (LWP 468341)]
[New Thread 0x7ffe40c43700 (LWP 468342)]
[New Thread 0x7ffe40442700 (LWP 468343)]
[New Thread 0x7ffe3fc41700 (LWP 468344)]
[New Thread 0x7ffe3f440700 (LWP 468345)]
[New Thread 0x7ffe3ec3f700 (LWP 468346)]
[New Thread 0x7ffe37fff700 (LWP 468347)]
[New Thread 0x7ffe3e43e700 (LWP 468348)]
[New Thread 0x7ffe3dc3d700 (LWP 468349)]
[New Thread 0x7ffe3d216700 (LWP 468370)]
[Thread 0x7ffe3d216700 (LWP 468370) exited]
[New Thread 0x7ffe3d216700 (LWP 468638)]
[Thread 0x7ffe3d216700 (LWP 468638) exited]
[New Thread 0x7ffe3d216700 (LWP 468639)]
[New Thread 0x7ffe4d449700 (LWP 468640)]
[New Thread 0x7ffe4cc48700 (LWP 468641)]
[New Thread 0x7ffe4c447700 (LWP 468642)]
[New Thread 0x7ffe4bc46700 (LWP 468643)]
[New Thread 0x7ffe4b445700 (LWP 468644)]
[New Thread 0x7ffe4ac44700 (LWP 468645)]
[New Thread 0x7ffe4a443700 (LWP 468646)]
[New Thread 0x7ffe49c42700 (LWP 468647)]
[New Thread 0x7ffe49441700 (LWP 468648)]
[New Thread 0x7ffe48c40700 (LWP 468649)]
[New Thread 0x7ffe4843f700 (LWP 468650)]
[New Thread 0x7ffe4757c700 (LWP 468673)]
[New Thread 0x7ffe4753a700 (LWP 468901)]
[New Thread 0x7ffe3ca15700 (LWP 468902)]
[New Thread 0x7ffe474f8700 (LWP 468903)]
[New Thread 0x7ffe474b6700 (LWP 468905)]
[Switching to Thread 0x7ffe4757c700 (LWP 468673)]

Thread 31 "mysqld-debug" hit Breakpoint 2, is_parameters_syntax_correct (interface_params=...)
    at /var/lib/pb2/sb_1-1352104-1607570308.55/rpm/BUILD/mysql-5.7.33/mysql-5.7.33/rapid/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_utils.cc:1112
warning: Source file is more recent than executable.
1112	{
(gdb) n
1113	  enum_gcs_error error= GCS_OK;
(gdb) n
1117	    interface_params.get_parameter("group_name");
(gdb) n
1119	    interface_params.get_parameter("local_node");
(gdb) n
1121	    interface_params.get_parameter("peer_nodes");
(gdb) n
1123	    interface_params.get_parameter("bootstrap_group");
(gdb) n
1125	    interface_params.get_parameter("poll_spin_loops");
(gdb) n
1127	    interface_params.get_parameter("compression_threshold");
(gdb) n
1129	    interface_params.get_parameter("compression");
(gdb) n
1131	    interface_params.get_parameter("wait_time");
(gdb) n
1133	    interface_params.get_parameter("join_attempts");
(gdb) n
1135	    interface_params.get_parameter("join_sleep_time");
(gdb) n
1143	  if (group_name_str != NULL &&
(gdb) n
1144	      group_name_str->size() == 0)
(gdb) n
1143	  if (group_name_str != NULL &&
(gdb) n
1154	  if (bootstrap_group_str != NULL)
(gdb) n
1156	    std::string &flag= const_cast<std::string &>(*bootstrap_group_str);
(gdb) n
1157	    error= is_valid_flag("bootstrap_group", flag);
(gdb) n
1158	    if (error == GCS_NOK)
(gdb) n
1163	  if (peer_nodes_str != NULL)
(gdb) n
1168	    std::vector<std::string> hostnames_and_ports;
(gdb) n
1169	    std::vector<std::string> invalid_hostnames_and_ports;
(gdb) n
1170	    Gcs_xcom_utils::process_peer_nodes(peer_nodes_str, hostnames_and_ports);
(gdb) n
1172	                                        invalid_hostnames_and_ports);
(gdb) n
1174	    if(!invalid_hostnames_and_ports.empty())
(gdb) n
1191	    if(!invalid_hostnames_and_ports.empty() && hostnames_and_ports.empty())
(gdb) n
1169	    std::vector<std::string> invalid_hostnames_and_ports;
(gdb) n
1196	    }
(gdb) n
1200	  if (local_node_str != NULL)
(gdb) n
1202	    bool matches_local_ip= false;
(gdb) n
1203	    std::map<std::string, int> ips;
(gdb) n
1204	    std::map<std::string, int>::iterator it;
(gdb) n
1206	    std::string::size_type delim_pos= (*local_node_str).find_last_of(":");
(gdb) n
1207	    std::string host= (*local_node_str).substr(0, delim_pos);
(gdb) n
1208	    std::string ip;
(gdb) n
1211	    if (!is_valid_hostname(*local_node_str))
(gdb) n
1222	    if (resolve_ip_addr_from_hostname(host, ip))
(gdb) n
1230	    if (ip.compare(host) != 0)
(gdb) n
1234	    if (get_ipv4_local_addresses(ips, true))
(gdb) n
1243	    for (it= ips.begin(); it != ips.end() && !matches_local_ip; it++)
(gdb) n
1244	      matches_local_ip= (*it).first.compare(ip) == 0;
(gdb) n
1243	    for (it= ips.begin(); it != ips.end() && !matches_local_ip; it++)
(gdb) n
1244	      matches_local_ip= (*it).first.compare(ip) == 0;
(gdb) n
1243	    for (it= ips.begin(); it != ips.end() && !matches_local_ip; it++)
(gdb) n
1244	      matches_local_ip= (*it).first.compare(ip) == 0;
(gdb) n
1243	    for (it= ips.begin(); it != ips.end() && !matches_local_ip; it++)
(gdb) n
1244	      matches_local_ip= (*it).first.compare(ip) == 0;
(gdb) n
1243	    for (it= ips.begin(); it != ips.end() && !matches_local_ip; it++)
(gdb) n
1245	    if(!matches_local_ip)
(gdb) n
1208	    std::string ip;
(gdb) n
1207	    std::string host= (*local_node_str).substr(0, delim_pos);
(gdb) n
*** Error in `/usr/sbin/mysqld-debug': free(): invalid next size (fast): 0x00007ffdf0011450 ***
[29 Aug 2022 3:34] Keso BIBO
Is there a way to fix this bug?
As far as I know, this error is encountered only on some models of Alibaba Cloud.

Looking forward to your reply.