| Bug #97266 | GR fail to start,conflict with k8s CNI(flannel) | | |
|---|---|---|---|
| Submitted: | 17 Oct 2019 2:39 | Modified: | 25 Oct 2019 9:23 |
| Reporter: | weston lee | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: Group Replication | Severity: | S1 (Critical) |
| Version: | 5.7.28 | OS: | CentOS (aliyun ECS CentOS 7.4) |
| Assigned to: | | CPU Architecture: | x86 |
[25 Oct 2019 9:23]
MySQL Verification Team
Hi, thanks for the report. Verified. Kind regards
[12 Dec 2019 12:31]
Tiago Jorge
Thank you for your bug report. Can you please provide the configuration (JSON or any other) of your overlay flannel network so that we can try and reproduce the problem in our environment?
[13 Sep 2021 10:19]
oracle wang
I've encountered a similar bug. MySQL GR works fine once flannel is stopped and reset.
Attachment: info.txt (text/plain), 47.49 KiB.
[15 Sep 2021 9:08]
oracle wang
Starting program: /usr/sbin/mysqld-debug --defaults-file=/data/mysql-3306/my.cnf --pid-file=/var/run/mysqld/mysqld-3306.pid --gdb
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7fffed75b700 (LWP 467930)]
[New Thread 0x7ffe43c49700 (LWP 468336)]
[New Thread 0x7ffe43448700 (LWP 468337)]
[New Thread 0x7ffe42c47700 (LWP 468338)]
[New Thread 0x7ffe42446700 (LWP 468339)]
[New Thread 0x7ffe41c45700 (LWP 468340)]
[New Thread 0x7ffe41444700 (LWP 468341)]
[New Thread 0x7ffe40c43700 (LWP 468342)]
[New Thread 0x7ffe40442700 (LWP 468343)]
[New Thread 0x7ffe3fc41700 (LWP 468344)]
[New Thread 0x7ffe3f440700 (LWP 468345)]
[New Thread 0x7ffe3ec3f700 (LWP 468346)]
[New Thread 0x7ffe37fff700 (LWP 468347)]
[New Thread 0x7ffe3e43e700 (LWP 468348)]
[New Thread 0x7ffe3dc3d700 (LWP 468349)]
[New Thread 0x7ffe3d216700 (LWP 468370)]
[Thread 0x7ffe3d216700 (LWP 468370) exited]
[New Thread 0x7ffe3d216700 (LWP 468638)]
[Thread 0x7ffe3d216700 (LWP 468638) exited]
[New Thread 0x7ffe3d216700 (LWP 468639)]
[New Thread 0x7ffe4d449700 (LWP 468640)]
[New Thread 0x7ffe4cc48700 (LWP 468641)]
[New Thread 0x7ffe4c447700 (LWP 468642)]
[New Thread 0x7ffe4bc46700 (LWP 468643)]
[New Thread 0x7ffe4b445700 (LWP 468644)]
[New Thread 0x7ffe4ac44700 (LWP 468645)]
[New Thread 0x7ffe4a443700 (LWP 468646)]
[New Thread 0x7ffe49c42700 (LWP 468647)]
[New Thread 0x7ffe49441700 (LWP 468648)]
[New Thread 0x7ffe48c40700 (LWP 468649)]
[New Thread 0x7ffe4843f700 (LWP 468650)]
[New Thread 0x7ffe4757c700 (LWP 468673)]
[New Thread 0x7ffe4753a700 (LWP 468901)]
[New Thread 0x7ffe3ca15700 (LWP 468902)]
[New Thread 0x7ffe474f8700 (LWP 468903)]
[New Thread 0x7ffe474b6700 (LWP 468905)]
[Switching to Thread 0x7ffe4757c700 (LWP 468673)]
Thread 31 "mysqld-debug" hit Breakpoint 2, is_parameters_syntax_correct (interface_params=...)
at /var/lib/pb2/sb_1-1352104-1607570308.55/rpm/BUILD/mysql-5.7.33/mysql-5.7.33/rapid/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_utils.cc:1112
warning: Source file is more recent than executable.
1112 {
(gdb) n
1113 enum_gcs_error error= GCS_OK;
(gdb) n
1117 interface_params.get_parameter("group_name");
(gdb) n
1119 interface_params.get_parameter("local_node");
(gdb) n
1121 interface_params.get_parameter("peer_nodes");
(gdb) n
1123 interface_params.get_parameter("bootstrap_group");
(gdb) n
1125 interface_params.get_parameter("poll_spin_loops");
(gdb) n
1127 interface_params.get_parameter("compression_threshold");
(gdb) n
1129 interface_params.get_parameter("compression");
(gdb) n
1131 interface_params.get_parameter("wait_time");
(gdb) n
1133 interface_params.get_parameter("join_attempts");
(gdb) n
1135 interface_params.get_parameter("join_sleep_time");
(gdb) n
1143 if (group_name_str != NULL &&
(gdb) n
1144 group_name_str->size() == 0)
(gdb) n
1143 if (group_name_str != NULL &&
(gdb) n
1154 if (bootstrap_group_str != NULL)
(gdb) n
1156 std::string &flag= const_cast<std::string &>(*bootstrap_group_str);
(gdb) n
1157 error= is_valid_flag("bootstrap_group", flag);
(gdb) n
1158 if (error == GCS_NOK)
(gdb) n
1163 if (peer_nodes_str != NULL)
(gdb) n
1168 std::vector<std::string> hostnames_and_ports;
(gdb) n
1169 std::vector<std::string> invalid_hostnames_and_ports;
(gdb) n
1170 Gcs_xcom_utils::process_peer_nodes(peer_nodes_str, hostnames_and_ports);
(gdb) n
1172 invalid_hostnames_and_ports);
(gdb) n
1174 if(!invalid_hostnames_and_ports.empty())
(gdb) n
1191 if(!invalid_hostnames_and_ports.empty() && hostnames_and_ports.empty())
(gdb) n
1169 std::vector<std::string> invalid_hostnames_and_ports;
(gdb) n
1196 }
(gdb) n
1200 if (local_node_str != NULL)
(gdb) n
1202 bool matches_local_ip= false;
(gdb) n
1203 std::map<std::string, int> ips;
(gdb) n
1204 std::map<std::string, int>::iterator it;
(gdb) n
1206 std::string::size_type delim_pos= (*local_node_str).find_last_of(":");
(gdb) n
1207 std::string host= (*local_node_str).substr(0, delim_pos);
(gdb) n
1208 std::string ip;
(gdb) n
1211 if (!is_valid_hostname(*local_node_str))
(gdb) n
1222 if (resolve_ip_addr_from_hostname(host, ip))
(gdb) n
1230 if (ip.compare(host) != 0)
(gdb) n
1234 if (get_ipv4_local_addresses(ips, true))
(gdb) n
1243 for (it= ips.begin(); it != ips.end() && !matches_local_ip; it++)
(gdb) n
1244 matches_local_ip= (*it).first.compare(ip) == 0;
(gdb) n
1243 for (it= ips.begin(); it != ips.end() && !matches_local_ip; it++)
(gdb) n
1244 matches_local_ip= (*it).first.compare(ip) == 0;
(gdb) n
1243 for (it= ips.begin(); it != ips.end() && !matches_local_ip; it++)
(gdb) n
1244 matches_local_ip= (*it).first.compare(ip) == 0;
(gdb) n
1243 for (it= ips.begin(); it != ips.end() && !matches_local_ip; it++)
(gdb) n
1244 matches_local_ip= (*it).first.compare(ip) == 0;
(gdb) n
1243 for (it= ips.begin(); it != ips.end() && !matches_local_ip; it++)
(gdb) n
1245 if(!matches_local_ip)
(gdb) n
1208 std::string ip;
(gdb) n
1207 std::string host= (*local_node_str).substr(0, delim_pos);
(gdb) n
*** Error in `/usr/sbin/mysqld-debug': free(): invalid next size (fast): 0x00007ffdf0011450 ***
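The session above steps through the local-address check: the host is split from `local_node` at the last `:` (lines 1206-1207), and the resolved IP is compared against each entry of the `ips` map (lines 1243-1244). The following is a minimal standalone sketch of those two steps only, not the plugin's code. The helper names `host_part` and `matches_any`, the port `33061`, and the map values are assumptions made for illustration. It does not reproduce the heap corruption, which is reported only once the trace returns to lines 1208 and 1207.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: keep everything before the last ':' of "host:port",
// as the trace does with find_last_of/substr. If there is no ':', the whole
// string is returned, because substr(0, npos) copies to the end.
std::string host_part(const std::string &local_node) {
  std::string::size_type delim_pos = local_node.find_last_of(":");
  return local_node.substr(0, delim_pos);
}

// Hypothetical helper: same loop shape as lines 1243-1244 of the trace.
// Stops at the first key equal to ip.
bool matches_any(const std::map<std::string, int> &ips, const std::string &ip) {
  bool matches = false;
  std::map<std::string, int>::const_iterator it;
  for (it = ips.begin(); it != ips.end() && !matches; ++it)
    matches = (it->first.compare(ip) == 0);
  return matches;
}

// Usage example with the addresses from the reporter's ifconfig output.
// The int values are placeholders; the port is an assumed example.
void self_check() {
  std::map<std::string, int> ips;
  ips["10.222.30.1"] = 0;  // cni0
  ips["127.0.0.1"] = 0;    // lo
  ips["172.24.0.61"] = 0;  // eth0
  assert(matches_any(ips, host_part("172.24.0.61:33061")));
  assert(!matches_any(ips, host_part("192.0.2.10:33061")));
}
```

Because the comparison itself is this simple, the sketch suggests looking at how the `ips` map is filled rather than at the loop.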
[29 Aug 2022 3:34]
Keso BIBO
Is there a way to fix this bug? As far as I know, this error is only encountered on some models of Alibaba Cloud. Looking forward to your reply.

Description:

ifconfig
--------------------------------------------------------------------------------
cni0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        inet 10.222.30.1 netmask 255.255.255.0 broadcast 0.0.0.0
        ether 5e:31:9d:66:bd:f8 txqueuelen 1000 (Ethernet)
        RX packets 6776 bytes 12565625 (11.9 MiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 7206 bytes 4767256 (4.5 MiB)
        TX errors 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 172.24.0.61 netmask 255.255.255.0 broadcast 172.24.0.255
        ether 00:16:3e:11:9c:10 txqueuelen 1000 (Ethernet)
        RX packets 362096646 bytes 482421352506 (449.2 GiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 67049446 bytes 11734902488 (10.9 GiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        loop txqueuelen 1000 (Local Loopback)
        RX packets 1475791 bytes 3737124512 (3.4 GiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 1475791 bytes 3737124512 (3.4 GiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
--------------------------------------------------------------------------------

MySQL server crashes when starting GR.

error log
--------------------------------------------------------------------------------
*** Error in `/usr/sbin/mysqld': double free or corruption (!prev): 0x00007f24e451e2f0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81499)[0x7f29abb1a499]
/usr/lib64/mysql/plugin/group_replication.so(_Z24get_ipv4_local_addressesRSt3mapISsiSt4lessISsESaISt4pairIKSsiEEEb+0xb16)[0x7f2565b64fc6]
/usr/lib64/mysql/plugin/group_replication.so(_Z32get_ipv4_local_private_addressesRSt3mapISsiSt4lessISsESaISt4pairIKSsiEEEb+0x5d)[0x7f2565b662bd]
/usr/lib64/mysql/plugin/group_replication.so(_Z21fix_parameters_syntaxR24Gcs_interface_parameters+0x117b)[0x7f2565b7715b]
/usr/lib64/mysql/plugin/group_replication.so(_ZN18Gcs_xcom_interface10initializeERK24Gcs_interface_parameters+0x2dc)[0x7f2565b905cc]
/usr/lib64/mysql/plugin/group_replication.so(_ZN14Gcs_operations9configureERK24Gcs_interface_parameters+0x9b)[0x7f2565ba6c3b]
/usr/lib64/mysql/plugin/group_replication.so(_Z29configure_group_communicationP23st_server_ssl_variables+0xccb)[0x7f2565bb7e1b]
/usr/lib64/mysql/plugin/group_replication.so(_Z26initialize_plugin_and_join25enum_plugin_con_isolationP29Delayed_initialization_thread+0x227)[0x7f2565bb8da7]
/usr/lib64/mysql/plugin/group_replication.so(_Z30plugin_group_replication_startv+0x5b1)[0x7f2565bb9501]
/usr/sbin/mysqld(_Z23group_replication_startv+0x93)[0xe35b93]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THDb+0x2643)[0xccf753]
/usr/sbin/mysqld(_Z11mysql_parseP3THDP12Parser_state+0x3ad)[0xcd39bd]
/usr/sbin/mysqld(_Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command+0xa7d)[0xcd451d]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x19f)[0xcd5f1f]
/usr/sbin/mysqld(handle_connection+0x290)[0xd97dc0]
/usr/sbin/mysqld(pfs_spawn_thread+0x1b4)[0x127fae4]
/lib64/libpthread.so.0(+0x7e25)[0x7f29ad0dde25]
/lib64/libc.so.6(clone+0x6d)[0x7f29abb97bad]
======= Memory map: ========
……
04:38:23 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=33554432
read_buffer_size=16777216
max_used_connections=81
max_threads=4000
thread_count=81
connection_count=81
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 131158299 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f24e41248a0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f2563fcfe30 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x3b)[0xf0768b]
/usr/sbin/mysqld(handle_fatal_signal+0x461)[0x7b9311]
/lib64/libpthread.so.0(+0xf6d0)[0x7f29ad0e56d0]
/lib64/libc.so.6(gsignal+0x37)[0x7f29abacf277]
/lib64/libc.so.6(abort+0x148)[0x7f29abad0968]
/lib64/libc.so.6(+0x78d37)[0x7f29abb11d37]
/lib64/libc.so.6(+0x81499)[0x7f29abb1a499]
/usr/lib64/mysql/plugin/group_replication.so(_Z24get_ipv4_local_addressesRSt3mapISsiSt4lessISsESaISt4pairIKSsiEEEb+0xb16)[0x7f2565b64fc6]
/usr/lib64/mysql/plugin/group_replication.so(_Z32get_ipv4_local_private_addressesRSt3mapISsiSt4lessISsESaISt4pairIKSsiEEEb+0x5d)[0x7f2565b662bd]
/usr/lib64/mysql/plugin/group_replication.so(_Z21fix_parameters_syntaxR24Gcs_interface_parameters+0x117b)[0x7f2565b7715b]
/usr/lib64/mysql/plugin/group_replication.so(_ZN18Gcs_xcom_interface10initializeERK24Gcs_interface_parameters+0x2dc)[0x7f2565b905cc]
/usr/lib64/mysql/plugin/group_replication.so(_ZN14Gcs_operations9configureERK24Gcs_interface_parameters+0x9b)[0x7f2565ba6c3b]
/usr/lib64/mysql/plugin/group_replication.so(_Z29configure_group_communicationP23st_server_ssl_variables+0xccb)[0x7f2565bb7e1b]
/usr/lib64/mysql/plugin/group_replication.so(_Z26initialize_plugin_and_join25enum_plugin_con_isolationP29Delayed_initialization_thread+0x227)[0x7f2565bb8da7]
/usr/lib64/mysql/plugin/group_replication.so(_Z30plugin_group_replication_startv+0x5b1)[0x7f2565bb9501]
/usr/sbin/mysqld(_Z23group_replication_startv+0x93)[0xe35b93]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THDb+0x2643)[0xccf753]
/usr/sbin/mysqld(_Z11mysql_parseP3THDP12Parser_state+0x3ad)[0xcd39bd]
/usr/sbin/mysqld(_Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command+0xa7d)[0xcd451d]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x19f)[0xcd5f1f]
/usr/sbin/mysqld(handle_connection+0x290)[0xd97dc0]
/usr/sbin/mysqld(pfs_spawn_thread+0x1b4)[0x127fae4]
/lib64/libpthread.so.0(+0x7e25)[0x7f29ad0dde25]
/lib64/libc.so.6(clone+0x6d)[0x7f29abb97bad]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f24e4435b50): start group_replication
Connection ID (thread ID): 323
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
--------------------------------------------------------------------------------

Solved the problem when uninstalling k8s and flannel

ifconfig
--------------------------------------------------------------------------------
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 172.24.0.61 netmask 255.255.255.0 broadcast 172.24.0.255
        ether 00:16:3e:11:9c:10 txqueuelen 1000 (Ethernet)
        RX packets 362096646 bytes 482421352506 (449.2 GiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 67049446 bytes 11734902488 (10.9 GiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        loop txqueuelen 1000 (Local Loopback)
        RX packets 1475791 bytes 3737124512 (3.4 GiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 1475791 bytes 3737124512 (3.4 GiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
--------------------------------------------------------------------------------

How to repeat:
1. Install k8s and flannel on aliyun ECS CentOS 7.4
2. Deploy cni0 (Container Network Interface) with k8s and flannel
3. Initialize and start group_replication

Suggested fix:
The problem may be group_replication.so get_ipv4_local_addresses
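
Since the backtrace points at `get_ipv4_local_addresses`, it may help to attach the list of IPv4 addresses that a process on the affected host actually sees, with and without flannel running. The sketch below is an independent diagnostic that uses the POSIX/Linux `getifaddrs` call. It is not the plugin's implementation and makes no claim about how the plugin enumerates interfaces. The function name `list_ipv4_addresses` is hypothetical.

```cpp
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <cassert>
#include <map>
#include <string>

// Hypothetical diagnostic: map each local IPv4 address to the name of the
// interface that carries it. Returns an empty map if getifaddrs fails.
std::map<std::string, std::string> list_ipv4_addresses() {
  std::map<std::string, std::string> result;
  struct ifaddrs *head = NULL;
  if (getifaddrs(&head) != 0) return result;

  for (struct ifaddrs *cur = head; cur != NULL; cur = cur->ifa_next) {
    // Interfaces without an address, or with a non-IPv4 address, are skipped.
    if (cur->ifa_addr == NULL || cur->ifa_addr->sa_family != AF_INET) continue;
    assert(cur->ifa_name != NULL);

    const struct sockaddr_in *sin =
        reinterpret_cast<const struct sockaddr_in *>(cur->ifa_addr);
    char buf[INET_ADDRSTRLEN];
    if (inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof(buf)) != NULL)
      result[buf] = cur->ifa_name;
  }

  freeifaddrs(head);  // the list is owned by the caller and must be released
  return result;
}
```

On the host described in this report, one would expect entries for `cni0`, `eth0`, and `lo` while flannel is deployed. Printing the map before `START GROUP_REPLICATION` gives the developers the interface count and addresses in one place.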