Bug #90457 mysqld crash with ctrl-c/z'ed `START GROUP_REPLICATION`
Submitted: 16 Apr 2018 22:28
Modified: 9 Aug 2018 14:57
Reporter: Kenny Gryp
Status: Closed
Category: MySQL Server: Group Replication
Severity: S2 (Serious)
Version: 8.0.4
OS: Any
Assigned to:
CPU Architecture: Any

[16 Apr 2018 22:28] Kenny Gryp
Description:
You can get mysqld to crash by running START and STOP GROUP_REPLICATION concurrently (sort of :))

How to repeat:

0. Set up a 3-node cluster.
1. Run `stop group_replication;` and go straight to step 2 without waiting for it to finish.
2. While step 1 is still running, quickly run `start group_replication;`.
3. Press ctrl-c immediately, then ctrl-z the mysql client.
4. Watch mysqld crash.
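
The steps above could be scripted roughly as follows. This is a sketch, not part of the original report. It assumes a disposable test group, a `mysql` client that logs in without prompting, and that the client reacts to SIGINT/SIGTSTP sent with `kill` the same way it reacts to ctrl-c/ctrl-z typed at the prompt (the report used an interactive session, so that equivalence is an assumption). `MYSQL`, `DELAY` and `RUN` are names made up for this example. Since the goal is to crash the server, the script only prints the plan unless `RUN=1` is set.

```shell
#!/bin/sh
# Sketch only: MYSQL, DELAY and RUN are hypothetical knobs for this example.
MYSQL=${MYSQL:-mysql}   # client binary; assumed to log in without a prompt
DELAY=${DELAY:-1}       # seconds between steps, like the "sleep 1" in the report

repro() {
    # Always print the plan so a dry run shows what would happen.
    echo "1. STOP GROUP_REPLICATION in the background"
    echo "2. START GROUP_REPLICATION ${DELAY}s later, also in the background"
    echo "3. SIGINT (ctrl-c) then SIGTSTP (ctrl-z) to the client running START"

    # Do nothing else unless explicitly asked to: this is meant to crash mysqld.
    [ "${RUN:-0}" = "1" ] || return 0

    "$MYSQL" -e 'STOP GROUP_REPLICATION;' &
    sleep "$DELAY"
    "$MYSQL" -e 'START GROUP_REPLICATION;' &
    start_pid=$!
    sleep "$DELAY"

    # Assumption: signals sent with kill stand in for keyboard ctrl-c/ctrl-z.
    kill -INT "$start_pid"
    kill -TSTP "$start_pid"
}

repro
```

Saved as, say, `repro.sh`, a dry run is `sh repro.sh`, while `RUN=1 sh repro.sh` would actually issue the statements against the local node. The report pressed ctrl-c several times, so a single SIGINT may not always be enough to hit the race.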

mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | d84addc8-3f73-11e8-b28d-08002789cd2e | node1       |        3306 | ONLINE       | SECONDARY   | 8.0.4          |
| group_replication_applier | d8a4abfc-3f73-11e8-b4fe-08002789cd2e | node2       |        3306 | ONLINE       | PRIMARY     | 8.0.4          |
| group_replication_applier | d8fe8a34-3f73-11e8-b753-08002789cd2e | node3       |        3306 | ONLINE       | SECONDARY   | 8.0.4          |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
3 rows in set (0.00 sec)

mysql> ^DBye
[vagrant@node1 ~]$  mysql -e 'stop group_replication;' &  sleep 1; mysql
[5] 22793
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 25
Server version: 8.0.4-rc-log MySQL Community Server (GPL)

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> start group_replication;
^C^C -- query aborted
2018-04-16T22:25:25.034111Z 11 [ERROR] [MY-011254] Plugin group_replication reported: '[GCS] Error pushing message into group communication engine.'
^C^C -- query aborted
^C^C -- query aborted
^C^C -- query aborted
^C^C -- query aborted
2018-04-16T22:25:26.034653Z 11 [ERROR] [MY-011254] Plugin group_replication reported: '[GCS] Error pushing message into group communication engine.'
2018-04-16T22:25:26.034726Z 11 [ERROR] [MY-011254] Plugin group_replication reported: '[GCS] Error pushing message into group communication engine.'
^C^C -- query aborted
^C^C -- query aborted
^Z
[6]+  Stopped                 mysql
[vagrant@node1 ~]$ 
[vagrant@node1 ~]$ 
2018-04-16T22:25:27.034830Z 11 [ERROR] [MY-011254] Plugin group_replication reported: '[GCS] Error pushing message into group communication engine.'
2018-04-16T22:25:33.999739Z 25 [Warning] [MY-011254] Plugin group_replication reported: '[GCS] Automatically adding IPv4 localhost address to the whitelist. It is mandatory that it is added.'
2018-04-16T22:25:34.002031Z 25 [Warning] [MY-011254] Plugin group_replication reported: 'Unblocking the group replication thread waiting for applier to start, as the start group replication was killed.'
2018-04-16T22:25:34.002060Z 25 [ERROR] [MY-011254] Plugin group_replication reported: 'Unable to initialize the Group Replication applier module.'
22:25:34 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=3
max_threads=151
thread_count=3
connection_count=2
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 68136 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7fa9ac017150
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fa9e929ec70 thread_stack 0x46000
/usr/sbin/mysqld(my_print_stacktrace+0x3d) [0x1904fcd]
/usr/sbin/mysqld(handle_fatal_signal+0x4d1) [0xe443c1]
/lib64/libpthread.so.0(+0xf5e0) [0x7fa9f8ecc5e0]
/usr/sbin/mysqld(thd_get_ha_data+0xa) [0xc01fda]
/usr/sbin/mysqld() [0xf1e613]
/usr/sbin/mysqld(plugin_foreach_with_mask(THD*, bool (**)(THD*, st_plugin_int*, void*), int, unsigned int, void*)+0x199) [0xb9f529]
/usr/sbin/mysqld(plugin_foreach_with_mask(THD*, bool (*)(THD*, st_plugin_int*, void*), int, unsigned int, void*)+0x2c) [0xb9f6ec]
/usr/sbin/mysqld(THD::awake(THD::killed_state)+0x228) [0xb385c8]
/usr/lib64/mysql/plugin/group_replication.so(Applier_module::terminate_applier_thread()+0x133) [0x7fa9cba48113]
/usr/lib64/mysql/plugin/group_replication.so(configure_and_start_applier_module()+0x10c) [0x7fa9cba6c03c]
/usr/lib64/mysql/plugin/group_replication.so(initialize_plugin_and_join(enum_plugin_con_isolation, Delayed_initialization_thread*)+0x4fa) [0x7fa9cba7385a]
/usr/lib64/mysql/plugin/group_replication.so(plugin_group_replication_start(char**)+0x6bf) [0x7fa9cba73fef]
/usr/sbin/mysqld(group_replication_start(char**)+0x9c) [0x106dacc]
/usr/sbin/mysqld(mysql_execute_command(THD*, bool)+0x2360) [0xb835f0]
/usr/sbin/mysqld(mysql_parse(THD*, Parser_state*)+0x37c) [0xb8578c]
/usr/sbin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x1db3) [0xb87753]
/usr/sbin/mysqld(do_command(THD*)+0x1a8) [0xb88358]
/usr/sbin/mysqld() [0xe37378]
/usr/sbin/mysqld() [0x1d5e7ef]
/lib64/libpthread.so.0(+0x7e25) [0x7fa9f8ec4e25]
/lib64/libc.so.6(clone+0x6d) [0x7fa9f72ae34d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fa9ac01f6f0): is an invalid pointer
Connection ID (thread ID): 25
Status: KILL_QUERY

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

Suggested fix:
not crash!
[16 Apr 2018 22:33] Kenny Gryp
(Sometimes it doesn't crash, but START GROUP_REPLICATION gets stuck forever, so I have to `kill -9` mysqld.)
[17 Apr 2018 9:25] MySQL Verification Team
Hello Kenny,

Thank you for the report and feedback.
Verified as described with 8.0.4-rc build.

Thanks,
Umesh
[17 Apr 2018 9:26] MySQL Verification Team
test results

Attachment: 90457.results (application/octet-stream, text), 22.94 KiB.

[1 Jun 2018 12:56] Hemant Dangi
Hello Kenny,

There are two parts to the bug:
1. Crash with CTRL-C / CTRL-Z and START GROUP_REPLICATION.
2. You also said 'sometimes I don't crash it, but I get START
GROUP_REPLICATION stuck forever'

I was able to reproduce part 1, but not able to reproduce part 2.
Part 2, where `START GROUP_REPLICATION` gets stuck, is also important and
needs to be resolved.
So can you please provide more details about it:
- how to reproduce it, or
- any trace of where it gets stuck.

Thanks,
Hemant
[3 Jun 2018 6:18] MySQL Verification Team
Changing to "need feedback", waiting on Kenny for (2).
[4 Jul 2018 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[9 Aug 2018 14:57] David Moss
Posted by developer:
 
Thank you for your feedback. This has been fixed in upcoming versions, and the following was added to the 8.0.13 changelog:
Issuing START GROUP_REPLICATION and then forcibly stopping the mysqld process, for example using control-C, could result in an unexpected halt of the server.