Bug #89764 With plugin-load 'start/stop slave' ASYNC channel hangs and needs server restart
Submitted: 22 Feb 2018 12:43 Modified: 10 Mar 2018 16:05
Reporter: Narendra Singh Chauhan
Status: Closed
Category:MySQL Server: Group Replication Severity:S2 (Serious)
Version:8.0.5 OS:Any
Assigned to: CPU Architecture:Any

[22 Feb 2018 12:43] Narendra Singh Chauhan
Description:
Scenario: When a server is started with only plugin-load='group_replication.so' (nothing extra), ASYNC channels cannot be started, and channels that were started along with the server cannot be stopped. This happens because "group_replication_start_on_boot=ON" (the default). The slave's reported state is "Waiting for the next event in relay log".

Observations:-
1) After server start, create an ASYNC channel and execute 'start slave'. It hangs.
2) After server start (if an ASYNC channel is already created), execute 'stop slave'. It hangs.
3) While the commands are hung, try to kill the connections shown in the processlist. That occasionally hangs too.
4) While the commands are hung, execute 'SHUTDOWN'. This closes the socket, but the mysqld server still shows as running.
5) Even 'SET group_replication_start_on_boot=OFF' after server start doesn't help.
So, in short, we need to issue 'kill -9 <processid>' to stop the mysqld server.

====================
mysql> show slave status\G
.....
.....
             Slave_IO_Running: Connecting
            Slave_SQL_Running: Yes
.....
      Slave_SQL_Running_State: Waiting for the next event in relay log
.....
                 Channel_Name: ch1
.....

mysql> show processlist;
+----+-----------------+-----------+------+---------+------+-----------------------------------------+------------------+
| Id | User            | Host      | db   | Command | Time | State                                   | Info             |
+----+-----------------+-----------+------+---------+------+-----------------------------------------+------------------+
|  4 | system user     |           | NULL | Query   |   90 | Waiting for the next event in relay log | NULL             |
|  5 | system user     |           | NULL | Connect |   90 | Waiting for master update               | NULL             |
|  6 | event_scheduler | localhost | NULL | Daemon  |   90 | Waiting on empty queue                  | NULL             |
| 10 | root            | localhost | NULL | Query   |    0 | starting                                | show processlist |
+----+-----------------+-----------+------+---------+------+-----------------------------------------+------------------+
4 rows in set (0.00 sec)

mysql> stop slave;  ## HANG HERE

====================

How to repeat:
Steps to repro:-
================

$ mkdir -p mysql-test/var/mysqld.1/data mysql-test/var/log mysql-test/var/tmp/mysqld.1
$ $PWD/bin/mysqld --no-defaults --datadir=$PWD/mysql-test/var/mysqld.1/data --basedir=$PWD --log-error=$PWD/mysql-test/var/log/mysqld.1.err --initialize-insecure --core-file 2>&1 &
$ ./bin/mysqld --defaults-file=./naren_scripts/test_empty.cnf --basedir=$PWD --datadir=$PWD/mysql-test/var/mysqld.1/data --socket=/tmp/mysqld.1.sock --report-host=localhost --log-error=$PWD/mysql-test/var/log/mysqld.1.err --server-id=1 --core-file 2>&1 &

Where,
$ cat ./naren_scripts/test_empty.cnf
[mysqld]
report-host=                127.0.0.1
report-user=                root
log-error-verbosity=        3
plugin-dir='/mysql-8.0/plugin_output_directory/'
plugin-load='group_replication.so'

# Server1
log-bin=                    server1
relay-log=                  server1-relay-log
server-id=                  1
port=                       14000
mysqlx-port=                33060
mysqlx-socket=              /tmp/mysqlx.1.sock

On server:-
$ ./bin/mysql -uroot -S/tmp/mysqld.1.sock
mysql> select * from performance_schema.replication_group_members;
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier |           |             |        NULL | OFFLINE      |             |                |
+---------------------------+-----------+-------------+-------------+--------------+-------------+----------------+
1 row in set (0.04 sec)

mysql> change master to master_host='localhost', master_user='root', master_port=14001 for channel 'ch1';

mysql> start slave;   ## This will hang.

Suggested fix:
Workaround:-
Start the server with "--loose-group_replication_start_on_boot=OFF" if only plugin-load='group_replication.so' is set.
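
A minimal sketch of the same workaround written as an option-file entry instead of a command-line flag, assuming the test_empty.cnf shown in the repro steps. Putting the option in the file is an assumption for illustration and was not verified in this report; only the command-line form above was used.

```
[mysqld]
plugin-dir='/mysql-8.0/plugin_output_directory/'
plugin-load='group_replication.so'

# Workaround from this report: keep Group Replication from starting at boot.
# The loose- prefix lets the server start with a warning, rather than an
# error, if the plugin (and therefore this variable) is not available.
loose-group_replication_start_on_boot=OFF
```

With this in place, 'start slave' and 'stop slave' on the ASYNC channel would be expected to behave as they do with the command-line workaround.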
[10 Mar 2018 16:05] David Moss
Posted by developer:
 
Thank you for your feedback. This has been fixed in upcoming versions, and the following was added to the 8.0.11 changelog:
When MySQL was started with --plugin-load='group_replication.so' but Group Replication was not started, starting an asynchronous slave channel resulted in an unresponsive server.
[31 May 2018 14:57] David Moss
Reclosing.