Bug #98643 group replication will be block primary node shutdown
Submitted: 18 Feb 2020 8:11 Modified: 12 May 2020 15:07
Reporter: phoenix Zhang (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Group Replication Severity:S3 (Non-critical)
Version:8.0.18 OS:Any
Assigned to: CPU Architecture:Any
Tags: group_replcation

[18 Feb 2020 8:11] phoenix Zhang
Description:
In an 3 nodes group_replication cluster (single_primary_mode=ON). When primary node shutdown, the group_replication may block.

How to repeat:
First, build an 3 nodes cluster.

In 13000:
mysql> CHANGE MASTER TO MASTER_USER="root", MASTER_PASSWORD="" FOR CHANNEL "group_replication_recovery";
Query OK, 0 rows affected, 1 warning (0.03 sec)

mysql> reset master;
Query OK, 0 rows affected (0.01 sec)

mysql> set global group_replication_bootstrap_group=on;
Query OK, 0 rows affected (0.00 sec)

mysql> start group_replication;
Query OK, 0 rows affected (3.10 sec)

In 13001:
mysql> CHANGE MASTER TO MASTER_USER="root", MASTER_PASSWORD="" FOR CHANNEL "group_replication_recovery";
Query OK, 0 rows affected, 1 warning (0.03 sec)

mysql> reset master;
Query OK, 0 rows affected (0.01 sec)

mysql> start group_replication;                                                                                                                                                                             Query OK, 0 rows affected (3.27 sec)

In 13002:
mysql>  CHANGE MASTER TO MASTER_USER="root", MASTER_PASSWORD="" FOR CHANNEL "group_replication_recovery";
Query OK, 0 rows affected, 1 warning (0.03 sec)

mysql> reset master;
Query OK, 0 rows affected (0.02 sec)

mysql> start group_replication;                                                                                                                                                                             Query OK, 0 rows affected (4.43 sec)

Now, check the cluster build success.
mysql> SELECT * FROM `performance_schema`.`replication_group_members`;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 81c3ae0d-5222-11ea-9ecb-ec5c6826bca3 | 127.0.0.1   |       13000 | ONLINE       | PRIMARY     | 8.0.18         |
| group_replication_applier | 82768add-5222-11ea-bd8b-ec5c6826bca3 | 127.0.0.1   |       13002 | ONLINE       | SECONDARY   | 8.0.18         |
| group_replication_applier | 827fbac2-5222-11ea-b1d0-ec5c6826bca3 | 127.0.0.1   |       13001 | ONLINE       | SECONDARY   | 8.0.18         |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
3 rows in set (0.00 sec)

Then, connect the primary node (13000), and create table:
mysql> create table t1 (c1 int primary key, c2 int);                                                                                                                                                        Query OK, 0 rows affected (0.09 sec)

Then, use 3 terminal to execute script for all the 3 nodes:
terminal_1: 
:~/test$ cat a.sh
#!/bin/sh
while true
do
  mysql -uroot -P13000 -h127.0.0.1 -e 'SELECT * FROM test.t1'
done
:~/test$ bash a.sh

terminal_2: 
:~/test$ cat b.sh
#!/bin/sh
while true
do
  mysql -uroot -P13001 -h127.0.0.1 -e 'SELECT * FROM test.t1'
done
:~/test$ bash b.sh

terminal_3: 
:~/test$ cat c.sh
#!/bin/sh
while true
do
  mysql -uroot -P13001 -h127.0.0.1 -e 'SELECT * FROM test.t1'
done
:~/test$ bash c.sh

Then, shutdown the primary node (13000), here, 13001 maybe become the new primary node.
mysql> SELECT * FROM `performance_schema`.`replication_group_members`;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 379e77fb-5223-11ea-a665-ec5c6826bca3 | 127.0.0.1   |       13001 | ONLINE       | PRIMARY     | 8.0.18         |
| group_replication_applier | 37a40c84-5223-11ea-bd67-ec5c6826bca3 | 127.0.0.1   |       13002 | ONLINE       | SECONDARY   | 8.0.18         |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
2 rows in set (0.00 sec)

While check the processlist, an hook exist.
mysql> show processlist;                                                                                                                                                                                    +------+-----------------+-----------------+------+---------+------+--------------------------------------------------------+----------------------------------+-----------+---------------+
| Id   | User            | Host            | db   | Command | Time | State                                                  | Info                             | Rows_sent | Rows_examined |
+------+-----------------+-----------------+------+---------+------+--------------------------------------------------------+----------------------------------+-----------+---------------+
|    5 | event_scheduler | localhost       | NULL | Daemon  |  366 | Waiting on empty queue                                 | NULL                             |         0 |             0 |
|   17 | system user     |                 | NULL | Connect |  138 | waiting for handler commit                             | Group replication applier module |         0 |             0 |
|   20 | system user     |                 | NULL | Query   |  100 | Slave has read all relay log; waiting for more updates | NULL                             |         0 |             0 |
| 5678 | root            | localhost:48682 | NULL | Query   |   48 | Executing hook on transaction begin.                   | SELECT * FROM test.t1            |         0 |             0 |
| 5679 | root            | localhost:59470 | test | Query   |    0 | starting                                               | show processlist                 |         0 |             0 |
| 5680 | root            | localhost:51724 | test | Sleep   |   15 |                                                        | NULL                             |         2 |             2 |
+------+-----------------+-----------------+------+---------+------+--------------------------------------------------------+----------------------------------+-----------+---------------+
6 rows in set (0.00 sec)

Then, do insert in new primary node (13001), it will not return 
mysql> insert into t1 values (1,1),(2,2);

The processlist is:

mysql> show processlist;
+------+-----------------+-----------------+------+---------+------+--------------------------------------------------------+-----------------------------------+-----------+---------------+
| Id   | User            | Host            | db   | Command | Time | State                                                  | Info                              | Rows_sent | Rows_examined |
+------+-----------------+-----------------+------+---------+------+--------------------------------------------------------+-----------------------------------+-----------+---------------+
|    5 | event_scheduler | localhost       | NULL | Daemon  |  383 | Waiting on empty queue                                 | NULL                              |         0 |             0 |
|   17 | system user     |                 | NULL | Connect |  155 | waiting for handler commit                             | Group replication applier module  |         0 |             0 |
|   20 | system user     |                 | NULL | Query   |  117 | Slave has read all relay log; waiting for more updates | NULL                              |         0 |             0 |
| 5678 | root            | localhost:48682 | NULL | Query   |   65 | Executing hook on transaction begin.                   | SELECT * FROM test.t1             |         0 |             0 |
| 5679 | root            | localhost:59470 | test | Query   |    0 | starting                                               | show processlist                  |         0 |             0 |
| 5680 | root            | localhost:51724 | test | Query   |    3 | Executing hook on transaction begin.                   | insert into t1 values (1,1),(2,2) |         0 |             0 |
+------+-----------------+-----------------+------+---------+------+--------------------------------------------------------+-----------------------------------+-----------+---------------+
6 rows in set (0.00 sec)

Suggested fix:
In single-primary mode, the secondly role can do read-only query. So, it should not block when primary node shutdown.
[2 Mar 2020 0:38] phoenix Zhang
This is the config file of port of 13000

Attachment: my13000.cnf (application/octet-stream, text), 1.04 KiB.

[2 Mar 2020 0:38] phoenix Zhang
This is the config file of port of 13001

Attachment: my13001.cnf (application/octet-stream, text), 1.04 KiB.

[2 Mar 2020 0:39] phoenix Zhang
This is the config file of port of 13002

Attachment: my13002.cnf (application/octet-stream, text), 1.04 KiB.

[12 May 2020 15:07] Margaret Fisher
Posted by developer:
 
Changelog entry added for MySQL 8.0.21:

If a group's consistency level (set by the group_replication_consistency system variable) was set to BEFORE or  BEFORE_AND_AFTER, it was possible for a deadlock to occur in the event of a primary failover. The primary failover is now registered differently to avoid this situation.