Description:
In an 3 nodes group_replication cluster (single_primary_mode=ON). When primary node shutdown, the group_replication may block.
How to repeat:
First, build an 3 nodes cluster.
In 13000:
mysql> CHANGE MASTER TO MASTER_USER="root", MASTER_PASSWORD="" FOR CHANNEL "group_replication_recovery";
Query OK, 0 rows affected, 1 warning (0.03 sec)
mysql> reset master;
Query OK, 0 rows affected (0.01 sec)
mysql> set global group_replication_bootstrap_group=on;
Query OK, 0 rows affected (0.00 sec)
mysql> start group_replication;
Query OK, 0 rows affected (3.10 sec)
In 13001:
mysql> CHANGE MASTER TO MASTER_USER="root", MASTER_PASSWORD="" FOR CHANNEL "group_replication_recovery";
Query OK, 0 rows affected, 1 warning (0.03 sec)
mysql> reset master;
Query OK, 0 rows affected (0.01 sec)
mysql> start group_replication; Query OK, 0 rows affected (3.27 sec)
In 13002:
mysql> CHANGE MASTER TO MASTER_USER="root", MASTER_PASSWORD="" FOR CHANNEL "group_replication_recovery";
Query OK, 0 rows affected, 1 warning (0.03 sec)
mysql> reset master;
Query OK, 0 rows affected (0.02 sec)
mysql> start group_replication; Query OK, 0 rows affected (4.43 sec)
Now, check the cluster build success.
mysql> SELECT * FROM `performance_schema`.`replication_group_members`;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 81c3ae0d-5222-11ea-9ecb-ec5c6826bca3 | 127.0.0.1 | 13000 | ONLINE | PRIMARY | 8.0.18 |
| group_replication_applier | 82768add-5222-11ea-bd8b-ec5c6826bca3 | 127.0.0.1 | 13002 | ONLINE | SECONDARY | 8.0.18 |
| group_replication_applier | 827fbac2-5222-11ea-b1d0-ec5c6826bca3 | 127.0.0.1 | 13001 | ONLINE | SECONDARY | 8.0.18 |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
3 rows in set (0.00 sec)
Then, connect the primary node (13000), and create table:
mysql> create table t1 (c1 int primary key, c2 int); Query OK, 0 rows affected (0.09 sec)
Then, use 3 terminal to execute script for all the 3 nodes:
terminal_1:
:~/test$ cat a.sh
#!/bin/sh
while true
do
mysql -uroot -P13000 -h127.0.0.1 -e 'SELECT * FROM test.t1'
done
:~/test$ bash a.sh
terminal_2:
:~/test$ cat b.sh
#!/bin/sh
while true
do
mysql -uroot -P13001 -h127.0.0.1 -e 'SELECT * FROM test.t1'
done
:~/test$ bash b.sh
terminal_3:
:~/test$ cat c.sh
#!/bin/sh
while true
do
mysql -uroot -P13001 -h127.0.0.1 -e 'SELECT * FROM test.t1'
done
:~/test$ bash c.sh
Then, shutdown the primary node (13000), here, 13001 maybe become the new primary node.
mysql> SELECT * FROM `performance_schema`.`replication_group_members`;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 379e77fb-5223-11ea-a665-ec5c6826bca3 | 127.0.0.1 | 13001 | ONLINE | PRIMARY | 8.0.18 |
| group_replication_applier | 37a40c84-5223-11ea-bd67-ec5c6826bca3 | 127.0.0.1 | 13002 | ONLINE | SECONDARY | 8.0.18 |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
2 rows in set (0.00 sec)
While check the processlist, an hook exist.
mysql> show processlist; +------+-----------------+-----------------+------+---------+------+--------------------------------------------------------+----------------------------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+------+-----------------+-----------------+------+---------+------+--------------------------------------------------------+----------------------------------+-----------+---------------+
| 5 | event_scheduler | localhost | NULL | Daemon | 366 | Waiting on empty queue | NULL | 0 | 0 |
| 17 | system user | | NULL | Connect | 138 | waiting for handler commit | Group replication applier module | 0 | 0 |
| 20 | system user | | NULL | Query | 100 | Slave has read all relay log; waiting for more updates | NULL | 0 | 0 |
| 5678 | root | localhost:48682 | NULL | Query | 48 | Executing hook on transaction begin. | SELECT * FROM test.t1 | 0 | 0 |
| 5679 | root | localhost:59470 | test | Query | 0 | starting | show processlist | 0 | 0 |
| 5680 | root | localhost:51724 | test | Sleep | 15 | | NULL | 2 | 2 |
+------+-----------------+-----------------+------+---------+------+--------------------------------------------------------+----------------------------------+-----------+---------------+
6 rows in set (0.00 sec)
Then, do insert in new primary node (13001), it will not return
mysql> insert into t1 values (1,1),(2,2);
The processlist is:
mysql> show processlist;
+------+-----------------+-----------------+------+---------+------+--------------------------------------------------------+-----------------------------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+------+-----------------+-----------------+------+---------+------+--------------------------------------------------------+-----------------------------------+-----------+---------------+
| 5 | event_scheduler | localhost | NULL | Daemon | 383 | Waiting on empty queue | NULL | 0 | 0 |
| 17 | system user | | NULL | Connect | 155 | waiting for handler commit | Group replication applier module | 0 | 0 |
| 20 | system user | | NULL | Query | 117 | Slave has read all relay log; waiting for more updates | NULL | 0 | 0 |
| 5678 | root | localhost:48682 | NULL | Query | 65 | Executing hook on transaction begin. | SELECT * FROM test.t1 | 0 | 0 |
| 5679 | root | localhost:59470 | test | Query | 0 | starting | show processlist | 0 | 0 |
| 5680 | root | localhost:51724 | test | Query | 3 | Executing hook on transaction begin. | insert into t1 values (1,1),(2,2) | 0 | 0 |
+------+-----------------+-----------------+------+---------+------+--------------------------------------------------------+-----------------------------------+-----------+---------------+
6 rows in set (0.00 sec)
Suggested fix:
In single-primary mode, the secondly role can do read-only query. So, it should not block when primary node shutdown.