Bug #101110 MGR is not hung when active members are less than half
Submitted: 10 Oct 2020 2:26 Modified: 23 Oct 2020 2:18
Reporter: Richard Hao Email Updates:
Status: Not a Bug Impact on me: None
Category:MySQL Server: Group Replication Severity:S1 (Critical)
Version:8.0.21 OS:CentOS (7.5.1804)
Assigned to: MySQL Verification Team CPU Architecture:x86 (Intel(R) Core(TM) i5-8300H CPU @ 2.30GHz)

[10 Oct 2020 2:26] Richard Hao
Description:
We have an MGR cluster with 7 members. We kill the mysqld processes one by one; after 4 members have left the cluster, only 3 members remain active. As expected, the active member count is then less than 7/2+1, so DML on the primary node should hang, but in fact we can still do inserts on the master (primary) node.
Our GR group runs in single-primary mode. my.cnf is as follows:

server-id=6
log-bin=CentOS7-Server6-bin
datadir=/data/mysql
socket=/var/lib/mysql1/mysql.sock
#skip-grant-tables
log-error=/data/mysql/mysqld.log
pid-file=/data/mysql/mysqld.pid
binlog_format=row
gtid-mode=ON
enforce-gtid-consistency=ON
relay-log-info-repository=TABLE
master-info-repository=TABLE
binlog-checksum=NONE
log-slave-updates=on
skip-slave-start
performance_schema=ON
lower_case_table_names=1
plugin-load-add=mysql_clone.so
clone=FORCE_PLUS_PERMANENT

transaction-write-set-extraction ='XXHASH64'
loose-group_replication_group_name = 'b13df29e-90b6-11e8-8d1b-525400fc3996'
loose-group_replication_start_on_boot = off
loose-group_replication_local_address = 'CentOS7-Server6:33061'
loose-group_replication_group_seeds ='CentOS7-Server6:33061,CentOS7-Server7:33061,CentOS7-Server8:33061,CentOS7-Server5:33061,CentOS7-Server9:33061,CentOS7-Server10:33061,CentOS7-Server11:33061'
loose-group_replication_bootstrap_group = off

The other members use different loose-group_replication_local_address settings.

The error log is attached.

How to repeat:
Start the MGR group with 7 members:

mysql> select * from replication_group_members;
+---------------------------+--------------------------------------+------------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST      | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+------------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 111284db-f88f-11ea-b10d-0050563c33c4 | CentOS7-Server6  |        3306 | ONLINE       | PRIMARY     | 8.0.21         |
| group_replication_applier | 11b72caa-f88f-11ea-9a5c-0050562c2906 | CentOS7-Server8  |        3306 | ONLINE       | SECONDARY   | 8.0.21         |
| group_replication_applier | 1219c539-f88f-11ea-b979-00505624863e | CentOS7-Server7  |        3306 | ONLINE       | SECONDARY   | 8.0.21         |
| group_replication_applier | 22222caa-f88f-11ea-9a5c-0050562c2907 | CentOS7-Server5  |        3306 | ONLINE       | SECONDARY   | 8.0.21         |
| group_replication_applier | 62f8518b-023a-11eb-9228-000c29bbdab1 | CentOS7-Server11 |        3306 | ONLINE       | SECONDARY   | 8.0.21         |
| group_replication_applier | 62f8518b-023a-11eb-9228-000c29bbdbd8 | CentOS7-Server9  |        3306 | ONLINE       | SECONDARY   | 8.0.21         |
| group_replication_applier | 62f8518b-023a-11eb-9228-000c29bbdbe9 | CentOS7-Server10 |        3306 | ONLINE       | SECONDARY   | 8.0.21         |
+---------------------------+--------------------------------------+------------------+-------------+--------------+-------------+----------------+
7 rows in set (0.00 sec)

Kill the mysqld processes one by one until 4 members have been killed; only 3 members remain active:

mysql> select * from replication_group_members;
+---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST     | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 111284db-f88f-11ea-b10d-0050563c33c4 | CentOS7-Server6 |        3306 | ONLINE       | PRIMARY     | 8.0.21         |
| group_replication_applier | 11b72caa-f88f-11ea-9a5c-0050562c2906 | CentOS7-Server8 |        3306 | ONLINE       | SECONDARY   | 8.0.21         |
| group_replication_applier | 62f8518b-023a-11eb-9228-000c29bbdbd8 | CentOS7-Server9 |        3306 | ONLINE       | SECONDARY   | 8.0.21         |
+---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+
3 rows in set (0.00 sec)
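Below is a hedged sketch of the write check implied by this report, run on the primary (CentOS7-Server6) after the kills; the database and table names (test.t1) are illustrative and not taken from the original report:

mysql> SELECT @@global.super_read_only;
-- With only 3 of the original 7 members active, the group is below the
-- 7/2+1 = 4 majority the reporter expected to be required, so the
-- expectation was that writes would be blocked.
mysql> INSERT INTO test.t1 (id) VALUES (1);
-- Expected to hang, but the reporter observed that the insert still succeeds.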
[10 Oct 2020 2:29] Richard Hao
add attachment files
[10 Oct 2020 2:31] Richard Hao
error log of the primary node

Attachment: error_log.txt (text/plain), 7.06 KiB.

[10 Oct 2020 2:32] Richard Hao
add error log
[14 Oct 2020 16:49] MySQL Verification Team
Hi,

I'm sorry I don't understand this:

> the DML on primary node will be hung, but in fact, we can also do insert on the master node.

 mysql> select * from replication_group_members;
 +---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST     | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 111284db-f88f-11ea-b10d-0050563c33c4 | CentOS7-Server6 |        3306 | ONLINE       | PRIMARY     | 8.0.21         |
| group_replication_applier | 11b72caa-f88f-11ea-9a5c-0050562c2906 | CentOS7-Server8 |        3306 | ONLINE       | SECONDARY   | 8.0.21         |
| group_replication_applier | 62f8518b-023a-11eb-9228-000c29bbdbd8 | CentOS7-Server9 |        3306 | ONLINE       | SECONDARY   | 8.0.21         |
+---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+

What is in this case the MASTER node, and what is the PRIMARY node?
[15 Oct 2020 2:36] Richard Hao
> the DML on primary node will be hung, but in fact, we can also do insert on the master node.

 mysql> select * from replication_group_members;
 +---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST     | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 111284db-f88f-11ea-b10d-0050563c33c4 | CentOS7-Server6 |        3306 | ONLINE       | PRIMARY     | 8.0.21         |
| group_replication_applier | 11b72caa-f88f-11ea-9a5c-0050562c2906 | CentOS7-Server8 |        3306 | ONLINE       | SECONDARY   | 8.0.21         |
| group_replication_applier | 62f8518b-023a-11eb-9228-000c29bbdbd8 | CentOS7-Server9 |        3306 | ONLINE       | SECONDARY   | 8.0.21         |
+---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+

>What is in this case MASTER and what PRIMARY node?

Sorry for the inaccurate description; the master node is the primary node. In our GR cluster, it is CentOS7-Server6.
[15 Oct 2020 13:46] MySQL Verification Team
Hi Richard,

but if the master is the primary, then what do you mean by:

> the DML on primary node will be hung, but in fact, we can also do insert on the master node

If master == primary, then DML is both hanging and working on the same node; I'm confused.

thanks
Bogdan
[15 Oct 2020 14:17] Richard Hao
Hi, Bogdan

> the DML on primary node will be hung, but in fact, we can also do insert on the master node

When 4 secondary members leave a 7-member GR group, only 3 active members remain in the group, which is less than half, so DML on the primary node should have been blocked.

thanks
Richard
[21 Oct 2020 15:56] MySQL Verification Team
Hi,

I still don't get it.

You say that "the DML on primary node will be hung, but in fact, we can also do insert on the master node"

and that "the master node is the primary node" (this is what I understood originally)

These two statements combined say that on the same node (master/primary) DML (insert) will hang and also that you can do inserts?!

Are you trying to say that "some" DML does not work and "some" DML (insert) does work, or do you expect that node to have no DML available (read-only) while DML still works there, or something else? I really need this clarified.

In the test I made (7 nodes, killed 4), the 3 remaining nodes are read-only.
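For reference, a quick way to check the read-only state on one of the surviving nodes, using only standard server variables, would be:

mysql> SELECT @@global.super_read_only, @@global.read_only;
-- expected to report 1 (ON) on a node that has become read-only, as described above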

thanks
Bogdan
[21 Oct 2020 16:19] MySQL Verification Team
Note, I'm testing with the latest 8.0.22.

Can you reproduce this with 8.0.22 ?

As I said, I ran a 7-node GR group, powered off 4 nodes, and the remaining 3 are read-only.

all best
Bogdan
[21 Oct 2020 16:38] MySQL Verification Team
Hi,

One update to clarify things.

You have a 7-node cluster. If your 4 nodes die "at the same time" (I'm testing by powering them off at the same time), then the remaining 3 nodes must be read-only (and I can confirm this with 8.0.22; I have been testing for days and cannot reproduce a case where a node is not read-only).

On the other hand, if your 4 nodes die "one by one", then with every node that dies the group membership is adjusted, and consequently so is the number of members that constitutes a majority, so in that case your master will not become read-only.
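A hedged way to see the difference from a surviving node, using the same performance_schema table already shown above, is:

mysql> SELECT MEMBER_HOST, MEMBER_STATE FROM performance_schema.replication_group_members;
-- If the 4 nodes were killed at the same time, they typically stay in the view
-- marked UNREACHABLE: the survivors have no majority and pending writes block.
-- If they were killed one by one, each failure is detected and the member is
-- expelled, the view shrinks (7 -> 6 -> 5 -> 4 -> 3), the majority is recomputed
-- against the current view, and the primary keeps accepting writes.

In the simultaneous-failure case, the documented way to unblock the survivors is to force a new membership with the group_replication_force_members variable on one of the remaining members.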

does this explain your situation?

all best
Bogdan
[23 Oct 2020 2:13] Richard Hao
Yes, that solves my problem. Thanks, Bogdan. This bug can be closed.