Bug #94762 MGR decides which node to expel
Submitted: 25 Mar 2019 4:03
Modified: 27 Mar 2019 7:21
Reporter: lieyia --
Status: Not a Bug
Category: MySQL Server: Group Replication
Severity: S3 (Non-critical)
Version:
OS: Any
Assigned to: MySQL Verification Team
CPU Architecture: Any

[25 Mar 2019 4:03] lieyia --
Description:
Hi,
    I have a question about which node is expelled when there is network jitter. It seems to be related to the node ID.
    Is this a bug?

How to repeat:
test 1:
group_replication_unreachable_majority_timeout = 0 (on a, b, c)
1. Set up group replication with 3 nodes (single primary): a, b, c.
2. On node b, execute "stop group_replication" and then "start group_replication".
3. On node c, execute "stop group_replication" and then "start group_replication".
4. Break the network between nodes b and c: on node b, run "iptables -A INPUT -s c -j DROP".
5. Execute "select * from performance_schema.replication_group_members" on a: all nodes are online.
6. Execute "select * from performance_schema.replication_group_members" on b: b and a are online, c is unreachable.
7. Execute "select * from performance_schema.replication_group_members" on c: c and a are online, b is unreachable.
8. No nodes are expelled, and all nodes are online again once the network recovers.
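
The per-node views in steps 5 to 7 of test 1 can be reproduced with a small reachability model. This is only a toy sketch in Python, not Group Replication's actual failure detector: the function names are hypothetical, a cut link is assumed to be broken in both directions (which matches the reported views), and the model says nothing about which member gets expelled.

```python
NODES = ("a", "b", "c")


def member_views(nodes, cut_links):
    """Return, for each observer node, the state it would report per member.

    Assumption of this sketch: a cut link is unusable in both directions.
    """
    cut = {frozenset(link) for link in cut_links}
    views = {}
    for observer in nodes:
        views[observer] = {
            member: "UNREACHABLE"
            if frozenset((observer, member)) in cut
            else "ONLINE"
            for member in nodes
        }
    return views


def sees_majority(view):
    """True if the observer still reaches more than half of the group."""
    reachable = sum(1 for state in view.values() if state == "ONLINE")
    return reachable > len(view) / 2


# Test 1: only the link between b and c is cut.
views = member_views(NODES, [("b", "c")])
for observer, view in views.items():
    print(observer, view, "majority:", sees_majority(view))
```

With a single cut link in a 3-node group, every node still reaches 2 of 3 members, so no node loses contact with the majority in this model.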

test 2:
group_replication_unreachable_majority_timeout = 0 (on a, b, c)
1. Set up group replication with 3 nodes (single primary): a, b, c.
2. On node b, execute "stop group_replication" and then "start group_replication".
3. On node c, execute "stop group_replication" and then "start group_replication".
4. Break the network between nodes b and a: on node b, run "iptables -A INPUT -s a -j DROP".
5. Node a is expelled from the MGR group.

Is this behavior normal, or is it a bug?
[25 Mar 2019 16:49] MySQL Verification Team
Hi,

It's not a bug. You can read a bit more about this behavior here:
https://mysqlhighavailability.com/group-replication-coping-with-unreliable-failure-detecti...

kind regards
Bogdan
[26 Mar 2019 2:42] lieyia --
hi,
 Ok, thanks very much.
[26 Mar 2019 4:59] lieyia --
Hi Bogdan,
   I took a close look at the article (https://mysqlhighavailability.com/group-replication-coping-with-unreliable-failure-detecti...).
   My case is different. I only cut the network between two nodes (node 2 and node 3); node 3 does not become an island.
   First, I cut the network between the nodes whose internal MGR node IDs are 1 and 2.
   Second, I cut the network between the nodes whose node IDs are 0 and 1.
   The results differ, as described above.
[26 Mar 2019 13:13] MySQL Verification Team
Yes, that URL does not explain your scenario one-to-one, but it gives an overview of how things work in a real-life scenario.

Cutting the network on one side only with iptables is not something this system will properly detect, and the results are unpredictable. If you are running tests, you have to run them on 3 real nodes and disconnect them physically from the network; don't use iptables on them :)

all best
Bogdan
[26 Mar 2019 16:50] MySQL Verification Team
If you can give us the results of the following steps for test 2, we might be able to explain them if needed:

> 5. execute "select * from performance_schema.replication_group_members " on a, all nodes are online;
> 6. execute "select * from performance_schema.replication_group_members " on b, b and a are online, c unreachable;
> 7. execute "select * from performance_schema.replication_group_members " on c, c and a are online, b unreachable;

all best
b.
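
One possible way to collect the three requested result sets is to run the same query against each node with the mysql command-line client. The Python sketch below only builds the command lines. It assumes the client is installed, that the host names a, b, c are placeholders for the real addresses, and that credentials come from an option file. The helper names are hypothetical.

```python
import shlex

# Query taken from steps 5 to 7 of the report.
QUERY = "SELECT * FROM performance_schema.replication_group_members"


def build_mysql_command(host):
    """Build the mysql client argument list for one node (host is a placeholder)."""
    return ["mysql", f"--host={host}", "--table", f"--execute={QUERY}"]


def render_commands(hosts):
    """Return one shell-quoted command line per node."""
    return [shlex.join(build_mysql_command(host)) for host in hosts]


for line in render_commands(["a", "b", "c"]):
    print(line)
```

Each argument list could also be passed to subprocess.run on a machine that can reach all three nodes, so that the views from a, b, and c are captured while the network is still cut.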
[27 Mar 2019 7:21] lieyia --
Hi,
My environment is a three-node MGR group (single primary):
node a(10.40.139.6)
node b(10.27.56.32)
node c(10.7.61.9)

On b: stop group_replication, then start group_replication.
On c: stop group_replication, then start group_replication.
After that, the MGR node IDs are: a is 0, b is 1, c is 2.

So, if I cut the network between b and c (1, 2), no node is expelled.
But if I cut the network between a and b (0, 1), node a or b is expelled from MGR after 5 s.
Likewise, if I cut the network between a and c (0, 2), node a or c is expelled from MGR after 5 s.

all best.