Bug #97987 split brain:1 node isolated (SECONDARY-ONLINE) but unreachable with RO router
Submitted: 14 Dec 2019 12:20 Modified: 18 Dec 2019 13:46
Reporter: lionel mazeyrat
Status: Not a Bug
Category:MySQL Router Severity:S2 (Serious)
Version:8.0.18 OS:Windows
Assigned to: MySQL Verification Team CPU Architecture:x86

[14 Dec 2019 12:20] lionel mazeyrat
Description:
MySQL InnoDB Cluster with 3 nodes.

I lose 2 nodes (I disconnect them from the network).

From the remaining node TVG-GTC01, connected on 127.0.0.1:3306:
SELECT MEMBER_HOST,MEMBER_ROLE,MEMBER_STATE,MEMBER_VERSION FROM performance_schema.replication_group_members
TVG-GTC02 PRIMARY UNREACHABLE 8.0.18
TVG-GTC01 SECONDARY ONLINE 8.0.18
TVG-GTCHIS SECONDARY UNREACHABLE 8.0.18

But I can't connect to 127.0.0.1:6447 through the router, even though the database is available read-only:

[routing:gtcCluster_default_rw]
bind_address=0.0.0.0
bind_port=6446
destinations=metadata-cache://gtcCluster/default?role=PRIMARY
routing_strategy=first-available
protocol=classic

[routing:gtcCluster_default_ro]
bind_address=0.0.0.0
bind_port=6447
destinations=metadata-cache://gtcCluster/default?role=SECONDARY
routing_strategy=round-robin-with-fallback
protocol=classic

How to repeat:
split brain 2/1
1 node left
[14 Dec 2019 12:40] lionel mazeyrat
I forgot to mention:
group_replication_exit_state_action = READ_ONLY
[18 Dec 2019 12:05] Frederic Descamps
Hi Lionel, 

First, I would like to correct the term "split-brain": the situation you are describing is a network partition. In fact, the remaining node is in a minority partition, as it doesn't reach quorum (1 of 3).
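As a rough illustration of the quorum arithmetic (a minimal sketch, not how Group Replication or MySQL Router is implemented), the snippet below takes the rows reported by TVG-GTC01 and checks whether the reachable members form a strict majority. The function name `has_quorum` is hypothetical, and treating every state other than UNREACHABLE as reachable is a simplifying assumption.

```python
# Rows as reported by the remaining node (TVG-GTC01) in the bug description:
# (MEMBER_HOST, MEMBER_ROLE, MEMBER_STATE)
members = [
    ("TVG-GTC02", "PRIMARY", "UNREACHABLE"),
    ("TVG-GTC01", "SECONDARY", "ONLINE"),
    ("TVG-GTCHIS", "SECONDARY", "UNREACHABLE"),
]


def has_quorum(rows):
    """Return True if the reachable members are a strict majority of the group.

    Hypothetical helper: any state other than UNREACHABLE is counted as
    reachable, which is a simplification for illustration only.
    """
    total = len(rows)
    reachable = sum(1 for _host, _role, state in rows if state != "UNREACHABLE")
    return reachable > total // 2


print(has_quorum(members))  # prints False: 1 of 3 is not a majority
```

With three members, at least two must be reachable for a majority, so the single remaining node is on the minority side.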

Now that this is clear: MySQL Router will NEVER allow the use of a member/node that is in a minority partition.

So this is not a bug but how it MUST work.

Regards,
[18 Dec 2019 12:17] Kenny Gryp
* The subject mentions 'Split Brain'. However, from what I read in the bug description, there is no split brain at all; there is just a network partition where 1 member cannot see the other members. Please refer to that as a network partition. A split brain refers to having 2 partitions accepting writes, resulting in inconsistent datasets.
* MySQL Router removes all connections to a member that is network partitioned and not part of the majority group. That is the only possible behavior at this moment. `group_replication_exit_state_action` has no impact on this.

Also, for network partition handling with full automatic rejoin, as best practice, I suggest changing only these settings:
```
group_replication_autorejoin_tries=3
group_replication_member_expel_timeout=5
```

Please look at https://www.slideshare.net/Grypyrg/mysql-innodb-cluster-new-features-in-80-releases-best-p...
especially the Network Partition Handling chapter (slides 46 to 52). It explains how this all works and what the best practice is.