Bug #86392 getCluster().status() reports incorrect number of failures allowed
Submitted: 20 May 2017 15:21 Modified: 21 May 2017 1:44
Reporter: Giuseppe Maxia (OCA)
Status: Not a Bug
Category:Shell AdminAPI InnoDB Cluster / ReplicaSet Severity:S2 (Serious)
Version:5.7 OS:Any
Assigned to: MySQL Verification Team CPU Architecture:Any

[20 May 2017 15:21] Giuseppe Maxia
Description:
With a 3 node cluster, when asking for the status, we get "Cluster is ONLINE and can tolerate up to ONE failure."
From this we can assume that an operational cluster is at least two nodes, and a cluster with at least three nodes can tolerate failures.

So, I try with 4 nodes:
{
    "clusterName": "testcluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "mysqlgr1:3306",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "mysqlgr1:3306": {
                "address": "mysqlgr1:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysqlgr2:3306": {
                "address": "mysqlgr2:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysqlgr3:3306": {
                "address": "mysqlgr3:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysqlgr4:3306": {
                "address": "mysqlgr4:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    }
}

It says the same as for 3 nodes. But if I kill the primary node, wait for a new one to be appointed, and then kill that one, I can see that the cluster withstands TWO failures.

Similarly, for five nodes, it says that it can tolerate 2 failures, but I am able to kill the primary three times, and R/W and R/O operations through the router keep working.

{
    "clusterName": "testcluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "mysqlgr1:3306",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to 2 failures.",
        "topology": {
            "mysqlgr1:3306": {
                "address": "mysqlgr1:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysqlgr2:3306": {
                "address": "mysqlgr2:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysqlgr3:3306": {
                "address": "mysqlgr3:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysqlgr4:3306": {
                "address": "mysqlgr4:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysqlgr5:3306": {
                "address": "mysqlgr5:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    }
}
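
The numbers reported above are consistent with a majority-quorum rule, under which a group of n members tolerates floor((n - 1) / 2) simultaneous failures. The sketch below assumes that rule (it is not taken from the Shell source) and applies it to a document shaped like the status output above. The helper names `tolerated_failures`, `online_members` and `sample_status` are hypothetical and used only for illustration.

```python
def tolerated_failures(n: int) -> int:
    """Simultaneous failures a majority-quorum group of n members can absorb.

    Assumption: the group needs a strict majority of its members to stay writable.
    """
    return max((n - 1) // 2, 0)


def online_members(status: dict) -> int:
    """Count ONLINE members in a getCluster().status()-shaped document."""
    topology = status["defaultReplicaSet"]["topology"]
    return sum(1 for member in topology.values() if member["status"] == "ONLINE")


def sample_status(n: int) -> dict:
    """Build a trimmed status document with n ONLINE members (illustrative only)."""
    topology = {}
    for i in range(1, n + 1):
        address = f"mysqlgr{i}:3306"
        topology[address] = {
            "address": address,
            "mode": "R/W" if i == 1 else "R/O",
            "status": "ONLINE",
        }
    return {
        "clusterName": "testcluster",
        "defaultReplicaSet": {"name": "default", "topology": topology},
    }


if __name__ == "__main__":
    for size in (3, 4, 5):
        members = online_members(sample_status(size))
        print(members, "members ->", tolerated_failures(members), "failure(s) tolerated")
```

Under this assumption, 3, 4 and 5 members give 1, 1 and 2 tolerated failures, which matches the statusText values shown for the three, four and five node clusters.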

How to repeat:
Create a three node cluster and a four node cluster.
Run getCluster().status()

[21 May 2017 1:44] MySQL Verification Team
Hi Giuseppe,

When the status says the cluster "can tolerate up to 2 failures", it means the cluster can survive the crash of "any 2" nodes. The fact that it survived a particular sequence of 3 crashes does not mean it can survive "any 3" node crashes.

The main issue is the lack of arbitration and the split-brain scenario: with InnoDB Cluster (or rather, Group Replication) we don't have an arbitrator, as we do with NDB Cluster, so when an even number of nodes is running we can end up in a split-brain scenario.

all best
Bogdan Kecman
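
The difference between "any 2" simultaneous crashes and a particular sequence of crashes can be made concrete with a small simulation. This is a minimal sketch, not the Group Replication implementation. It assumes that the group needs a strict majority of its current members to keep operating, and that a failed member is expelled (shrinking the group) before the next failure happens. The function names are hypothetical.

```python
def has_majority(alive: int, group_size: int) -> bool:
    """True if the surviving members form a strict majority of the group."""
    return alive > group_size // 2


def survives_simultaneous(n: int, failures: int) -> bool:
    """All failures hit at once, measured against the original group size."""
    return has_majority(n - failures, n)


def survives_sequential(n: int, failures: int) -> bool:
    """Failures arrive one at a time.

    Assumption: after each failure the remaining majority expels the dead
    member, so the next failure is measured against the smaller group.
    """
    size = n
    for _ in range(failures):
        if not has_majority(size - 1, size):
            return False
        size -= 1
    return True


if __name__ == "__main__":
    for n, failures in ((4, 2), (5, 3)):
        print(
            f"{n} nodes, {failures} failures:",
            "simultaneous ok" if survives_simultaneous(n, failures) else "simultaneous blocked",
            "/",
            "sequential ok" if survives_sequential(n, failures) else "sequential blocked",
        )
```

Under these assumptions, a four node group survives two failures taken one at a time, and a five node group survives three, which matches what the reporter observed. The same numbers of failures arriving at once leave the survivors without a majority, which is the case the status message guards against.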