MySQL Bugs: #86392: getCluster().status() reports incorrect number of failures allowed

Submitted: 20 May 2017 15:21
Modified: 21 May 2017 1:44
Reporter: Giuseppe Maxia (OCA)
Status: Not a Bug
Category: Shell AdminAPI InnoDB Cluster / ReplicaSet
Severity: S2 (Serious)
Version: 5.7
OS: Any
Assigned to: MySQL Verification Team
CPU Architecture: Any

Description:
With a three-node cluster, asking for the status returns "Cluster is ONLINE and can tolerate up to ONE failure."
From this we can infer that an operational cluster needs at least two nodes, and that a cluster with at least three nodes can tolerate failures.

So, I try with 4 nodes:
{
    "clusterName": "testcluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "mysqlgr1:3306",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "mysqlgr1:3306": {
                "address": "mysqlgr1:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysqlgr2:3306": {
                "address": "mysqlgr2:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysqlgr3:3306": {
                "address": "mysqlgr3:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysqlgr4:3306": {
                "address": "mysqlgr4:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    }
}

It says the same as for three nodes. But if I kill the primary node, wait for a new one to be elected, and then kill that one too, I can see that the cluster can withstand TWO failures.

Similarly, with five nodes it says that it can tolerate 2 failures, but I am able to kill the primary three times, and R/W and R/O operations through the router keep working.

{
    "clusterName": "testcluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "mysqlgr1:3306",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to 2 failures.",
        "topology": {
            "mysqlgr1:3306": {
                "address": "mysqlgr1:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysqlgr2:3306": {
                "address": "mysqlgr2:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysqlgr3:3306": {
                "address": "mysqlgr3:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysqlgr4:3306": {
                "address": "mysqlgr4:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "mysqlgr5:3306": {
                "address": "mysqlgr5:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    }
}
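The results above come from killing the primary one node at a time and waiting for a new primary between kills. The following is a simplified model of that sequence. It assumes that each crashed member is removed from the group before the next crash, so the majority is recomputed over the smaller group each time. That assumption belongs to this sketch and is not stated in the status output. The function names are hypothetical.

```python
def survives_sequential_crashes(group_size: int, crashes: int) -> bool:
    """Hypothetical model: members crash one at a time, and (by assumption)
    each crashed member is expelled, shrinking the group, before the next."""
    members = group_size
    for _ in range(crashes):
        survivors = members - 1
        # The survivors must be a strict majority of the current group view.
        if 2 * survivors <= members:
            return False
        members = survivors  # group reconfigured without the crashed member
    return True


def max_sequential_crashes(group_size: int) -> int:
    """Largest number of one-at-a-time crashes the model survives."""
    crashes = 0
    while survives_sequential_crashes(group_size, crashes + 1):
        crashes += 1
    return crashes


for n in (3, 4, 5):
    print(n, "nodes:", max_sequential_crashes(n), "sequential crashes")
```

Under this model, four nodes survive two one-at-a-time crashes and five nodes survive three, which matches what was observed. This works only because the group is assumed to shrink between crashes.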

How to repeat:
Create a three-node cluster and a four-node cluster.
Run getCluster().status() on each.
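To compare the clusters, the JSON printed by status() can be saved and checked programmatically. Below is a small Python sketch. `summarize_status` is a hypothetical helper, not part of MySQL Shell, and its regular expression assumes the exact statusText wording shown in the outputs above.

```python
import json
import re
from typing import Optional, Tuple

# Number words seen in the statusText samples in this report ("ONE", "2").
# Only "ONE" is mapped explicitly; other tokens are assumed to be digits.
_WORDS = {"ONE": 1}


def summarize_status(status_json: str) -> Tuple[int, Optional[int]]:
    """Return (online_members, tolerated_failures) from status() JSON.

    tolerated_failures is None when statusText does not match the
    "can tolerate up to N failure(s)" wording used in this report.
    """
    replica_set = json.loads(status_json)["defaultReplicaSet"]
    online = sum(
        1
        for member in replica_set["topology"].values()
        if member["status"] == "ONLINE"
    )
    match = re.search(r"tolerate up to (\w+) failures?", replica_set["statusText"])
    if match is None:
        return online, None
    token = match.group(1)
    tolerated = _WORDS.get(token.upper())
    if tolerated is None and token.isdigit():
        tolerated = int(token)
    return online, tolerated
```

Given the four-node output above, it returns (4, 1). Given the five-node output, it returns (5, 2).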

Hi Giuseppe,

When the status says it "can tolerate two" failures, it means the cluster can survive the crash of any two nodes. The fact that it survived three particular crashes does not mean it can survive any three node crashes.
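The reported numbers follow the majority rule. A group of n members keeps working only while a strict majority of them remain, so it can lose at most floor((n - 1) / 2) members in the worst case. The short Python sketch below reproduces the values in the outputs above: 1 for four nodes and 2 for five. The function names are our own.

```python
def majority(group_size: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return group_size // 2 + 1


def tolerated_failures(group_size: int) -> int:
    """Most members that can fail while the rest still form a majority
    of the original group (equivalently, (group_size - 1) // 2)."""
    return group_size - majority(group_size)


for n in range(3, 8):
    # 3 -> 1, 4 -> 1, 5 -> 2, 6 -> 2, 7 -> 3
    print(f"{n} members: up to {tolerated_failures(n)} failure(s)")
```

Adding a fourth node raises the majority from 2 to 3, so it does not increase the number of failures tolerated.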

The main issue is the lack of arbitration and the resulting split-brain scenario. InnoDB Cluster (or, more precisely, Group Replication) has no arbitrator, unlike NDB Cluster, so when an even number of nodes is running, a split-brain scenario is possible.
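Without an arbitrator, a partitioned group can only rely on a strict majority to decide which side continues. An even-sized group can be cut into two equal halves in which neither side has one. The following minimal sketch assumes a simple two-way network partition and a strict-majority rule. The function names are hypothetical.

```python
from typing import Iterator, Tuple


def has_majority(side: int, group_size: int) -> bool:
    """True if `side` members form a strict majority of the group."""
    return 2 * side > group_size


def two_way_splits(group_size: int) -> Iterator[Tuple[int, int, bool]]:
    """Yield (side_a, side_b, some_side_has_majority) for every way a
    network partition could cut the group into two non-empty sides."""
    for side_a in range(1, group_size // 2 + 1):
        side_b = group_size - side_a
        ok = has_majority(side_a, group_size) or has_majority(side_b, group_size)
        yield side_a, side_b, ok


for n in (4, 5):
    for a, b, ok in two_way_splits(n):
        print(f"{n} nodes split {a}+{b}: {'a majority side remains' if ok else 'no side has a majority'}")
```

For four nodes, the 2+2 split leaves no side with a majority. For five nodes, every split leaves one side with at least three members.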

all best
Bogdan Kecman