Bug #88846 Node Not Expelled From GR and IC After No Space Left On Device Error
Submitted: 11 Dec 2017 0:34    Modified: 3 Jan 2019 11:47
Reporter: David Tang
Status: Closed
Category: MySQL Server: Group Replication    Severity: S2 (Serious)
Version: 5.7.20    OS: Any
Assigned to:    CPU Architecture: Any

[11 Dec 2017 0:34] David Tang
Description:
A node is not expelled from the InnoDB Cluster (IC) and the Group Replication (GR) group after it hits a "no space left on device" error.

How to repeat:
1. Create a three-node sandbox InnoDB Cluster.
2. Shut down one node (a MySQL Shell sketch of steps 1-2 follows the status output below).
mysql-js> cluster.status()
{
    "clusterName": "testCluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "localhost:3310",
        "status": "OK_NO_TOLERANCE",
        "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
        "topology": {
            "localhost:3310": {
                "address": "localhost:3310",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "localhost:3320": {
                "address": "localhost:3320",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "(MISSING)"
            },
            "localhost:3330": {
                "address": "localhost:3330",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    }
}
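
For reference, a minimal MySQL Shell sketch of steps 1 and 2 (not part of the original report), assuming the sandbox ports 3310/3320/3330 shown above and a root password of 'root':

mysql-js> dba.deploySandboxInstance(3310, {password: 'root'})
mysql-js> dba.deploySandboxInstance(3320, {password: 'root'})
mysql-js> dba.deploySandboxInstance(3330, {password: 'root'})
mysql-js> shell.connect('root@localhost:3310')
mysql-js> var cluster = dba.createCluster('testCluster')
mysql-js> cluster.addInstance('root@localhost:3320')
mysql-js> cluster.addInstance('root@localhost:3330')
mysql-js> dba.stopSandboxInstance(3320)   // take one node down (step 2)

The cluster.status() output above is what the remaining members report after stopping the 3320 sandbox.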

3. Node 3 then hits a disk-full error; its error log shows:
"
2017-12-11T00:13:33.517010Z 10 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'FIRST' position 184
2017-12-11T00:13:33.524403Z 7 [ERROR] Disk is full writing './cluster1-relay-bin-group_replication_applier.000002' (Errcode: 15798816 - No space left on device). Waiting for someone to free space...
2017-12-11T00:13:33.524436Z 7 [ERROR] Retry in 60 secs. Message reprinted in 600 secs
2017-12-11T00:13:33.532060Z 10 [ERROR] Plugin group_replication reported: 'The applier thread execution was aborted. Unable to process more transactions, this member will now leave the group.'
2017-12-11T00:23:33.744093Z 7 [ERROR] Disk is full writing './cluster1-relay-bin-group_replication_applier.000002' (Errcode: 15798816 - No space left on device). Waiting for someone to free space...
2017-12-11T00:23:33.744562Z 7 [ERROR] Retry in 60 secs. Message reprinted in 600 secs
"

4. The InnoDB Cluster still shows the affected node as ONLINE, and writes on the primary still succeed.
mysql-js> cluster.status()
{
    "clusterName": "testCluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "localhost:3310",
        "status": "OK_NO_TOLERANCE",
        "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
        "topology": {
            "localhost:3310": {
                "address": "localhost:3310",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "localhost:3320": {
                "address": "localhost:3320",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "(MISSING)"
            },
            "localhost:3330": {
                "address": "localhost:3330",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    }
}

mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 365431c6-de05-11e7-930c-08002746781b | cluster1    |        3310 | ONLINE       |
| group_replication_applier | 53435a99-de05-11e7-9dc5-08002746781b | cluster1    |        3330 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
2 rows in set (0.15 sec)
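
Any write issued on the primary (localhost:3310) still commits even though node 3 cannot apply it, so node 3 silently falls behind. An illustrative example (hypothetical schema, not from the original report):

mysql> CREATE DATABASE IF NOT EXISTS bug88846;
mysql> CREATE TABLE bug88846.t (id INT PRIMARY KEY);
mysql> INSERT INTO bug88846.t VALUES (1);  -- commits on the primary, never applied on node 3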
 

Suggested fix:
Node 3 should be expelled from the InnoDB Cluster and the GR group, since it can no longer apply any transactions.
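
Until such a fix, a possible manual mitigation (not part of the original report) is to remove the affected member from the group as soon as the disk-full condition is detected, so that clients are not routed to a member that is silently falling behind:

mysql> -- run on the affected node (node 3) once the disk-full error appears
mysql> STOP GROUP_REPLICATION;

After that, cluster.status() should show the member as (MISSING), and once disk space has been freed it can be brought back with cluster.rejoinInstance().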
[3 Jan 2019 11:47] Erlend Dahl
Fixed in 8.0.2, not fixable in 5.7.