Bug #109615 Timeout on wait for view after joining group
Submitted: 13 Jan 2023 7:14 Modified: 17 Jan 2023 18:22
Reporter: zetang zeng
Status: Not a Bug
Category: MySQL Server: Group Replication Severity: S3 (Non-critical)
Version: 5.7.40 OS: Any
Assigned to: MySQL Verification Team CPU Architecture: Any

[13 Jan 2023 7:14] zetang zeng
Description:
Deploy a three-node cluster and check its status: everything is fine. Then restart all three nodes at the same time. All nodes log errors like the following, and the cluster cannot recover automatically.

2023-01-11T03:41:27.393871Z 0 [ERROR] Plugin group_replication reported: '[GCS] Error on opening a connection to 10.0.0.253:34061 on local port: 34061.'
2023-01-11T03:41:27.393982Z 0 [ERROR] Plugin group_replication reported: '[GCS] Error on opening a connection to 10.0.0.254:34061 on local port: 34061.'
2023-01-11T03:41:27.393989Z 0 [ERROR] Plugin group_replication reported: '[GCS] Error connecting to all peers. Member join failed. Local port: 34061'
2023-01-11T03:41:27.394130Z 0 [Warning] Plugin group_replication reported: 'read failed'
2023-01-11T03:41:27.399035Z 0 [ERROR] Plugin group_replication reported: '[GCS] The member was unable to join the group. Local port: 34061'
2023-01-11T03:41:43.808426Z 105 [Note] Got an error reading communication packets
2023-01-11T03:42:13.809204Z 286 [Note] Got an error reading communication packets
2023-01-11T03:42:27.371795Z 2 [ERROR] Plugin group_replication reported: 'Timeout on wait for view after joining group'
2023-01-11T03:42:27.371820Z 2 [Note] Plugin group_replication reported: 'Requesting to leave the group despite of not being a member'
2023-01-11T03:42:27.371830Z 2 [ERROR] Plugin group_replication reported: '[GCS] The member is leaving a group without being on one.'
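
A minimal sketch, assuming the error log is available as plain text, of how one might scan it for this failure signature. The marker strings are copied from the log excerpt above; the function name `find_join_failures` and the line-matching regex are illustrative assumptions, not part of MySQL.

```python
import re

# Marker strings copied from the log excerpt in this report.
JOIN_FAILURE_MARKERS = (
    "Error connecting to all peers. Member join failed.",
    "Timeout on wait for view after joining group",
)

# Assumed 5.7-style error log layout: "<timestamp> <thread id> [<level>] <message>".
LOG_LINE = re.compile(r"^(?P<ts>\S+)\s+\d+\s+\[(?P<level>[^\]]+)\]\s+(?P<msg>.*)$")


def find_join_failures(log_text):
    """Return (timestamp, marker) pairs for log lines containing a known marker."""
    hits = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip blank lines and anything not in the assumed layout
        for marker in JOIN_FAILURE_MARKERS:
            if marker in match.group("msg"):
                hits.append((match.group("ts"), marker))
    return hits
```

If both markers show up on every member after a simultaneous restart, that is consistent with no member having bootstrapped the group.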

How to repeat:
- Deploy a three-node cluster on ip1, ip2, ip3
- Check the cluster status:

{
    "clusterName": "myCluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "192.168.3.158:3406",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "192.168.3.156:3406": {
                "address": "192.168.3.156:3406",
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE",
                "version": "5.7.39"
            },
            "192.168.3.157:3406": {
                "address": "192.168.3.157:3406",
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE",
                "version": "5.7.39"
            },
            "192.168.3.158:3406": {
                "address": "192.168.3.158:3406",
                "memberRole": "PRIMARY",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE",
                "version": "5.7.39"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "192.168.3.158:3406"
}
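
A small sketch of how the status check could be automated, assuming the JSON above has been saved as a string. It relies only on the `defaultReplicaSet.topology` layout visible in that output; the helper name `offline_members` is hypothetical.

```python
import json


def offline_members(status_json):
    """Return the sorted addresses of members whose status is not ONLINE."""
    status = json.loads(status_json)
    topology = status["defaultReplicaSet"]["topology"]
    return sorted(
        address
        for address, member in topology.items()
        if member.get("status") != "ONLINE"
    )
```

An empty list corresponds to the healthy state shown here, where all three members report ONLINE.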

- Open three terminals and connect to ip1, ip2, and ip3 separately
- Paste "systemctl restart mysqld" into all three terminals and restart all MySQL servers at the same time
- All nodes restart, but the cluster does not recover automatically.

Suggested fix:
The cluster should recover automatically.
[13 Jan 2023 13:44] MySQL Verification Team
Hi,

Can you please let us know whether you have set up all your servers to be fully ACID-compliant?

Please read our Manual on how to configure your OS and InnoDB to be 100% ACID-compliant.

Also, repeat your experiment after the full ACID setup is in place, including the strictest InnoDB log flushing, full OS flushing, fsync/sync, complete disabling of the OS, filesystem, and disk caches, etc.

You are welcome to contact us again after you have configured ACID compliance as described above.
[14 Jan 2023 4:36] zetang zeng
Is there a link to the 'Manual on how to configure your OS and InnoDB in order to be 100% ACID compliant'?

And isn't this question only about Group Replication? Why check the OS/InnoDB configuration?
[17 Jan 2023 18:22] MySQL Verification Team
Hi,

This has nothing to do with ACID; apologies for the misinformation. This behavior is not a bug: when you reboot all servers at once, you cause a full cluster crash, and there is no automatic recovery from that. You need to recover the system manually in such a scenario.

Thank you for using MySQL
[17 Jan 2023 19:05] MySQL Verification Team
Additional info:
 - There must be one server capable of bootstrapping the group. See the documentation on "configuring instances", where this is explained in detail (https://dev.mysql.com/doc/refman/5.7/en/group-replication-configuring-instances.html)
 - Since you are using MySQL Shell, you can use rebootClusterFromCompleteOutage() (https://dev.mysql.com/doc/mysql-shell/8.0/en/reboot-outage.html)
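
For the manual route, the order of operations can be sketched as a per-member list of statements. This is only an illustration: the helper `recovery_plan` is hypothetical, the choice of bootstrap member is left to the operator, and the statements are the `group_replication_bootstrap_group` and `START GROUP_REPLICATION` commands described in the "configuring instances" documentation linked above. For a cluster managed by MySQL Shell, rebootClusterFromCompleteOutage() is likely the more appropriate path.

```python
def recovery_plan(members, bootstrap_member):
    """Return an ordered list of (member, [SQL statements]) for a manual restart.

    Exactly one member bootstraps the group; every other member simply joins.
    Which member to bootstrap from is an operator decision and is not decided here.
    """
    if bootstrap_member not in members:
        raise ValueError(f"{bootstrap_member} is not in the member list")

    # The bootstrap flag is switched off again right away so that a later
    # restart of this member does not create a second, independent group.
    bootstrap_sql = [
        "SET GLOBAL group_replication_bootstrap_group=ON;",
        "START GROUP_REPLICATION;",
        "SET GLOBAL group_replication_bootstrap_group=OFF;",
    ]
    plan = [(bootstrap_member, bootstrap_sql)]
    for member in members:
        if member != bootstrap_member:
            plan.append((member, ["START GROUP_REPLICATION;"]))
    return plan


if __name__ == "__main__":
    # Addresses taken from the status output in this report.
    nodes = ["192.168.3.156:3406", "192.168.3.157:3406", "192.168.3.158:3406"]
    for member, statements in recovery_plan(nodes, "192.168.3.158:3406"):
        print(member)
        for statement in statements:
            print("   ", statement)
```

The bootstrap member comes first in the plan because the other members can only join once a group exists.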

Thank you for using MySQL