MySQL Bugs: #109615: Timeout on wait for view after joining group

Bug #109615	Timeout on wait for view after joining group
Submitted:	13 Jan 2023 7:14	Modified:	17 Jan 2023 18:22
Reporter:	zetang zeng
Status:	Not a Bug
Category:	MySQL Server: Group Replication	Severity:	S3 (Non-critical)
Version:	5.7.40	OS:	Any
Assigned to:	MySQL Verification Team	CPU Architecture:	Any

Description:
Deploy a three-node cluster and check its status: everything is fine. Then restart all three nodes at the same time. Every node logs messages like the following, and the cluster cannot recover automatically.

2023-01-11T03:41:27.393871Z 0 [ERROR] Plugin group_replication reported: '[GCS] Error on opening a connection to 10.0.0.253:34061 on local port: 34061.'
2023-01-11T03:41:27.393982Z 0 [ERROR] Plugin group_replication reported: '[GCS] Error on opening a connection to 10.0.0.254:34061 on local port: 34061.'
2023-01-11T03:41:27.393989Z 0 [ERROR] Plugin group_replication reported: '[GCS] Error connecting to all peers. Member join failed. Local port: 34061'
2023-01-11T03:41:27.394130Z 0 [Warning] Plugin group_replication reported: 'read failed'
2023-01-11T03:41:27.399035Z 0 [ERROR] Plugin group_replication reported: '[GCS] The member was unable to join the group. Local port: 34061'
2023-01-11T03:41:43.808426Z 105 [Note] Got an error reading communication packets
2023-01-11T03:42:13.809204Z 286 [Note] Got an error reading communication packets
2023-01-11T03:42:27.371795Z 2 [ERROR] Plugin group_replication reported: 'Timeout on wait for view after joining group'
2023-01-11T03:42:27.371820Z 2 [Note] Plugin group_replication reported: 'Requesting to leave the group despite of not being a member'
2023-01-11T03:42:27.371830Z 2 [ERROR] Plugin group_replication reported: '[GCS] The member is leaving a group without being on one.'

How to repeat:
- Deploy a three-node cluster on ip1, ip2, ip3
- Check the status:

{
    "clusterName": "myCluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "192.168.3.158:3406",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "192.168.3.156:3406": {
                "address": "192.168.3.156:3406",
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE",
                "version": "5.7.39"
            },
            "192.168.3.157:3406": {
                "address": "192.168.3.157:3406",
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE",
                "version": "5.7.39"
            },
            "192.168.3.158:3406": {
                "address": "192.168.3.158:3406",
                "memberRole": "PRIMARY",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE",
                "version": "5.7.39"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "192.168.3.158:3406"
}

- Open three terminals and connect to ip1, ip2, and ip3 separately
- Paste "systemctl restart mysqld" into all three terminals and restart every MySQL instance at the same time
- All nodes restart, but the cluster cannot recover automatically.

Suggested fix:
The cluster should recover automatically.

Hi,

Can you please let us know whether you have set up all your servers to be fully ACID-compliant?

Please read our Manual on how to configure your OS and InnoDB in order to be 100% ACID-compliant.

Also, repeat your experiment after the full ACID setup is made, including the most stringent InnoDB log flushing, full OS flushing, fsync/sync, complete disabling of the OS, filesystem, and disk caches, etc.

You are welcome to contact us again after you have configured ACID compliance as described above.

Is there a link to the 'Manual on how to configure your OS and InnoDB in order to be 100% ACID compliant'?

Also, isn't this question only about Group Replication? Why check the OS/InnoDB configuration?

Hi,

This has nothing to do with ACID; apologies for the misinformation. However, this behavior is not a bug. When you reboot all servers at once, you cause a full cluster crash, and there is no automatic recovery from that. You need to recover the system manually in such a scenario.

Thank you for using MySQL

Additional info:
 - There must be one server capable of bootstrapping the group. See the documentation on "configuring instances", where this is explained in detail (https://dev.mysql.com/doc/refman/5.7/en/group-replication-configuring-instances.html )
 - Since you are using MySQL Shell, you can use rebootClusterFromCompleteOutage() ( https://dev.mysql.com/doc/mysql-shell/8.0/en/reboot-outage.html )
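
The manual bootstrap mentioned in the first point can be sketched roughly as follows. This is only an illustrative Python sketch, not an official procedure: the helper names count_gtids and recovery_plan and the sample GTID values are hypothetical. It assumes you have already read gtid_executed from each member after the crash, and that the member with the most transactions is a reasonable seed; a stricter check would compare the sets with GTID_SUBSET(). The statements it prints are the usual bootstrap sequence (group_replication_bootstrap_group turned ON only on the seed, then OFF again).

```python
from typing import Dict, List


def count_gtids(gtid_set: str) -> int:
    """Count transactions in a GTID set such as 'uuid:1-5:7,uuid2:1-3'."""
    total = 0
    for part in gtid_set.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        # The first field is the source UUID; the remaining fields are intervals.
        _uuid, *intervals = part.split(":")
        for interval in intervals:
            start, _, end = interval.partition("-")
            total += int(end or start) - int(start) + 1
    return total


def recovery_plan(gtid_executed: Dict[str, str]) -> Dict[str, List[str]]:
    """Pick the most advanced member as seed and list the SQL to run per member.

    Assumption: transaction count is used as a proxy for "most up to date".
    """
    seed = max(gtid_executed, key=lambda m: count_gtids(gtid_executed[m]))
    plan = {
        seed: [
            "SET GLOBAL group_replication_bootstrap_group=ON;",
            "START GROUP_REPLICATION;",
            "SET GLOBAL group_replication_bootstrap_group=OFF;",
        ]
    }
    # Every other member simply rejoins the group started by the seed.
    for member in gtid_executed:
        if member != seed:
            plan[member] = ["START GROUP_REPLICATION;"]
    return plan


if __name__ == "__main__":
    # Hypothetical gtid_executed values for the three members in this report.
    sample = {
        "192.168.3.156:3406": "3e11fa47-71ca-11e1-9e33-c80aa9429562:1-100",
        "192.168.3.157:3406": "3e11fa47-71ca-11e1-9e33-c80aa9429562:1-98",
        "192.168.3.158:3406": "3e11fa47-71ca-11e1-9e33-c80aa9429562:1-103",
    }
    for member, statements in recovery_plan(sample).items():
        print(member)
        for statement in statements:
            print("   ", statement)
```

Run the seed's statements first and wait for it to come ONLINE before starting Group Replication on the other members; bootstrapping more than one member would create separate groups.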

Thank you for using MySQL