Bug #93578 | group_replication fatal error, mysqld dies | ||
---|---|---|---|
Submitted: | 12 Dec 2018 16:15 | Modified: | 5 Feb 2019 10:10 |
Reporter: | Eric Goldsmith | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Group Replication | Severity: | S3 (Non-critical) |
Version: | 8.0.13 | OS: | Windows (Windows Server 2008 R2 Std) |
Assigned to: | | CPU Architecture: | x86 (64-bit AMD Opteron 6380) |
Tags: | dies, exception, Fatal, replication, server | | |
[12 Dec 2018 16:15]
Eric Goldsmith
[13 Dec 2018 16:53]
Eric Goldsmith
Additionally, when an exception like this occurs, returning a non-zero error code can be used to cause the Windows service to be restarted (ref: MySQL service properties 'Recovery' tab). Currently, MySQL server 8.0.13 does not appear to do this, as I can't get Windows to restart the service after it fails.
[11 Jan 2019 16:52]
Eric Goldsmith
20 failures have been observed in the last 4 days, and each of the 3 servers in the cluster has exhibited this problem. Although the error log states "Member was expelled from the group due to network failures", this does not appear to be the case: persistent connections to non-cluster MySQL services on the same servers have not died.
[16 Jan 2019 15:40]
Mario Staykov
I have observed the same bug, which I described in https://dba.stackexchange.com/questions/227199/group-replication-plugin-crashes-mysql-8-0/... before I was certain it was a bug. The context I encountered it in was slightly different: even just attempting INSTALL PLUGIN caused the crash. The workaround that was found was to specify the following in /etc/mysql/my.cnf:
loose-group_replication_exit_state_action = READ_ONLY
Obviously, merely triggering this behaviour, which has been the default since 8.0.12 (https://dev.mysql.com/doc/refman/8.0/en/group-replication-options.html#sysvar_group_replic...), shouldn't result in a MySQL crash. It should be handled as an exception that indicates why MySQL is stopping.
[18 Jan 2019 15:28]
Eric Goldsmith
Thanks Mario! I'll give that a try.
[5 Feb 2019 2:00]
MySQL Verification Team
Hi, thanks for your report. The bug is verified, but I dropped the severity to S3 as there's a workaround. Kind regards, Bogdan
[5 Feb 2019 10:10]
Nuno Carvalho
Hi Eric,

Let's split this into two parts.

First, in 8.0.13 the default value of group_replication_exit_state_action is ABORT_SERVER, which means that when an error forces the server to abandon the group, such as a network partition, the server aborts. Abort here literally means abort, as your stack shows. In 8.0.14 we improved that behaviour by shutting down the server instead; please see Bug#91793. You can upgrade to 8.0.14 or change group_replication_exit_state_action to READ_ONLY.
https://dev.mysql.com/doc/refman/8.0/en/group-replication-options.html#sysvar_group_replic...

Second, when a server faces a network partition, reconnecting may come too late: while it was disconnected, too much data may have gone through the communication layer for the disconnected member to receive all of that traffic. In those cases the server reconnects, realizes that it cannot catch up, and leaves the group. That is the scenario in your logs. You can increase this period by adjusting
https://dev.mysql.com/doc/refman/8.0/en/group-replication-options.html#sysvar_group_replic...
In future releases we will introduce a new approach to tackle this.

Since I gave you two solutions for your situation, I'm closing this bug. If you have any doubts, please reopen it and ask your questions.

Best regards,
Nuno Carvalho
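
For reference, the READ_ONLY workaround mentioned above can probably also be applied at runtime rather than through my.cnf. This is a minimal sketch, not taken from the thread. It assumes the group_replication plugin is already installed on the member and that the connected account is allowed to persist global system variables.

```sql
-- Assumption: the group_replication plugin is already loaded on this member.
-- Show the current action (ABORT_SERVER is the 8.0.13 default per the comment above).
SELECT @@GLOBAL.group_replication_exit_state_action;

-- Switch to READ_ONLY and persist the setting across restarts.
SET PERSIST group_replication_exit_state_action = 'READ_ONLY';

-- Confirm the change took effect.
SELECT @@GLOBAL.group_replication_exit_state_action;
```

If the plugin is not loaded yet, the my.cnf form with the loose- prefix shown earlier in this thread is the safer choice, since the variable does not exist until the plugin is installed.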