Bug #100036 | Unable to fetch live group_replication member data from any server in replicaset | ||
---|---|---|---|
Submitted: | 29 Jun 2020 14:28 | Modified: | 9 Aug 2020 11:20 |
Reporter: | Snehal Bhavsar | Email Updates: | |
Status: | No Feedback | Impact on me: | |
Category: | MySQL Router | Severity: | S1 (Critical) |
Version: | OS: | CentOS | |
Assigned to: | MySQL Verification Team | CPU Architecture: | Any |
[29 Jun 2020 14:28]
Snehal Bhavsar
[7 Jul 2020 11:29]
MySQL Verification Team
Hi, can you please share the configuration from the servers? Also, how did this group end up with only 1 server? Did you "shutdown" two of them (properly), or did something else happen? Thanks, Bogdan
[8 Jul 2020 8:54]
Snehal Bhavsar
No, we did not shut down these nodes. These two servers drop out of the cluster every time because of the following errors, which we believe is again a writeset bug:
2020-06-27T09:32:38.523253Z 18 [ERROR] [MY-010584] [Repl] Slave SQL for channel 'group_replication_applier': Worker 4 failed executing transaction 'e295b724-53c8-11ea-80c8-fa163efa4b49:381194653'; Could not execute Delete_rows event on table xxxxxxxx.QRTZ_TRIGGERS; Cannot delete or update a parent row: a foreign key constraint fails (`xxxxxxxx`.`QRTZ_CRON_TRIGGERS`, CONSTRAINT `QRTZ_CRON_TRIGGERS_ibfk_1` FOREIGN KEY (`SCHED_NAME`, `TRIGGER_NAME`, `TRIGGER_GROUP`) REFERENCES `QRTZ_TRIGGERS` (`SCHED_NAME`, `TRIGGER_NAME`, `), Error_code: 1451; handler error HA_ERR_ROW_IS_REFERENCED, Error_code: MY-001451
2020-06-27T09:32:38.523843Z 14 [ERROR] [MY-011451] [Repl] Plugin group_replication reported: 'The applier thread execution was aborted. Unable to process more transactions, this member will now leave the group.'
2020-06-27T09:32:38.527530Z 11 [ERROR] [MY-011452] [Repl] Plugin group_replication reported: 'Fatal error during execution on the Applier process of Group Replication. The server will now leave the group.'
2020-06-27T09:32:38.530952Z 11 [ERROR] [MY-011712] [Repl] Plugin group_replication reported: 'The server was automatically set into read only mode after an error was detected.'
2020-06-27T09:32:38.542099Z 14 [ERROR] [MY-010586] [Repl] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'FIRST' position 0
[8 Jul 2020 16:12]
MySQL Verification Team
Hi, and the configs? Can you share them? Thanks, Bogdan
[9 Jul 2020 11:20]
MySQL Verification Team
Hi, when 2 out of the 3 nodes went AWOL, you lost the majority (quorum), so it is by design that writes are not allowed. Are you following the documented procedures to restore the group (unblocking the group first)? To me it looks like you are not. https://dev.mysql.com/doc/refman/8.0/en/mysql-innodb-cluster-working-with-cluster.html#res... Kind regards, Bogdan
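A minimal sketch of the majority rule referred to above, assuming the simplified model in which quorum means a strict majority of the members in the current group view. The function names are hypothetical and are not part of any MySQL API. The second case reflects that members which leave the group cleanly are removed from the view, so they do not count against the majority; only members that become unreachable do.

```python
def majority(view_size: int) -> int:
    """Smallest number of members that forms a strict majority of the view."""
    return view_size // 2 + 1


def has_quorum(reachable: int, view_size: int) -> bool:
    """True if the reachable members still form a majority of the current view."""
    return reachable >= majority(view_size)


# 3-member view, 2 members become unreachable (crash, network partition):
print(has_quorum(reachable=1, view_size=3))  # False: writes are blocked

# 3-member view, 2 members leave cleanly (e.g. STOP GROUP_REPLICATION):
# the view shrinks to 1 member, so the survivor still has quorum.
print(has_quorum(reachable=1, view_size=1))  # True
```

This simplified model shows why the two situations look similar from the outside (one surviving member) but behave differently with respect to writes.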
[15 Jul 2020 11:35]
Andrzej Religa
Hi All, I was looking into this from the MySQL Router perspective. I could not reproduce it by simply creating a 3-node cluster and running "STOP GROUP_REPLICATION" on the 2 RO nodes; I can still use both the RW and RO ports after that. So there must be more to it. One potential cause of an error message like that in the log is that the instance UUID stored in the cluster metadata and the one reported by the instance itself have diverged for some reason. I would need the output from the following queries to confirm that:
select @@server_uuid;
select * from instances \G
BR, Andrzej
[15 Jul 2020 11:41]
Andrzej Religa
Forgot to mention that before running select * from instances \G one needs to run:
use mysql_innodb_cluster_metadata;
--
BR, Andrzej
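The check described above can be sketched as a comparison of the two query results. This is a minimal, hypothetical helper, not part of MySQL Router or MySQL Shell. It assumes you have already collected `select @@server_uuid` from an instance and the rows of `mysql_innodb_cluster_metadata.instances` as dictionaries, and it assumes the metadata columns are named `address` and `mysql_server_uuid` (verify this against the `select * from instances \G` output on your version). The addresses and UUIDs in the example are made up for illustration.

```python
from typing import Iterable, Mapping, Optional


def uuid_mismatch(address: str, reported_uuid: str,
                  metadata_rows: Iterable[Mapping[str, str]]) -> Optional[str]:
    """Compare the UUID an instance reports with the one recorded in the metadata.

    Assumes each metadata row has 'address' and 'mysql_server_uuid' keys.
    Returns a description of the problem, or None if the UUIDs match.
    """
    for row in metadata_rows:
        if row["address"] == address:
            recorded = row["mysql_server_uuid"]
            if recorded.lower() == reported_uuid.lower():
                return None
            return (f"{address}: metadata has {recorded}, "
                    f"instance reports {reported_uuid}")
    return f"{address}: no row in the instances table"


# Hypothetical data: one metadata row for a made-up instance address.
rows = [{"address": "node1:3306",
         "mysql_server_uuid": "00000000-0000-0000-0000-000000000001"}]
print(uuid_mismatch("node1:3306", "00000000-0000-0000-0000-000000000001", rows))
print(uuid_mismatch("node1:3306", "00000000-0000-0000-0000-000000000002", rows))
```

A non-None result for any member would match the divergence scenario described above; a None result for every member would rule it out.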
[10 Aug 2020 1:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".