Bug #86797 | InnoDB Cluster does not report when nodes fall behind or stop writing data. | ||
---|---|---|---|
Submitted: | 22 Jun 2017 16:16 | Modified: | 27 Jul 2017 17:06 |
Reporter: | Sivan Koren | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | Shell AdminAPI InnoDB Cluster / ReplicaSet | Severity: | S1 (Critical) |
Version: | 5.7.18 | OS: | Linux |
Assigned to: | | CPU Architecture: | Any |
Tags: | InnoDB Cluster |
[22 Jun 2017 16:16]
Sivan Koren
[28 Jun 2017 18:31]
MySQL Verification Team
Hi, thanks for submitting a bug report. I verified it as described. At the moment, InnoDB Cluster still lacks some reporting capabilities; we are working on that. All best, Bogdan
[26 Jul 2017 11:32]
Paulo Jesus
Posted by developer: The status information reported by InnoDB Cluster/AdminAPI is based on the information provided by Group Replication (GR) through specific tables in the performance_schema (see: https://dev.mysql.com/doc/refman/5.7/en/group-replication-monitoring.html).

Could you please provide the data/output (SELECT * FROM ...) of the following tables on all members of the cluster? This will let us verify whether the incorrect status is being reported by GR or by the AdminAPI:
- performance_schema.replication_group_members (https://dev.mysql.com/doc/refman/5.7/en/group-replication-replication-group-members.html)
- performance_schema.replication_group_member_stats (https://dev.mysql.com/doc/refman/5.7/en/group-replication-replication-group-member-stats.h...)

Please also provide any error logs from the servers. Could you also check whether there is a network partition in the cluster (see: https://dev.mysql.com/doc/refman/5.7/en/group-replication-detecting-partitions.html)? That could explain what is happening.

Thank you
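The checks requested above can be scripted so they are repeatable on every member. Below is a minimal Python sketch. It assumes the rows from `SELECT * FROM performance_schema.replication_group_members` and the `COUNT_TRANSACTIONS_IN_QUEUE` value from `replication_group_member_stats` have already been fetched from each member, and the fetching itself is omitted. `MemberRow`, `find_problems` and the `max_queue` threshold of 100 are hypothetical names and values chosen for illustration. The column names follow the MySQL 5.7 pages linked above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MemberRow:
    """Hypothetical holder for one row of performance_schema.replication_group_members (5.7 columns)."""
    member_id: str
    member_host: str
    member_port: int
    member_state: str  # ONLINE, RECOVERING, OFFLINE, ERROR or UNREACHABLE


def find_problems(
    views: Dict[str, List[MemberRow]],
    queue_by_member: Dict[str, int],
    max_queue: int = 100,  # hypothetical threshold, not a MySQL default
) -> List[str]:
    """Flag suspicious Group Replication state.

    views maps each node the query was run on to the rows that node returned,
    because every member reports its own view of the group.
    queue_by_member maps MEMBER_ID to COUNT_TRANSACTIONS_IN_QUEUE read from
    replication_group_member_stats on that member.
    """
    problems: List[str] = []
    for observer, rows in sorted(views.items()):
        for row in rows:
            # Any state other than ONLINE deserves a look; UNREACHABLE
            # entries can indicate a network partition.
            if row.member_state != "ONLINE":
                problems.append(
                    f"{observer} sees {row.member_host}:{row.member_port} as {row.member_state}"
                )
    for member_id, queued in sorted(queue_by_member.items()):
        # A large certification queue may indicate the member is falling behind.
        if queued > max_queue:
            problems.append(
                f"member {member_id} has {queued} transactions queued for certification"
            )
    return problems
```

Each member should be queried separately and its rows stored under its own key in `views`, since two members can disagree about the state of a third.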
[27 Jul 2017 17:06]
Sivan Koren
Unfortunately, I have dismantled this test environment so that I can pilot other platforms, so I can no longer provide those details without rebuilding it. Hopefully the instructions I've provided for recreating the issue will be useful to you.

While the issue was occurring, I was actively watching the logs (tail -f) and saw nothing written. I don't mean just nothing of interest; nothing at all was written to the logs. The last entry stated that the node had been declared online, even though the node had stopped writing data, as shown by comparing "select count(*)" results on each node.

The cluster consisted of 3 physical machines running clean installations of CentOS 7, with redundant network connections to a private switch and no external or network storage. They are Dell servers, each with a 4-disk MegaRAID storage controller, and all disks are in good health.
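The reporter's evidence, comparing `select count(*)` across nodes, can be turned into a repeatable check. Below is a minimal Python sketch. It assumes the counts for one table have already been collected from every node at several moments, and `lagging_nodes` and its sample format are hypothetical. Because a table under active writes can show small, transient differences between nodes, the sketch only flags nodes that are behind in every sample. A node whose count stays flat while the others keep growing, as described above, would be flagged.

```python
from typing import Dict, List, Optional, Set


def lagging_nodes(samples: List[Dict[str, int]]) -> List[str]:
    """Return nodes that were behind the highest COUNT(*) in every sample.

    Each sample maps a node name to the SELECT COUNT(*) result for the same
    table, queried at roughly the same moment. Requiring the gap in every
    sample filters out transient differences caused by in-flight writes.
    """
    suspects: Optional[Set[str]] = None
    for sample in samples:
        if not sample:
            continue  # skip empty samples
        highest = max(sample.values())
        behind = {node for node, count in sample.items() if count < highest}
        # Keep only nodes that have been behind in every sample so far.
        suspects = behind if suspects is None else suspects & behind
    return sorted(suspects or set())
```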