Bug #86797 InnoDB Cluster does not report when nodes fall behind or stop writing data.
Submitted: 22 Jun 2017 16:16
Modified: 27 Jul 2017 17:06
Reporter: Sivan Koren
Status: Verified
Category: Shell AdminAPI InnoDB Cluster / ReplicaSet
Severity: S1 (Critical)
Version: 5.7.18
OS: Linux
CPU Architecture: Any
Tags: InnoDB Cluster

[22 Jun 2017 16:16] Sivan Koren
Description:
InnoDB Cluster and MySQL Server do not report when a node falls behind, and the node will not catch up on its own. Instead, it stops writing new data while continuing to declare that it is online, and therefore serves stale data. Furthermore, it does not log any errors when this happens.

How to repeat:
One way to reproduce this is to create a cluster and then use sysbench to load test it.

Here is an example:

Create a database called "sbtest" and create a user sbtest with all privileges on that database.
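
For example, on the primary (RW) node (a minimal sketch; the '%' host wildcard and the password are assumptions, adjust to your environment):

CREATE DATABASE sbtest;
-- 'sbtest'@'%' and the password are placeholders; restrict the host part as appropriate.
CREATE USER 'sbtest'@'%' IDENTIFIED BY 'ExAmpl3!123';
GRANT ALL PRIVILEGES ON sbtest.* TO 'sbtest'@'%';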

sysbench --test=oltp --num-threads=16 --db-driver=mysql --mysql-host=sqlrouter.example.com --mysql-port=6446 --mysql-user=sbtest --mysql-password='ExAmpl3!123' --oltp-table-size=600000 --mysql-engine-trx=yes --oltp-reconnect-mode=transaction --mysql-table-engine=innodb --max-requests=600000 prepare

then

sysbench --test=oltp --num-threads=16 --db-driver=mysql --mysql-host=sqlrouter.example.com --mysql-port=6446 --mysql-user=sbtest --mysql-password='ExAmpl3!123' --oltp-table-size=600000 --mysql-engine-trx=yes --oltp-reconnect-mode=transaction --mysql-table-engine=innodb --max-requests=600000 run

Afterwards, use a SQL client or the MySQL command-line client to count the rows in the sbtest table on each node in the cluster: "select count(*) from sbtest.sbtest"
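
For instance (a sketch; connect to each node's own port directly, not through the router, and record which host returned which count):

-- Run this on every member individually:
SELECT @@hostname AS node, COUNT(*) AS row_count FROM sbtest.sbtest;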

If the row counts are consistent, repeat with values of 1,000,000 instead of 600,000. The idea is to push hard enough to cause at least one RO node to fall behind.

Once one of the nodes is behind, use MySQL Shell to check cluster.status(). It will continue to report that the node is "ONLINE", and MySQL Router will continue to read from that node even though its data is now stale.
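
For comparison, Group Replication's own membership view can be checked on any member; since the AdminAPI derives its status from this table (see the developer comment below), it presumably keeps showing the lagging node as ONLINE too:

-- Documented GR monitoring table; MEMBER_STATE stays 'ONLINE' for the stale node:
SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE
  FROM performance_schema.replication_group_members;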

Furthermore, that node will no longer write new data; instead, it will fall further and further behind.

Finally, restart mysqld on the node that has fallen behind, then run cluster.rejoinInstance('XXXX') to add it back to the cluster. After a few moments the node will catch up, write all of the missing data, and begin functioning normally.
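
One way to confirm that the rejoined node has actually caught up is to compare executed GTID sets across members (a sketch; run on each node and check that the values converge):

-- Once the rejoined member has applied the missing transactions,
-- every member should report the same executed GTID set:
SELECT @@GLOBAL.gtid_executed;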

Check the logs, and note that this entire failure is not recorded.

Suggested fix:
If the node can catch up after a simple restart, then InnoDB Cluster should handle this on its own.

MORE IMPORTANTLY, InnoDB Cluster should NOT report a node that has fallen behind as ONLINE, because MySQL Router needs to know that the node is not a viable source for reads. This is critical for normal business operation.

Therefore, at the very least, InnoDB Cluster needs to report this failure through MySQL Shell/the AdminAPI, AND it definitely needs to log something.
[28 Jun 2017 18:31] MySQL Verification Team
Hi,

Thanks for submitting a bug report, I verified it as described.

At the moment InnoDB Cluster is still lacking some reporting capabilities; we are working on that.

all best
Bogdan
[26 Jul 2017 11:32] Paulo Jesus
Posted by developer:
 
The status information reported by InnoDB Cluster/AdminAPI is based on information that Group Replication (GR) provides through specific performance_schema tables (see: https://dev.mysql.com/doc/refman/5.7/en/group-replication-monitoring.html).

Can you please provide the data/output (SELECT * FROM ..., spelled out below) for the following tables on all members of the cluster, so we can verify whether the incorrect status is being reported by GR or by the AdminAPI:
- performance_schema.replication_group_members (https://dev.mysql.com/doc/refman/5.7/en/group-replication-replication-group-members.html)
- performance_schema.replication_group_member_stats (https://dev.mysql.com/doc/refman/5.7/en/group-replication-replication-group-member-stats.html)
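
For reference, the requested queries, to be run on every member of the cluster, are:

-- Membership and state as seen by Group Replication:
SELECT * FROM performance_schema.replication_group_members;
-- Per-member transaction/certification statistics:
SELECT * FROM performance_schema.replication_group_member_stats;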

Please also provide any error logs on the servers.

Can you please also verify whether there is a network partition in the cluster (see: https://dev.mysql.com/doc/refman/5.7/en/group-replication-detecting-partitions.html), which could explain what is happening.

Thank you
[27 Jul 2017 17:06] Sivan Koren
Unfortunately, I have dismantled this test environment so that I can pilot other platforms, so I can no longer provide those details without rebuilding it. Hopefully the instructions I've provided for recreating the issue will be useful to you.

When the issue was occurring, I was actively observing the logs (tail -f) and saw nothing written. I don't mean merely nothing of interest; nothing at all was written to the logs. The last entry declared the node online, even as the node had stopped writing data, as evidenced by comparing "select count(*)" on each node.

The cluster was made up of 3 physical machines with redundant network connections to a private switch, with no external or network storage, on clean installations of CentOS 7. They are Dell servers, each with a 4-disk MegaRAID storage controller, and all disks are in good health.