MySQL Bugs: #75823: Invalid LCP (Ndbd file system inconsistency error, please report a bug)

Bug #75823	Invalid LCP (Ndbd file system inconsistency error, please report a bug)
Submitted:	9 Feb 2015 9:14	Modified:	10 Jun 2015 11:03
Reporter:	Сергей Кукуев
Status:	Can't repeat
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	7.3.6	OS:	Linux (2.6.39-200.24.1.el6uek.x86_64)
Assigned to:	MySQL Verification Team	CPU Architecture:	Any
Tags:	2352, Error 2352, Invalid LCP, LCP, Ndbd file system error

Description:
Two data nodes simultaneously hit the following error, which caused loss of data:

Time: Thursday 5 February 2015 - 19:41:56
Status: Ndbd file system error, restart node initial
Message: Invalid LCP (Ndbd file system inconsistency error, please report a bug)
Error: 2352
Error data: T50F15
Error object: RESTORE (Line: 1286) 0x00000002
Program: ndbmtd
Pid: 26712 thr: 5
Version: mysql-5.6.19 ndb-7.3.6
Trace: /opt/mysql/ndb_5_trace.log.4 [t1..t10]
***EOM***
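For anyone collecting these entries from several data nodes, the following is a minimal sketch that turns one entry of the node error log into a dictionary. The function names are hypothetical, and reading "T50F15" as table 50, fragment 15 is an assumption about the error data format, not something confirmed in this report.

```python
import re


def parse_ndb_error_entry(text):
    """Parse the 'Key: value' lines of one error log entry into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "***EOM***":
            continue  # skip blank lines and the end-of-message marker
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def split_error_data(error_data):
    """Split e.g. 'T50F15' into (50, 15); return None if it does not match.

    Interpreting T as a table id and F as a fragment id is an assumption.
    """
    match = re.fullmatch(r"T(\d+)F(\d+)", error_data)
    return (int(match.group(1)), int(match.group(2))) if match else None


SAMPLE = """\
Time: Thursday 5 February 2015 - 19:41:56
Status: Ndbd file system error, restart node initial
Message: Invalid LCP (Ndbd file system inconsistency error, please report a bug)
Error: 2352
Error data: T50F15
Error object: RESTORE (Line: 1286) 0x00000002
Program: ndbmtd
Pid: 26712 thr: 5
Version: mysql-5.6.19 ndb-7.3.6
Trace: /opt/mysql/ndb_5_trace.log.4 [t1..t10]
***EOM***
"""

entry = parse_ndb_error_entry(SAMPLE)
print(entry["Error"], entry["Program"], split_error_data(entry["Error data"]))
```

Comparing the parsed "Time" and "Error data" fields across nodes makes it easier to confirm that both nodes failed on the same object at the same moment.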

How to repeat:
I don't know

Error report from 7.3.6

Attachment: ndb_error_report_20150209131943.tar.bz2 (application/octet-stream, text), 2.67 MiB.

Hi,

Can you give me some more insight into your problem, as it is almost impossible to reproduce?
 - did you make any changes to your system recently?
 - are you 100% sure your hardware is ok (check raid controller, disk status and also ethernet adapters if they show any errors, syslog too)?
 - are you using NFS by any chance or any other type of network/shared storage?
 - are you using disk data with 7.3?
 - was the amount of traffic on your system in any way different than usual (e.g. did you have a load spike)?

kind regards
Bogdan Kecman

Hi!

 - did you make any changes to your system recently?
No.

 - are you 100% sure your hardware is ok (check raid controller, disk status and also ethernet adapters if they show any errors, syslog too)?
Since it happened in February, I cannot get the logs. These are virtual machines.

 - are you using NFS by any chance or any other type of network/shared storage?
No.

 - are you using disk data with 7.3?
I don't quite understand the question. We are using only ENGINE=NDBCLUSTER.

 - was the amount of traffic on your system in any way different than usual (e.g. did you have a load spike)?
I don't think so, but the system was under load testing.

Hi,

> Since it happened in February, I cannot get the logs.

Without the logs there is no way for me to tell whether I have reproduced this or not. In any case, I don't believe this is a bug, and without the logs I can't say exactly what happened.

> These are virtual machines.

My assumption is that this is your problem. MySQL Cluster expects constant IO throughput even if you don't have any load on the cluster itself. It writes LCPs non-stop. If your IO suddenly becomes unavailable, a crash due to the inability to execute the LCP/GCP will happen (as it did for you). If you are using a VM, the IO is not stable, because other VMs on the same box can use up the IO, and your data node will die.
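To get a rough feel for whether IO on a VM stalls, one option is a small probe that times a write followed by fsync in the directory the data node writes to. This is only a sketch under assumptions: the function names, the 0.5 second threshold, and the use of the system temp directory in the demo are all illustrative and not part of MySQL Cluster.

```python
import os
import tempfile
import time


def fsync_latency(directory, size=4096):
    """Write `size` bytes to a temp file in `directory`, fsync it, return seconds."""
    payload = b"\0" * size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.monotonic()
        os.write(fd, payload)
        os.fsync(fd)  # force the data to stable storage before stopping the clock
        return time.monotonic() - start
    finally:
        os.close(fd)
        os.unlink(path)


def probe(directory, samples=5, interval=1.0, threshold=0.5):
    """Take `samples` measurements; return (latencies, those above `threshold`)."""
    latencies = []
    for _ in range(samples):
        latencies.append(fsync_latency(directory))
        time.sleep(interval)
    stalls = [value for value in latencies if value > threshold]
    return latencies, stalls


# Demo against the system temp directory (an assumption); point this at the
# file system that holds the data node's DataDir for a meaningful reading.
latencies, stalls = probe(tempfile.gettempdir(), samples=3, interval=0.1)
print("max fsync latency: %.4fs, stalls: %d" % (max(latencies), len(stalls)))
```

Run periodically, a probe like this shows whether a neighbouring VM is starving the disk at the times the data node reports problems.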

>  - are you using disk data with 7.3?
> I don't quite understand the question. We are using only ENGINE=NDBCLUSTER.

The NDBCLUSTER storage engine supports tables that are entirely in memory as well as tables where some of the columns are stored on disk; the latter is called "disk data".
https://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-disk-data-objects.html

>>  - was the amount of traffic on your system in any way
>> different than usual (e.g. did you have a load spike)?
> I don't think so, but the system was under load testing.

If the system is under load testing, it certainly has higher traffic than normal; what other "load" could you be testing? If the load on the cluster is higher than the IO of the data nodes can handle, you will get an LCP/GCP crash. That's not a bug. To properly configure both the hardware and the cluster itself for your needs, you should contact the MySQL support team.

kind regards
Bogdan Kecman

We don't use disk data in our tables.

How can we monitor situations when the cluster cannot write the next LCP/GCP, and how often does this happen?

Hi,

You have some more details here:
https://dev.mysql.com/doc/refman/5.6/en/mysql-cluster-ndbd-definition.html#mysql-cluster-l...
and here: https://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-config-lcp-params.html

There is also a nice graph by Oli here: http://www.fromdual.com/mysql-cluster-lcp-gcp

How to monitor? Well, you monitor your server using whatever monitoring tool you prefer (Zabbix, Cacti, ...) and keep sysstat running so you can look at the sar logs when you need to; in short, standard monitoring of server health. The LCP and GCP take IO. The amount of IO they take is defined by the Disk*Speed* parameters; how often an LCP is written depends on TimeBetweenLocalCheckpoints, and the GCP depends on TimeBetweenGlobalCheckpoints, the size of your transactions, and the amount of data changed in transactions.
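As a rough illustration of how the checkpoint speed setting relates to LCP duration, the arithmetic below divides the amount of in-memory data on one node by the configured checkpoint write rate. It is a back-of-the-envelope sketch: the function name is hypothetical, the 4 GiB data size and 10 MiB/s rate are made-up example values, and the model ignores redo log writes and any other IO on the disk.

```python
MIB = 1024 * 1024


def lcp_write_seconds(data_bytes, checkpoint_bytes_per_second):
    """Estimate seconds needed to write one LCP at a fixed write rate."""
    if checkpoint_bytes_per_second <= 0:
        raise ValueError("checkpoint rate must be positive")
    return data_bytes / checkpoint_bytes_per_second


# Hypothetical example values, not taken from the reporter's configuration.
data_per_node = 4096 * MIB   # 4 GiB of in-memory data on one data node
checkpoint_rate = 10 * MIB   # checkpoint write rate of 10 MiB per second

seconds = lcp_write_seconds(data_per_node, checkpoint_rate)
print("estimated LCP write time: %.1f s (%.1f min)" % (seconds, seconds / 60))
```

If the disk cannot sustain the configured rate on top of the rest of the node's IO, checkpoints take longer than this estimate, which is the situation the sar logs help to spot.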

You can also use DUMP 2303 to get more detailed info about the LCP, and you can increase the logging level to get continuous LCP/GCP information in the logs (the logs will grow quickly).
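Once checkpoint events are being logged, their duration can be tracked over time. The sketch below pairs "Local checkpoint N started" and "Local checkpoint N completed" lines and reports how long each LCP took. The log line layout and the sample lines are assumptions made for illustration; check them against your own cluster log before relying on the regular expression.

```python
import re
from datetime import datetime

# Assumed line layout: "<date> <time> ... Node <id>: Local checkpoint <n> started|completed"
LCP_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*?"
    r"Node (?P<node>\d+): Local checkpoint (?P<lcp>\d+) "
    r"(?P<event>started|completed)"
)


def lcp_durations(lines):
    """Return ([(node, lcp_id, seconds), ...], [(node, lcp_id) still running])."""
    started = {}
    durations = []
    for line in lines:
        match = LCP_RE.search(line)
        if not match:
            continue
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
        key = (int(match.group("node")), int(match.group("lcp")))
        if match.group("event") == "started":
            started[key] = ts
        elif key in started:
            elapsed = (ts - started.pop(key)).total_seconds()
            durations.append((key[0], key[1], elapsed))
    return durations, sorted(started)


# Hypothetical sample lines, written to match the assumed layout above.
SAMPLE_LOG = [
    "2015-02-05 19:30:00 [MgmtSrvr] INFO     -- Node 3: Local checkpoint 41 started. Keep GCI = 1003 oldest restorable GCI = 947",
    "2015-02-05 19:36:50 [MgmtSrvr] INFO     -- Node 3: Local checkpoint 41 completed",
    "2015-02-05 19:40:00 [MgmtSrvr] INFO     -- Node 3: Local checkpoint 42 started. Keep GCI = 1100 oldest restorable GCI = 1003",
]

done, unfinished = lcp_durations(SAMPLE_LOG)
print(done, unfinished)
```

An LCP that takes much longer than its neighbours, or one that never reaches "completed", is a hint that IO was not keeping up at that time.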

kind regards
Bogdan Kecman