MySQL Bugs: #69822: ndbd crashes periodically

Bug #69822	ndbd crashes periodically
Submitted:	23 Jul 2013 11:30	Modified:	19 Sep 2013 11:03
Reporter:	Dirar Abu-Saymeh
Status:	No Feedback
Category:	MySQL Cluster: Disk Data	Severity:	S1 (Critical)
Version:	7.2.13	OS:	Linux
Assigned to:	Assigned Account	CPU Architecture:	Any

Description:

Time: Tuesday 23 July 2013 - 00:41:18
Status: Temporary error, restart node
Message: LCP fragment scan watchdog detected a problem.  Please report a bug. (Internal error, programming error or missing error message, please report a bug)
Error: 7200
Error data: Please report this as a bug. Provide as much info as possible, expecially all the ndb_*_out.log files, Thanks. Shutting down node due to lack of LCP fragment scan progress
Error object: DBLQH (Line: 23878) 0x00000006
Program: ndbd
Pid: 20

How to repeat:
Occurs periodically in my production environment. I am not sure how you can repeat it.

It is now crashing every day and becoming a more serious issue. Any support?

Hi Dirar,

Please use the ndb_error_reporter utility when reporting cluster bugs. The cluster log from the management node(s), logs from other data nodes, and the config.ini file can provide useful information too. They are all included by the ndb_error_reporter utility.

I see you have reported this bug using category ClusterDD. Are you using disk data tables or did you confuse it with the logs being written to disk?

Have there been any recent changes to your cluster or to how you use it? Is it always the same node that fails? Are you running out of disk space?

/Gustaf

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

From looking at the logs, I suspect that the LCP watchdog fired after no progress for 70 seconds. It seems that no disk writes happened for a long time, so the most likely cause is that the node runs in an environment where disk I/O is slow or unreliable.
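
One rough way to check whether disk access is the problem is to time synchronous writes on the filesystem that holds the data node's files. The Python sketch below is only an illustration and is not part of MySQL Cluster: the function names, the block size, and the 1-second stall threshold are assumptions chosen for the example, and it does not reproduce how ndbd actually writes LCP data.

```python
import os
import tempfile
import time


def measure_fsync_latency(directory, writes=20, block_size=256 * 1024):
    """Return the time in seconds taken by each write+fsync in `directory`."""
    latencies = []
    block = b"\0" * block_size
    # Create a scratch file on the filesystem under test.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            start = time.monotonic()
            os.write(fd, block)
            os.fsync(fd)  # wait until the data has been flushed to the device
            latencies.append(time.monotonic() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies


def summarize(latencies, stall_threshold=1.0):
    """Worst and mean latency, plus how many writes exceeded the threshold.

    The threshold is an arbitrary illustrative value, not an NDB setting.
    """
    return {
        "max": max(latencies),
        "mean": sum(latencies) / len(latencies),
        "stalls": sum(1 for t in latencies if t > stall_threshold),
    }


if __name__ == "__main__":
    # Assumption: replace the temp directory with the data node's data directory.
    print(summarize(measure_fsync_latency(tempfile.gettempdir())))
```

Running this repeatedly while the cluster is under load, and comparing the worst-case latency across data nodes, may show whether one host sees long write stalls. A single clean run does not rule out intermittent stalls.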