Bug #69822 ndbd crashes periodically
Submitted: 23 Jul 2013 11:30
Modified: 19 Sep 2013 11:03
Reporter: Dirar Abu-Saymeh
Status: No Feedback
Category: MySQL Cluster: Disk Data
Severity: S1 (Critical)
Version: 7.2.13
OS: Linux
Assigned to: Assigned Account
CPU Architecture: Any

[23 Jul 2013 11:30] Dirar Abu-Saymeh
Description:

Time: Tuesday 23 July 2013 - 00:41:18
Status: Temporary error, restart node
Message: LCP fragment scan watchdog detected a problem.  Please report a bug. (Internal error, programming error or missing error message, please report a bug)
Error: 7200
Error data: Please report this as a bug. Provide as much info as possible, expecially all the ndb_*_out.log files, Thanks. Shutting down node due to lack of LCP fragment scan progress
Error object: DBLQH (Line: 23878) 0x00000006
Program: ndbd
Pid: 20

How to repeat:
Occurs periodically in my production environment. Not sure how you can repeat it.
[13 Aug 2013 13:57] Dirar Abu-Saymeh
It is now crashing every day and is becoming a more serious issue. Any support?
[19 Aug 2013 11:03] Gustaf Thorslund
Hi Dirar,

Please use the ndb_error_reporter utility when reporting cluster bugs. The cluster log from the management node(s), logs from other data nodes, and the config.ini file can provide useful information too. They are all included by the ndb_error_reporter utility.

I see you have reported this bug using category ClusterDD. Are you using disk data tables or did you confuse it with the logs being written to disk?

Have there been any recent changes to your cluster or to how you use it? Is it always the same node that fails? Do you happen to be running out of disk space?

/Gustaf
[20 Sep 2013 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[7 Jul 2015 11:48] Mikael Ronström
From looking at the logs, I suspect that the LCP watchdog fired after no progress for 70 seconds. It seems that no disk writes happened for a long time, so the most likely cause of the problem is that you are running in an environment where disk access is poor.
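
To illustrate the mechanism described above, here is a minimal sketch of a progress watchdog in Python. It is not the NDB implementation: the class name `ProgressWatchdog`, its methods, and the injectable `clock` are all hypothetical, and the 70-second limit is only the figure suspected in the comment above. The idea is that the watchdog remembers when a progress value (for example, bytes written by the fragment scan) last changed, and reports expiry once that value has stayed the same for too long.

```python
import time


class ProgressWatchdog:
    """Hypothetical sketch: trips when a progress value stops changing.

    This is an illustration only, not how ndbd implements its LCP
    fragment scan watchdog.
    """

    def __init__(self, max_stall_seconds=70.0, clock=time.monotonic):
        # 70 seconds is the stall duration suspected in the thread;
        # it is an assumption here, not a documented default.
        self.max_stall_seconds = max_stall_seconds
        self._clock = clock
        self._last_value = None
        self._last_change = clock()

    def report(self, value):
        """Record the current progress value (e.g. bytes written so far)."""
        if value != self._last_value:
            # Progress was made, so restart the stall timer.
            self._last_value = value
            self._last_change = self._clock()

    def stalled_for(self):
        """Seconds elapsed since the progress value last changed."""
        return self._clock() - self._last_change

    def expired(self):
        """True once no progress has been seen for max_stall_seconds."""
        return self.stalled_for() >= self.max_stall_seconds
```

A caller would invoke `report()` periodically with the latest progress counter and act (for example, log and shut down) when `expired()` returns true. On slow storage the counter can stay unchanged long enough to trip such a check even though nothing is logically wrong with the scan itself.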