Bug #49201 ndbmtd dies while starting phase 1 using LockPagesInMainMemory
Submitted: 30 Nov 2009 10:30 Modified: 30 Nov 2009 16:05
Reporter: Robert Klikics
Status: Not a Bug
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S1 (Critical)
Version: telco-7.0.9b OS: Linux (Debian 5.0)
Assigned to: CPU Architecture: Any
Tags: LockPagesInMainMemory, mlock, mlockall, ndbmtd

[30 Nov 2009 10:30] Robert Klikics
Description:
We're using the multithreaded ndbd (ndbmtd). During a rolling restart (we changed the configuration from LockPagesInMainMemory=0 to LockPagesInMainMemory=1), one of our ndb nodes died during start phase 1 with the following error message:

2009-11-30 10:52:49 [MgmtSrvr] ALERT    -- Node 4: Forced node shutdown completed. Occured during startphase 1. Caused by error 6050: 'WatchDog terminate, internal error or massive overload on the machine running this node(Internal error, programming error or missing error message, please report a

An ndb_error_reporter report, taken after the crash, is available at the following URL:

http://85.25.144.101/files/ndb_error_report_20091130111343.tar.bz2

How to repeat:
Unfortunately, this error has not occurred on any other node, but the node seems to crash when LockPagesInMainMemory is enabled together with a large amount of DataMemory and IndexMemory (about 50 GB in our configuration) and a TimeBetweenWatchDogCheck that is too short.
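
A minimal sketch of the kind of config.ini section this combination describes. The parameter names come from the report; the DataMemory/IndexMemory split and the watchdog interval are assumed placeholders, not values taken from the reporter's actual configuration.

```ini
[ndbd default]
# Changed from 0 to 1 during the rolling restart (as stated in the report)
LockPagesInMainMemory=1
# Assumed split: the report only says DataMemory plus IndexMemory is about 50 GB
DataMemory=40G
IndexMemory=10G
# Assumed interval in milliseconds: the report only says the value was too short
TimeBetweenWatchDogCheck=6000
```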

Suggested fix:
Disable the watchdog while memory is being allocated if LockPagesInMainMemory is enabled?
[30 Nov 2009 12:20] Andrew Hutchings
Please look at the following setting:

http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-ndbd-definition.html#ndbparam-ndbd-ti...

Adjusting this should stop this timeout from occurring.
[30 Nov 2009 16:05] Robert Klikics
OK, thanks for the advice, but would it be possible to add a more accurate error message?