MySQL Bugs: #49201: ndbmtd dies while starting phase 1 using LockPagesInMainMemory

Bug #49201	ndbmtd dies while starting phase 1 using LockPagesInMainMemory
Submitted:	30 Nov 2009 10:30	Modified:	30 Nov 2009 16:05
Reporter:	Robert Klikics	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S1 (Critical)
Version:	telco-7.0.9b	OS:	Linux (Debian 5.0)
Assigned to:		CPU Architecture:	Any
Tags:	LockPagesInMainMemory, mlock, mlockall, ndbmtd

Description:
We're using the multithreaded ndbd (ndbmtd). During a rolling restart (after changing the configuration from LockPagesInMainMemory=0 to LockPagesInMainMemory=1), one of our NDB nodes died in start phase 1 with the following error message:

2009-11-30 10:52:49 [MgmtSrvr] ALERT    -- Node 4: Forced node shutdown completed. Occured during startphase 1. Caused by error 6050: 'WatchDog terminate, internal error or massive overload on the machine running this node(Internal error, programming error or missing error message, please report a

An ndb_error_reporter report taken after the crash is available at the following URL:

http://85.25.144.101/files/ndb_error_report_20091130111343.tar.bz2

How to repeat:
Unfortunately, this error did not occur on any other node, but the crash seems to happen when LockPagesInMainMemory is enabled together with a large amount of DataMemory and IndexMemory (about 50 GB in our configuration) and a TimeBetweenWatchDogCheck that is too short.

Suggested fix:
Disable the watchdog while memory is being allocated if LockPagesInMainMemory is enabled?

Please look at the following setting:

http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-ndbd-definition.html#ndbparam-ndbd-ti...

Adjusting this should stop this timeout from occurring.
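
For illustration, a config.ini fragment along these lines is one way such an adjustment might look. It is a sketch only: the 40G/10G split of the roughly 50 GB mentioned in the report and the 60000 ms interval are assumed values, not taken from the reporter's configuration. The truncated link above is not resolved here; TimeBetweenWatchDogCheck is the parameter named in the report, and TimeBetweenWatchDogCheckInitial is a related documented parameter covering the early start phases in which memory is allocated.

```ini
[ndbd default]
# Lock the data node process in memory (the setting changed from 0 to 1 in the report)
LockPagesInMainMemory=1

# Assumed split of the ~50 GB total mentioned in the report
DataMemory=40G
IndexMemory=10G

# Watchdog check interval in milliseconds (the default is 6000).
# 60000 is an illustrative value, not a recommendation.
TimeBetweenWatchDogCheck=60000

# Same kind of check, but applied during the early start phases
# while memory is being allocated. Value is likewise illustrative.
TimeBetweenWatchDogCheckInitial=60000
```

Changes to these parameters would normally be rolled out with a management server reload followed by a rolling restart of the data nodes.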

OK, thanks for the advice, but would it be possible to add a more accurate error message?