Bug #11218 Error message Internal program error (failed ndbrequire) Fault ID: 2341
Submitted: 9 Jun 2005 19:14 Modified: 1 Sep 2005 14:38
Reporter: Jonathan Miller
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S3 (Non-critical)
Version: 4.1.12, 5.0, 5.1.0 OS: Linux (Linux)
Assigned to: Tomas Ulin CPU Architecture: Any

[9 Jun 2005 19:14] Jonathan Miller
Description:
The following error message needs to be cleaned up:

In setting up today, I ran into an issue that I resolved, but the error message bothers me, because it really did not tell me what the issue was.

I had used an older config.ini file that had FileSystemPath set to /space/autotest/run

The ndbd would bomb on one system with:

Date/Time: Thursday 9 June 2005 - 19:45:47
Type of error: error
Message: Internal program error (failed ndbrequire)
Fault ID: 2341
Problem data: NdbcntrMain.cpp
Object of reference: NDBCNTR (Line: 2411) 0x0000000e
ProgramName: ../libexec/ndbd
ProcessID: 18903
TraceFile: ./ndb_2_trace.log.2
Version 5.1.0 (a_drop5p2)
***EOM***
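
The error-log entries quoted in this report all use the same field labels (Date/Time, Type of error, Message, Fault ID, and so on) and end with ***EOM***. As a minimal sketch, assuming only that label set, the following Python helper splits one entry into a dictionary so that the Fault ID and the failing source location can be compared across reports. The function name parse_entry and the label list are illustrative and are not part of any NDB tool.

```python
import re

# Field labels as they appear in the entries quoted in this report.
LABELS = [
    "Date/Time", "Type of error", "Message", "Fault ID", "Problem data",
    "Object of reference", "ProgramName", "ProcessID", "TraceFile",
]

# A label followed by a colon, or a line starting with "Version"
# (the Version line has no colon in the quoted entries).
_FIELD = re.compile(
    r"(?:(" + "|".join(re.escape(label) for label in LABELS) + r"):"
    r"|^(Version)(?=\s))",
    re.MULTILINE,
)

def parse_entry(text):
    """Split one error-log entry into {label: value}.

    Works whether each field is on its own line or several fields
    share a line, because it splits on the labels, not on newlines.
    """
    text = text.split("***EOM***")[0]
    matches = list(_FIELD.finditer(text))
    fields = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        label = m.group(1) or m.group(2)
        # Collapse internal whitespace and newlines in the value.
        fields[label] = " ".join(text[m.end():end].split())
    return fields

entry = """Date/Time: Thursday 9 June 2005 - 19:45:47 Type of error: error
Message: Internal program error (failed ndbrequire) Fault ID: 2341 Problem data: NdbcntrMain.cpp Object of reference: NDBCNTR (Line: 2411) 0x0000000e
ProgramName: ../libexec/ndbd
ProcessID: 18903
TraceFile: ./ndb_2_trace.log.2
Version 5.1.0 (a_drop5p2)
***EOM***"""

parsed = parse_entry(entry)
print(parsed["Fault ID"], parsed["Problem data"], parsed["Object of reference"])
```

Splitting on labels rather than on line breaks is a deliberate choice here, since the entries in this thread are wrapped differently from one comment to the next.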

I thought that maybe it was due to an invalid path, but the path is there, and it has the right permissions. So I thought I would try an invalid path:

Date/Time: Thursday 9 June 2005 - 19:42:20
Type of error: error
Message: Illegal file system path
Fault ID: 2805
Problem data: /home/ndbdev/jmiller/builds/run/nodir
Object of reference: Filename::init()
ProgramName: ../libexec/ndbd
ProcessID: 18848
TraceFile: ./ndb_2_trace.log.1
Version 5.1.0 (a_drop5p2)
***EOM*** 

This is what I would expect: a nice, clear error message. I corrected the issue I was having by setting the path to /home/ndbdev/jmiller/builds/run
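
The report does not say what was actually wrong with /space/autotest/run, only that the directory existed and had the right permissions. As a rough sketch of a pre-start check, and not ndbd's own logic, the Python helper below separates the cases discussed above: a missing path, a path that is not a directory, a directory that is not writable, and a directory that already holds data from another setup. The function name check_data_dir is hypothetical, and the ndb_*_fs directory pattern is an assumption about how a data node names its storage under FileSystemPath.

```python
import glob
import os

def check_data_dir(path):
    """Return a list of findings for a candidate FileSystemPath.

    An empty list means none of the checks below found a problem.
    """
    if not os.path.exists(path):
        return ["missing"]
    if not os.path.isdir(path):
        return ["not a directory"]

    findings = []
    # Creating files in a directory needs write and search permission.
    if not os.access(path, os.W_OK | os.X_OK):
        findings.append("not writable")

    # ASSUMPTION: data nodes keep their files in ndb_<nodeid>_fs
    # directories; leftovers may come from an older configuration.
    for leftover in sorted(glob.glob(os.path.join(path, "ndb_*_fs"))):
        findings.append("existing data: " + os.path.basename(leftover))
    return findings

if __name__ == "__main__":
    for finding in check_data_dir("/space/autotest/run"):
        print(finding)
```

A check like this cannot explain a failed ndbrequire by itself; it only narrows down whether the path is a plausible suspect before looking at the trace file.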

How to repeat:
The only way I know to repeat this is to use NDB08, set FileSystemPath to /space/autotest/run, and try to start ndbd.

Suggested fix:
Correct the following message to something that makes sense:

Date/Time: Thursday 9 June 2005 - 19:45:47
Type of error: error
Message: Internal program error (failed ndbrequire)
Fault ID: 2341
Problem data: NdbcntrMain.cpp
Object of reference: NDBCNTR (Line: 2411) 0x0000000e
ProgramName: ../libexec/ndbd
ProcessID: 18903
TraceFile: ./ndb_2_trace.log.2
Version 5.1.0 (a_drop5p2)
***EOM***
[24 Jul 2005 21:50] Jonathan Miller
Has also been reported by a customer:

What is the correct procedure for recovering a failed NDBD node?

In my experience, every time a server running an NDBD node is shut down uncleanly (due to a crash, power failure, or similar), NDBD refuses to start on reboot and I have to restore the database from backup. Not good!

This morning I found one of our servers (just a development box, fortunately) had hung overnight. On reboot, NDBD would not start (could not alloc node id) until I did a PURGE STALE SESSIONS with ndb_mgm.

Now it starts and gets to phase 5 for about 20 seconds and then exits. 
The error_log shows only the following cryptic message:

Date/Time: x 24 July 2005 - 11:14:54
Type of error: error
Message: Internal program error (failed ndbrequire)
Fault ID: 2341
Problem data: Dbdict.cpp
Object of reference: DBDICT (Line: 11636) 0x0000000a
ProgramName: /usr/sbin/ndbd
ProcessID: 5042
TraceFile: /var/lib/mysql-cluster/ndb_3_trace.log.3
Version 4.1.12
***EOM***

Trace log: http://www.expio.co.nz/~sgarner/misc/ndb_3_trace.log.3.gz

The node is part of a 2-server, 2-node cluster with 2 replicas (+ a 3rd machine as mgm). The other node, and the cluster, is still operational. 
Why can't the failed node repair itself from the working node?

Should I be using --initial? I think I've tried that before in similar circumstances and just ended up losing the whole cluster. So before I do that, I'd like to know if there's anything else I can try.

Unless I missed something, the manual is a little sparse on the topic of recovering a failed NDB, so I'd appreciate any help.

thanks
-Simon
[31 Aug 2005 17:11] Tomas Ulin
Error messages for filesystem issues have been cleaned up as of 4.1.15 and 5.0.12.
[1 Sep 2005 14:38] Paul DuBois
Noted in 4.1.15, 5.0.12 changelogs.
[12 Oct 2005 15:48] Jeff Schachter
This happens to me with 4.1.14 - I get:

Date/Time: Wednesday 12 October 2005 - 11:46:14
Type of error: error
Message: Internal program error (failed ndbrequire)
Fault ID: 2341
Problem data: Dbdict.cpp
Object of reference: DBDICT (Line: 11762) 0x00000002
ProgramName: /apps/mysql41/bin/ndbd
ProcessID: 31682
TraceFile: /apps/mysql41/cluster/ndb_2_trace.log.5
Version 4.1.14
***EOM***

Is there somewhere I can find an explanation of this problem?