Bug #49263 ndbd crashes with wrong error message when Undo Files path is invalid
Submitted: 1 Dec 2009 14:23 Modified: 9 Dec 2009 15:04
Reporter: Geert Vanderkelen
Status: Closed
Category: MySQL Cluster: Disk Data Severity: S3 (Non-critical)
Version: mysql-5.1-telco-6.3 OS: Any
Assigned to: Jonas Oreland CPU Architecture: Any
Tags: crash, disk data, ndbd, undo

[1 Dec 2009 14:23] Geert Vanderkelen
Description:
When FileSystemPathUndoFiles is set to a nonexistent path, the data nodes exit with a bogus error message (and error number). The actual error can only be found in the trace files.

Time: Tuesday 1 December 2009 - 15:12:17
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dbdict/Dbdict.cpp
Error object: DBDICT (Line: 3527) 0x0000000a
Program: /data1/mysql/ndb-6.3bzr/libexec/ndbd
Pid: 9657
Trace: /data2/users/geert/cluster/master/ndb_3_trace.log.1
Version: mysql-5.1.39 ndb-6.3.29-GA

The trace contains this:

--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 3, r.sigId: 217147 gsn: 717 "CREATE_FILE_REF" prio: 1
s.bn: 260 "LGMAN", s.proc: 3, s.sigId: 217145 length: 5 trace: 0 #sec: 0 fragInf: 0
 H'00000003 H'01040003 H'000005e5 H'00000aff H'00000002
--------------- Signal ----------------
r.bn: 253 "NDBFS", r.proc: 3, r.sigId: 217146 gsn: 164 "CONTINUEB" prio: 1
s.bn: 253 "NDBFS", s.proc: 3, s.sigId: 217144 length: 1 trace: 0 #sec: 0 fragInf: 0
 Scanning the memory channel again with no delay
--------------- Signal ----------------
r.bn: 260 "LGMAN", r.proc: 3, r.sigId: 217145 gsn: 260 "FSOPENREF" prio: 1
s.bn: 253 "NDBFS", s.proc: 3, s.sigId: 217144 length: 4 trace: 0 #sec: 0 fragInf: 0
 UserPointer: 565248
 ErrorCode: 2815, File not found
 OS ErrorCode: 2 
--------------- Signal ----------------
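
Since the real cause is only visible in the trace, it can help to pull the file system error lines out of a trace file automatically. The following is a minimal sketch, not an official tool: the function name find_fs_errors is hypothetical, and the line patterns are inferred from the excerpt above rather than from a documented trace format.

```python
import re

# Patterns inferred from the trace excerpt in this report (an assumption,
# not a documented format), e.g. " ErrorCode: 2815, File not found".
ERROR_LINE = re.compile(r"^\s*ErrorCode:\s*(\d+),\s*(.+?)\s*$")
OS_ERROR_LINE = re.compile(r"^\s*OS ErrorCode:\s*(\d+)\s*$")


def find_fs_errors(trace_text):
    """Return a list of (error_code, message, os_error_code) tuples.

    os_error_code is None when no "OS ErrorCode" line follows the
    "ErrorCode" line.
    """
    results = []
    pending = None  # last ErrorCode seen, still waiting for its OS ErrorCode
    for line in trace_text.splitlines():
        match = ERROR_LINE.match(line)
        if match:
            if pending is not None:
                results.append((pending[0], pending[1], None))
            pending = (int(match.group(1)), match.group(2))
            continue
        os_match = OS_ERROR_LINE.match(line)
        if os_match and pending is not None:
            results.append((pending[0], pending[1], int(os_match.group(1))))
            pending = None
    if pending is not None:
        results.append((pending[0], pending[1], None))
    return results
```

Passing the contents of a trace file such as ndb_3_trace.log.1 to find_fs_errors should list the FSOPENREF error shown above, provided the file uses the same layout as the excerpt.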

How to repeat:
Basic cluster configuration with two data nodes (1 should be enough):

[NDBD DEFAULT]
Datadir=/var/lib/cluster
NoOfReplicas=2
DataMemory=260M
IndexMemory=30M
FileSystemPathUndoFiles=/var/lib/cluster/UNDO

Create a log file group and a tablespace:

CREATE LOGFILE GROUP lg_1
    ADD UNDOFILE 'undo_1.log'
    INITIAL_SIZE 16M
    UNDO_BUFFER_SIZE 2M
    ENGINE NDBCLUSTER;

CREATE TABLESPACE ts_1
    ADD DATAFILE 'data_1.dat'
    USE LOGFILE GROUP lg_1
    INITIAL_SIZE 32M
    ENGINE NDBCLUSTER;

Shut down the cluster, then change this setting in config.ini:
FileSystemPathUndoFiles=/var/lib/cluster/UNDO_FOO

Start ndb_mgmd, then start ndbd and watch it exit with an "Internal program error".
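
A misconfiguration like this can also be caught before the data node is started. The sketch below is only an illustration under stated assumptions: the function name undo_path_problems is hypothetical, config.ini is assumed to be parseable by Python's configparser, and the check must run on the data node host, because the path is local to that machine.

```python
import configparser
import os


def undo_path_problems(config_text):
    """Return human-readable problems found for FileSystemPathUndoFiles."""
    # strict=False tolerates repeated [NDBD] sections; note that configparser
    # then merges them, so per-node overrides are not checked individually.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(config_text)
    problems = []
    for section in parser.sections():
        # Only data node sections ([NDBD DEFAULT], [NDBD]) are inspected.
        if not section.upper().startswith("NDBD"):
            continue
        # configparser lower-cases option names by default.
        path = parser.get(section, "filesystempathundofiles", fallback=None)
        if path is None:
            continue
        if not os.path.isdir(path):
            problems.append(
                f"[{section}] FileSystemPathUndoFiles={path} "
                "is not an existing directory"
            )
    return problems
```

With the altered configuration from this report, undo_path_problems would report /var/lib/cluster/UNDO_FOO unless that directory exists on the host where the check runs.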

Suggested fix:
Exiting with an error is fine, but it would be nicer if at least the following error were shown:
 ErrorCode: 2815, File not found

Or even a message saying that the undo path is incorrect?
[1 Dec 2009 14:49] Geert Vanderkelen
Verified using 6.3bzr and 7.0bzr (pull from 20091201).
[8 Dec 2009 15:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/93223

3188 Jonas Oreland	2009-12-08
      ndb - bug#49263 - reasonable error message when failing to recreate DD object during node/system restart
[8 Dec 2009 15:46] Jonas Oreland
Pushed to 6.3.29 and 7.0.10
[9 Dec 2009 15:02] Jon Stephens
Documented bugfix in the NDB-6.3.29 and 7.0.10 changelogs as follows:

        When the FileSystemPathUndoFiles configuration parameter was set
        to a nonexistent path, the data nodes shut down with the
        generic error 2341 (Internal program error). Now in such cases,
        the error reported is error 2815 (File not found).

Closed.