Bug #33900 One NDB node is down and can not restart
Submitted: 17 Jan 2008 16:44
Modified: 25 May 2012 9:00
Reporter: Yann Le Rouzic
Status: Not a Bug
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: 5.0.41
OS: Linux (RHEL 4 Update 5)
Assigned to: Jonas Oreland
CPU Architecture: Any
Tags: Cluster NDB 2341

[17 Jan 2008 16:44] Yann Le Rouzic
Description:
One NDB node in my cluster has shut down and cannot be restarted. It keeps generating this error:

Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: Dbdict.cpp
Error object: DBDICT (Line: 2612) 0x0000000a
Program: /usr/local/mysql/5.0.41/bin/ndbd
Pid: 17817
Trace: /data/mysql/5.0.41/data/ndb_4_trace.log.14
Version: Version 5.0.41
***EOM***

The other node is up and there is no data problem. Trace and log files can be provided if needed.

How to repeat:
I just have to restart the ndbd process to generate the error again.

Suggested fix:
ndbd --initial ?
[17 Jan 2008 16:54] Jonas Oreland
It looks like a corrupted table file.
Most likely "ndbd --initial" will do the trick,
but I would save the filesystem *first*.

Also, exactly what is wrong is impossible to tell without the trace file.

/jonas
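Saving the filesystem before "ndbd --initial" could look something like the minimal Python sketch below. It assumes the data node's file system sits in a directory named `ndb_<node_id>_fs` under its DataDir (for node 4 here, `/data/mysql/5.0.41/data`, inferred from the trace file path) unless FileSystemPath says otherwise; `backup_ndb_fs` is a hypothetical helper, not part of MySQL Cluster, and ndbd should be stopped while it runs.

```python
import shutil
import time
from pathlib import Path


def backup_ndb_fs(data_dir: str, node_id: int, dest_dir: str) -> Path:
    """Archive a data node's NDB file system before restarting it with --initial.

    Assumption: the file system lives in <data_dir>/ndb_<node_id>_fs; check the
    DataDir/FileSystemPath settings in your own config.ini.
    """
    fs_dir = Path(data_dir) / f"ndb_{node_id}_fs"
    if not fs_dir.is_dir():
        raise FileNotFoundError(f"no NDB file system found at {fs_dir}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base_name = Path(dest_dir).resolve() / f"ndb_{node_id}_fs-{stamp}"
    # make_archive adds the ".tar.gz" suffix and returns the archive's path;
    # archiving relative to data_dir keeps "ndb_<id>_fs/..." as the member prefix.
    archive = shutil.make_archive(
        str(base_name), "gztar", root_dir=str(fs_dir.parent), base_dir=fs_dir.name
    )
    return Path(archive)
```

For node 4 in this report that would be `backup_ndb_fs("/data/mysql/5.0.41/data", 4, "/var/tmp")`, where `/var/tmp` is only an example destination.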
[23 Jan 2008 9:01] Yann Le Rouzic
Can you tell the cause of this issue from the log files?
[23 Jan 2008 10:14] Jonas Oreland
Hi,

Yes, the starting node fails because it cannot
  allocate an Attribute (SQL column) (config parameter MaxNoOfAttributes).

So my guess would be that you changed this variable
  "recently" and updated ndb_mgmd, but have not yet restarted
  the cluster.

I recommend increasing this value.
If you have *not* modified this value "recently", then it's a bug somewhere.

Let me know if this helps

/jonas
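One way to test the "recent config change" hypothesis is to compare the config.ini in use with an older copy, or with the copy on the other server. The Python sketch below is a simplified comparison, not the parser ndb_mgmd uses: `parse_ndb_config` and `diff_configs` are hypothetical helpers that assume one `name=value` per line, full-line `#` or `;` comments, and repeated sections such as `[ndbd]` matched by their order. Names are lower-cased and values are compared as plain text.

```python
from collections import defaultdict


def parse_ndb_config(text: str) -> dict:
    """Parse config.ini text into {(section, occurrence): {param: value}}.

    Sections such as [ndbd] repeat once per node, so each one is keyed by its
    lower-cased name plus how many times that name has appeared before.
    """
    sections = {}
    seen = defaultdict(int)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and full-line comments.
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1].strip().lower()
            current = (name, seen[name])
            seen[name] += 1
            sections[current] = {}
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            # Names are compared case-insensitively; values are compared as text.
            sections[current][key.strip().lower()] = value.strip()
    return sections


def diff_configs(a: str, b: str) -> list:
    """List (section, occurrence, param, value_a, value_b) for every mismatch."""
    pa, pb = parse_ndb_config(a), parse_ndb_config(b)
    diffs = []
    for sec in sorted(set(pa) | set(pb)):
        da, db = pa.get(sec, {}), pb.get(sec, {})
        for key in sorted(set(da) | set(db)):
            if da.get(key) != db.get(key):
                diffs.append((sec[0], sec[1], key, da.get(key), db.get(key)))
    return diffs
```

Passing the contents of both files (for example via `Path("config.ini").read_text()`) returns one tuple per parameter whose values differ, so a changed MaxNoOfAttributes would show up as a single entry under `ndbd default`.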
[23 Jan 2008 13:54] Yann Le Rouzic
Thanks for your answer, Jonas. Indeed, we discovered that the config.ini file was not the same on both servers. We corrected it, but now we get another error when restarting ndbd:

Time: Wednesday 23 January 2008 - 14:48:23
Status: Temporary error, restart node
Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug)
Error: 2303
Error data: Killed by node 3 as copyfrag failed, error: 827
Error object: NDBCNTR (Line: 196) 0x0000000a
Program: /usr/local/mysql/5.0.41/bin/ndbd
Pid: 9787
Trace: /data/mysql/5.0.41/data/ndb_4_trace.log.15
Version: Version 5.0.41
***EOM***

I attached the ndb_4_trace.log.15 file to this ticket.
[23 Jan 2008 13:55] Yann Le Rouzic
Trace file for error 2303

Attachment: ndb_4_trace.log.15.gz (application/gzip, text), 125.65 KiB.

[23 Jan 2008 14:01] Jonas Oreland
sh> perror --ndb 827
NDB error code 827: Out of memory in Ndb Kernel, table data (increase DataMemory): Permanent error: Insufficient space

I.e. you have also changed DataMemory in an incompatible way...

/jonas
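The error data ("Killed by node 3 as copyfrag failed, error: 827") suggests the restarting node ran out of DataMemory while copying fragments from node 3. When comparing DataMemory across config files, the same size may be written in different units, so a textual comparison can mislead. This Python sketch normalizes such values; `ndb_size_to_bytes` and `same_size` are hypothetical helpers, and it assumes config.ini sizes take an optional K, M or G suffix meaning multiples of 1024, which should be checked against the documentation for your version.

```python
import re

# Assumed multipliers for the K/M/G suffixes accepted in config.ini (1K = 1024 bytes).
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def ndb_size_to_bytes(value: str) -> int:
    """Convert a size string such as '80M' or '81920K' to a byte count."""
    m = re.fullmatch(r"\s*(\d+)\s*([KMG]?)\s*", value, flags=re.IGNORECASE)
    if m is None:
        raise ValueError(f"unrecognised size: {value!r}")
    return int(m.group(1)) * _UNITS[m.group(2).upper()]


def same_size(a: str, b: str) -> bool:
    """True when two size strings denote the same number of bytes."""
    return ndb_size_to_bytes(a) == ndb_size_to_bytes(b)
```

Under that assumption, `same_size("80M", "81920K")` is true, while `"80M"` and `"100M"` would flag a real mismatch between the two servers.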
[23 Jan 2008 14:21] Yann Le Rouzic
Since I changed the config.ini file, I guess that I have to run "ndbd --initial" on the node that failed. Is there a risk of corrupting the data on the other node?
[23 Jan 2008 16:30] Jonas Oreland
no, that should be fine

/jonas
[24 Jan 2008 14:12] Yann Le Rouzic
Trace file after "ndbd --initial"

Attachment: ndb_4_trace.log.17.gz (application/gzip, text), 125.28 KiB.

[24 Jan 2008 14:13] Yann Le Rouzic
Configs are now exactly the same on both servers, but after running "ndbd --initial" I still get the same error:

Time: Thursday 24 January 2008 - 15:07:56
Status: Temporary error, restart node
Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug)
Error: 2303
Error data: Killed by node 3 as copyfrag failed, error: 827
Error object: NDBCNTR (Line: 196) 0x0000000e
Program: /usr/local/mysql/5.0.41/bin/ndbd
Pid: 8515
Trace: /data/mysql/5.0.41/data/ndb_4_trace.log.17
Version: Version 5.0.41
***EOM***

The trace file is attached.
[3 Feb 2008 21:03] Jonas Oreland
Hi,

Hmm... I would recommend increasing DataMemory and MaxNoOfTables.

/Jonas
[15 Nov 2008 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[25 May 2012 9:00] Gustaf Thorslund
This looks like a configuration error, so it is not a bug.
Also, 5.0 is history and is now unsupported (though it was supported when this bug was opened).