MySQL Bugs: #33900: One NDB node is down and can not restart

Bug #33900	One NDB node is down and can not restart
Submitted:	17 Jan 2008 16:44	Modified:	25 May 2012 9:00
Reporter:	Yann Le Rouzic
Status:	Not a Bug
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	5.0.41	OS:	Linux (RHEL 4 Update 5)
Assigned to:	Jonas Oreland	CPU Architecture:	Any
Tags:	Cluster NDB 2341

Description:
One NDB node in my cluster has shut down and cannot be restarted. It keeps generating this error:

Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug
)
Error: 2341
Error data: Dbdict.cpp
Error object: DBDICT (Line: 2612) 0x0000000a
Program: /usr/local/mysql/5.0.41/bin/ndbd
Pid: 17817
Trace: /data/mysql/5.0.41/data/ndb_4_trace.log.14
Version: Version 5.0.41
***EOM***

The other node is up, no data problem. Trace and log files can be provided if needed.
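
For anyone triaging entries like the one above, here is a minimal sketch of pulling the fields out of an ndbd error-log entry. The function name `parse_ndbd_error` and the `FIELDS` set are illustrative, not part of any MySQL tool, and the field list is taken only from the entries quoted in this report, so other ndbd versions may print different fields.

```python
# Field names as they appear in the error-log entries quoted in this report.
FIELDS = {
    "Time", "Status", "Message", "Error", "Error data",
    "Error object", "Program", "Pid", "Trace", "Version",
}


def parse_ndbd_error(entry):
    """Parse one ndbd error-log entry into a dict of field name -> value.

    Lines that do not start with a known field name (such as the lone ")"
    that follows the Message line) are appended to the previous field.
    """
    fields = {}
    last_key = None
    for line in entry.splitlines():
        stripped = line.strip()
        if not stripped or stripped == "***EOM***":
            continue  # skip blank lines and the end-of-message marker
        key, sep, value = stripped.partition(":")
        if sep and key in FIELDS:
            fields[key] = value.strip()
            last_key = key
        elif last_key is not None:
            fields[last_key] += " " + stripped  # continuation line
    return fields
```

With the entry above, `parse_ndbd_error(entry)["Error"]` gives the error code as a string, and `["Error object"]` gives the kernel block and source line that raised it.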

How to repeat:
I just have to restart the ndbd process to generate the error again.

Suggested fix:
ndbd --initial ?

It looks like a corrupted table file.
Most likely "ndbd --initial" will do the trick,
but I would save the filesystem *first*.

Also, it is impossible to tell exactly what is wrong without the trace file.

/jonas

Any idea about the cause of this issue, using the log files?

Hi,

Yes, the starting node fails because it cannot
  allocate an Attribute (SQL column) (config parameter MaxNoOfAttributes).

So my guess would be that you have changed this config variable
  "recently", updating ndb_mgmd but not yet restarting the
  cluster.

I recommend increasing this value.
If you have *not* modified this value "recently", then it's a bug somewhere.

Let me know if this helps

/jonas
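
A quick way to check for the kind of config drift described above is to diff the config.ini from each host. The following is only a sketch under stated assumptions: it handles plain `key=value` lines, `#` comments and `[section]` headers, and the helper names `read_params` and `diff_configs` are made up for illustration. Repeated sections such as several `[ndbd]` blocks are told apart by their order of appearance.

```python
def read_params(text):
    """Return {(section, occurrence, key): value} for an NDB-style config.ini.

    Section and key names are lower-cased so that "[NDBD DEFAULT]" and
    "[ndbd default]" compare equal. This is a simplified parser, not the
    one used by ndb_mgmd.
    """
    params = {}
    counts = {}
    section = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1].strip().lower()
            counts[name] = counts.get(name, 0) + 1
            section = (name, counts[name])
        elif "=" in line and section is not None:
            key, value = line.split("=", 1)
            params[section + (key.strip().lower(),)] = value.strip()
    return params


def diff_configs(a_text, b_text):
    """Return {param: (value_in_a, value_in_b)} for every differing parameter."""
    a, b = read_params(a_text), read_params(b_text)
    return {
        k: (a.get(k), b.get(k))
        for k in sorted(a.keys() | b.keys())
        if a.get(k) != b.get(k)
    }
```

Reading both files with `open(path).read()` and passing the two strings to `diff_configs` lists every parameter, such as MaxNoOfAttributes or DataMemory, whose value differs or that exists on only one host.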

Thanks for your answer Jonas. Indeed, we have discovered that the config.ini file was not the same on both servers. We corrected it, but we encounter another error when restarting ndbd:

Time: Wednesday 23 January 2008 - 14:48:23
Status: Temporary error, restart node
Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug)
Error: 2303
Error data: Killed by node 3 as copyfrag failed, error: 827
Error object: NDBCNTR (Line: 196) 0x0000000a
Program: /usr/local/mysql/5.0.41/bin/ndbd
Pid: 9787
Trace: /data/mysql/5.0.41/data/ndb_4_trace.log.15
Version: Version 5.0.41
***EOM***

I attached the ndb_4_trace.log.15 file to this ticket.

Trace file for error 2303

Attachment: ndb_4_trace.log.15.gz (application/gzip, text), 125.65 KiB.

sh> perror --ndb 827
NDB error code 827: Out of memory in Ndb Kernel, table data (increase DataMemory): Permanent error: Insufficient space

I.e., you have also changed DataMemory in an incompatible way...

/jonas
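
The code passed to perror above is the number at the end of the "Error data" line. As a small sketch, the lookup command can be derived from that line automatically. The function name `perror_command` is hypothetical, and the regular expression assumes the "error: NNN" wording seen in this report.

```python
import re


def perror_command(error_data):
    """Build the perror argument list for the NDB code in an "Error data" line.

    Returns None when the line carries no "error: <number>" suffix.
    """
    match = re.search(r"error:\s*(\d+)", error_data)
    if match is None:
        return None
    return ["perror", "--ndb", match.group(1)]
```

The returned list can be handed to `subprocess.run` on a host where perror is installed.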

Since I changed the config.ini file, I guess that I have to run "ndbd --initial" on the node that failed. Is there a risk of corrupting the data on the other node?

no, that should be fine

/jonas

Trace file after "ndbd --initial"

Attachment: ndb_4_trace.log.17.gz (application/gzip, text), 125.28 KiB.

Configs are now exactly the same on both servers, but after running "ndbd --initial" I still get the same error:

Time: Thursday 24 January 2008 - 15:07:56
Status: Temporary error, restart node
Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug)
Error: 2303
Error data: Killed by node 3 as copyfrag failed, error: 827
Error object: NDBCNTR (Line: 196) 0x0000000e
Program: /usr/local/mysql/5.0.41/bin/ndbd
Pid: 8515
Trace: /data/mysql/5.0.41/data/ndb_4_trace.log.17
Version: Version 5.0.41
***EOM***

Trace file is provided

Hi,

Hmm... I would recommend increasing DataMemory and MaxNoOfTables.

/Jonas

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

This looks like a configuration error, so it is not a bug.
Also, 5.0 is now old and unsupported (though it was supported when the bug was opened).