Bug #27750 NDB Cluster 5.1.15 crashes with internal error pointing to ndbrequire
Submitted: 11 Apr 2007 7:01
Modified: 14 Jun 2007 6:42
Reporter: Shouvik Basu
Status: No Feedback
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S1 (Critical)
Version: 5.1.15
OS: Linux (RHEL 4 AS Upd 4 (64bit))
Assigned to:
CPU Architecture: Any

[11 Apr 2007 7:01] Shouvik Basu
Description:
We are running a 4 node cluster with NDB 5.1.15. The schema is attached. The cluster crashes with an internal error. We had around 50 million records in the tables at the time the cluster crashed.

Logs and traces from all nodes are attached. Excerpts are given below.

NDB error log (NDB node 1) extract during abrupt cluster shutdown:
==================================================================
Current byte-offset of file-pointer is: 1067

Time: Tuesday 10 April 2007 - 07:14:26
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dbtup/DbtupRoutines.cpp
Error object: DBTUP (Line: 477) 0x0000000a
Program: ndbd
Pid: 29669
Trace: /mysql/setup2/ndb1/ndb_5_trace.log.1
Version: Version 5.1.15 (beta)
***EOM***

NDB error log (NDB node 1) extract when trying to restart NDB nodes WITHOUT --initial (i.e. normal restart):
============================================================================================================

Time: Tuesday 10 April 2007 - 07:51:23
Status: Temporary error, restart node
Message: Conflict when selecting restart type (Internal error, programming error or missing error message, please report a bug)
Error: 2311
Error data: Unable to start missing node group!  starting: 0000000000000020 (missing fs for: 0000000000000000)
Error object: QMGR (Line: 1408) 0x0000000a
Program: ndbd
Pid: 4599
Trace: /mysql/setup2/ndb1/ndb_5_trace.log.2
Version: Version 5.1.15 (beta)
***EOM***
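
The two excerpts above share one layout: "Key: value" lines closed by a ***EOM*** marker. A minimal Python sketch for splitting such a log into entries follows, so that error codes and the failing block can be compared across nodes. The function names parse_ndb_error_log and error_location are hypothetical helpers, not part of any NDB tool, and the layout is assumed only from the excerpts shown here.

```python
import re

def parse_ndb_error_log(text):
    """Split error-log text into one dict per entry ending in ***EOM***."""
    entries = []
    for block in text.split("***EOM***"):
        entry = {}
        for line in block.splitlines():
            # Split on the first colon only; values such as the
            # timestamp contain further colons.
            key, sep, value = line.partition(":")
            if sep and key.strip():
                entry[key.strip()] = value.strip()
        # Keep only blocks that actually carry an error code.
        if "Error" in entry:
            entries.append(entry)
    return entries

def error_location(entry):
    """Return (block name, source line) from the 'Error object' field."""
    match = re.match(r"(\w+) \(Line: (\d+)\)", entry.get("Error object", ""))
    return (match.group(1), int(match.group(2))) if match else None

# First entry from the excerpt above, copied verbatim.
SAMPLE = """\
Time: Tuesday 10 April 2007 - 07:14:26
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dbtup/DbtupRoutines.cpp
Error object: DBTUP (Line: 477) 0x0000000a
Program: ndbd
Pid: 29669
Trace: /mysql/setup2/ndb1/ndb_5_trace.log.1
Version: Version 5.1.15 (beta)
***EOM***
"""

if __name__ == "__main__":
    for entry in parse_ndb_error_log(SAMPLE):
        print(entry["Time"], entry["Error"], error_location(entry))
```

Run against the error log of each node, this would show at a glance whether all nodes failed at the same block and line.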

How to repeat:
Not reproducible.
[11 Apr 2007 7:02] Shouvik Basu
Configuration of each NDB Node

Attachment: node_config.txt (text/plain), 5.92 KiB.

[11 Apr 2007 7:04] Shouvik Basu
Node 1 Error Log and conf files

Attachment: logs_node1.tar.gz (application/x-gzip, text), 142.22 KiB.

[11 Apr 2007 7:04] Shouvik Basu
Node 2 Error Log and conf files

Attachment: logs_node2.tar.gz (application/x-gzip, text), 192.62 KiB.

[11 Apr 2007 7:05] Shouvik Basu
Node 3 Error Log and conf files

Attachment: logs_node3.tar.gz (application/x-gzip, text), 147.72 KiB.

[11 Apr 2007 7:05] Shouvik Basu
Node 4 Error Log and conf files

Attachment: logs_node4.tar.gz (application/x-gzip, text), 147.14 KiB.

[11 Apr 2007 7:19] Jonas Oreland
Hi,

Do you think http://bugs.mysql.com/bug.php?id=27512 matches your description?

/Jonas
[11 Apr 2007 7:50] Shouvik Basu
We could not find any mention of an "inconsistent tuple" error in the log files or trace files. We do not know whether Bug 27512 can manifest itself as an "ndbrequire" error; the description of Bug 27512 is not clear on this point.

But going by that description, it does not appear to be related to the current issue.
[11 Apr 2007 21:14] Tomas Ulin
How large is your DataMemory?  If it is > 16GB, you will need the bugfix Jonas is mentioning.

BR,

Tomas
[13 Apr 2007 6:20] Shouvik Basu
Yes, the DataMemory we have used is 20G.
You can also see other details about the configuration in the attached gz files.
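
The size check discussed above (DataMemory larger than 16GB) can be sketched in Python as follows. This assumes the value is written as a whole number with an optional K, M or G suffix, as in the "20G" quoted here, and that "16GB" means 16 * 1024^3 bytes. The names parse_size and exceeds_16gb are hypothetical helpers for illustration only.

```python
import re

# Multipliers for the size suffixes (assumed binary units).
UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

# Threshold from the comment above; reading "16GB" as 16 GiB is an assumption.
LIMIT_BYTES = 16 * 1024 ** 3

def parse_size(value):
    """Convert a string such as '20G' or '2048M' to a byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMG]?)\s*", value, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    return int(match.group(1)) * UNITS[match.group(2).upper()]

def exceeds_16gb(data_memory):
    """True if the configured DataMemory is above the 16GB threshold."""
    return parse_size(data_memory) > LIMIT_BYTES

if __name__ == "__main__":
    for setting in ("20G", "16G", "2048M"):
        print(setting, exceeds_16gb(setting))
```

With the 20G value reported here the check is true, which is the case in which the fix for bug 27512 is said to be needed.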
[13 Apr 2007 7:03] Tomas Ulin
Hi,

then our best bet is that this is a duplicate of the bug Jonas mentions (90% sure of this). Unfortunately the bugfix is not available until 5.1.18.

BR,

Tomas
[14 May 2007 6:42] Tomas Ulin
5.1.18 is on its way out, including the fix for bug 27512.

Setting the bug report to "Need Feedback", awaiting retesting with 5.1.18 to see whether this is still reproducible after that bug has been fixed.

BR,

Tomas
[14 Jun 2007 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".