MySQL Bugs: #48232: Crash in DBDICT (Line: 4115)

Bug #48232	Crash in DBDICT (Line: 4115)
Submitted:	22 Oct 2009 14:13	Modified:	27 Oct 2009 6:43
Reporter:	Andy Lintner
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S3 (Non-critical)
Version:	mysql-5.1-telco-7.0	OS:	Linux (RHEL 5.4)
Assigned to:	Jonas Oreland	CPU Architecture:	Any
Tags:	7.0.8a

Description:
When restarting a data node after a GCP stop, I experienced the following error, and am now unable to start the node. The same error occurs when doing an initial start.

Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dbdict/Dbdict.cpp
Error object: DBDICT (Line: 4115) 0x0000000e
Program: /usr/local/mysql//mysql/bin//ndbmtd
Pid: 2578 thr: 0
Version: mysql-5.1.37 ndb-7.0.8

How to repeat:
unknown

Trace files from the crash

Attachment: ndb_4_logs.tar.gz (application/x-gzip, text), 129.53 KiB.

config.ini

Attachment: config.ini (text/plain), 4.59 KiB.

A cluster log would also be good.
(Note: I haven't actually checked the traces yet, but a cluster log is always
 good to have around.)

Cluster Log

Attachment: ndb_1_cluster.log (application/octet-stream, text), 110.74 KiB.

The problem seems to be that the alive node has a larger SharedGlobalMemory than
the starting node.

My guess is that you
1) Started cluster with a value for SharedGlobalMemory
2) changed the value
3) restarted this node with a lower value

I'm not entirely sure, but I'm pretty sure that setting that value,
restarting ndb_mgmd with --reload, and then starting the problematic node
will make the problem go away.
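
The suspected mismatch can be checked by comparing the SharedGlobalMemory setting in the config.ini used before and after the change. The following is a minimal Python sketch, not part of the original report. It assumes the parameter appears as a plain `SharedGlobalMemory=<size>` line and that sizes carry an optional K, M, or G suffix. The helper names and the sample values are hypothetical.

```python
import re

# Multipliers for the optional size suffix (assumed: K, M, G as powers of 1024).
_SUFFIX = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value):
    """Convert a size string such as '384M' or '20971520' to bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMG]?)\s*", value, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    return int(match.group(1)) * _SUFFIX[match.group(2).upper()]


def shared_global_memory(config_text):
    """Return every uncommented SharedGlobalMemory value, in bytes, in file order."""
    pattern = re.compile(
        r"^\s*SharedGlobalMemory\s*[=:]\s*([^#;\s]+)",
        re.IGNORECASE | re.MULTILINE,
    )
    return [parse_size(value) for value in pattern.findall(config_text)]


if __name__ == "__main__":
    # Hypothetical before/after snippets; replace with the real file contents.
    before = "[ndbd default]\nSharedGlobalMemory=384M\n"
    after = "[ndbd default]\nSharedGlobalMemory=128M\n"
    old, new = shared_global_memory(before)[0], shared_global_memory(after)[0]
    if new < old:
        print("SharedGlobalMemory was lowered; running nodes may still use the old value")
```

A plain regular expression is used instead of `configparser` because a cluster config.ini typically repeats section names such as `[ndbd]`, which `configparser` rejects in its default strict mode.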

I restarted both management nodes, followed by the active node, and then the inactive node experienced the same fault. However, your comment on memory made me dig deeper, and I discovered an unrelated runaway process consuming memory on that server. There were only 2G available to the node, instead of the normal 8G. Killing that process allowed the node to start up.

However, the error message was obviously less than helpful. Since your diagnosis indicated a mismatching SharedGlobalMemory, is there anything that would have dynamically resized SharedGlobalMemory down in response to insufficient available memory? Either way, my issue is resolved, so I moved this down to Non-critical since it seems to just be an issue of error reporting.
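
A low-memory condition like the one described in this comment could be caught before starting the data node with a simple pre-flight check. The following is a minimal Python sketch, not something from the original thread. It assumes a Linux host exposing `/proc/meminfo`; the function names and the 8G threshold (taken from the "normal 8G" mentioned above) are illustrative. On kernels that do not report `MemAvailable`, it falls back to `MemFree + Buffers + Cached`, which is only a rough estimate.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field name: value in kB}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "name: value" shape
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def reclaimable_kb(fields):
    """Estimate memory obtainable by a new process, in kB."""
    if "MemAvailable" in fields:
        return fields["MemAvailable"]
    # Rough fallback for kernels without MemAvailable.
    return fields.get("MemFree", 0) + fields.get("Buffers", 0) + fields.get("Cached", 0)


def enough_memory(fields, required_kb):
    """Return True when the estimate meets the required amount."""
    return reclaimable_kb(fields) >= required_kb


if __name__ == "__main__":
    REQUIRED_KB = 8 * 1024 * 1024  # hypothetical threshold: 8G expressed in kB
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as handle:
            info = parse_meminfo(handle.read())
        if not enough_memory(info, REQUIRED_KB):
            print("Less memory available than expected; check for runaway processes")
```

The check only compares an estimate against a threshold chosen by the operator; it does not know how much memory the data node will actually request.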

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/88181

3163 Jonas Oreland	2009-10-26
      ndb - bug#48232 - improve error reporting when failure to recreate/drop object during restore of schema

Added informative error message
Pushed to 7.0.9

Bugfix documented in the NDB-7.0.9 changelog as follows:

        When a data node failed to start due to inability to recreate or
        drop objects during schema restoration (for example:
        insufficient memory was available to the data node process on
        account of issues not directly related to MySQL Cluster on the
        host machine), the reason for the failure was not provided. Now
        in such cases, a more informative error message is logged.

Closed.