MySQL Bugs: #43068: data node fails on startup with error 2308: pointer too large

Bug #43068	data node fails on startup with error 2308: pointer too large
Submitted:	20 Feb 2009 21:47	Modified:	8 Apr 2009 8:08
Reporter:	Robin McMillon
Status:	No Feedback
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S1 (Critical)
Version:	6.3.20	OS:	Solaris (10 x86)
Assigned to:	Assigned Account	CPU Architecture:	Any

Description:
This bug is related to Bug 38871 (http://bugs.mysql.com/bug.php?id=38871)

Cluster is comprised of: 6 Sun x4150
Each is running:
OS: OpenSolaris x86
xVM w/ one domain - cluster is the only app running on these machines
MySQL Cluster: 6.3.20 (MySQL built tarball)

Cluster setup:
2 x4150 as management and SQL nodes (Nodes 1 & 2: management, Nodes 7 & 8: SQL)
4 x4150 as data nodes (Nodes 3, 4, 5, 6)

Problem
=====
On startup, node 5 fails with the following error:

Time: Thursday 19 February 2009 - 15:01:02
Status: Temporary error, restart node
Message: Pointer too large (Internal error, programming error or missing error message, please report a bug)
Error: 2306
Error data: dbdih/DbdihMain.cpp
Error object: DBDIH (Line: 14764) 0x0000000a
Program: ndbd
Pid: 9806
Trace: /z1/mysql/DATA/6.3-cluster/ndb_5_trace.log.3
Version: mysql-5.1.30 ndb-6.3.20-GA
***EOM***
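
For anyone triaging similar reports, the entry above can be reduced to its key fields with a few lines of Python. This is only a minimal sketch: it assumes the "Key: value" layout and the ***EOM*** terminator shown in this one entry, and the function names (parse_error_entry, summarize) are made up for illustration and are not part of any MySQL Cluster tool.

```python
import re

# Sample entry copied verbatim from the error log quoted in this report.
SAMPLE = """\
Time: Thursday 19 February 2009 - 15:01:02
Status: Temporary error, restart node
Message: Pointer too large (Internal error, programming error or missing error message, please report a bug)
Error: 2306
Error data: dbdih/DbdihMain.cpp
Error object: DBDIH (Line: 14764) 0x0000000a
Program: ndbd
Pid: 9806
Trace: /z1/mysql/DATA/6.3-cluster/ndb_5_trace.log.3
Version: mysql-5.1.30 ndb-6.3.20-GA
***EOM***
"""


def parse_error_entry(text):
    """Collect 'Key: value' fields up to the ***EOM*** marker (assumed layout)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "***EOM***":
            break
        # Split on the first ": " only, so values may contain colons.
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    return fields


def summarize(fields):
    """Pull out the error code, kernel block, and source line (hypothetical helper)."""
    m = re.match(r"(\w+) \(Line: (\d+)\)", fields.get("Error object", ""))
    return {
        "code": int(fields["Error"]),
        "block": m.group(1) if m else None,
        "line": int(m.group(2)) if m else None,
        "source": fields.get("Error data"),
    }


entry = parse_error_entry(SAMPLE)
summary = summarize(entry)
print(summary)
```

Run against the entry above, the summary points at the DBDIH block and the source file and line named in the log, which is usually the first thing worth quoting when comparing against related reports such as Bug 38871.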

Nothing had changed in Node 5's configuration between the last successful restart and this failure, so I attempted to restart the cluster. No luck: same error.

I then checked the MySQL bug system. This appears to be a problem that has been reported against various versions since 4.1.x with no resolution. The reporter of Bug 38871 said that restarting the failing node with --initial fixed the problem, so I attempted that and it worked.

Workaround: --initial restart of Node 3 allows the startup.

How to repeat:
Relevant trace file is attached.

Suggested fix:
?

first time this error came up

Attachment: ndb_5_trace.log.2.gz (application/x-gzip, text), 32.74 KiB.

tried to restart, got same error (before tried --initial)

Attachment: ndb_5_trace.log.3.gz (application/x-gzip, text), 46.19 KiB.

Hi,

To proceed on this, we need the cluster log and config.ini
(the error log wouldn't be bad either).

From what I can see in the trace file, there doesn't seem to be a bug,
but rather that the nodes have been restarted in a sequence that made it possible
to come up.

Setting status to need feedback

/Jonas

I've uploaded the error log for node 5 and the config.ini for the cluster.  Unfortunately the cluster log has aged out.  If I see it again, I will add *all* the relevant files.

Also, "Workaround: --initial restart of Node 3 allows the startup." should have said Node 5 instead.  When you see the cluster come back up, that was after I performed the 'ndbd --initial' on Node 5.

setting back to waiting on feedback,
waiting for more logs if problem occurs again

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".