MySQL Bugs: #23768: MySQL Cluster 5.0.26 - Nodes failing to start at all

Bug #23768	MySQL Cluster 5.0.26 - Nodes failing to start at all - DbDict problem ?
Submitted:	30 Oct 2006 11:49	Modified:	29 Feb 2008 9:37
Reporter:	Cameron Logie
Status:	No Feedback
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	5.0.27, 5.0.26	OS:	Any (64bit Gentoo AMD64)
Assigned to:		CPU Architecture:	Any
Tags:	dbdict cluster 2341 2308 initial

Description:
Hi Folks,

I'm having real trouble with a 1 NDB_MGMD / 4 NDBD / 4 SQL node setup.
Networking and DNS are fine between all the boxes and firewalling is not an issue.

With ndb_mgmd running and mysqld not running, when I run ndbd --initial on each node, it always gets to start phase 5 and then dies with the following error:

2006-10-30 11:23:21 [MgmSrvr] ALERT    -- Node 2: Forced node shutdown completed. Occured during startphase 5. Initiated by signal 0. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error
2006-10-30 11:23:22 [MgmSrvr] INFO     -- Node 1: Node 5 Connected
2006-10-30 11:23:22 [MgmSrvr] ALERT    -- Node 5: Forced node shutdown completed. Occured during startphase 5. Initiated by signal 0. Caused by error 2308: 'Another node failed during system restart, please investigate error(s) on other node(s)(Restart error). Temporary error, restart node'.
2006-10-30 11:23:22 [MgmSrvr] INFO     -- Node 1: Node 3 Connected
2006-10-30 11:23:22 [MgmSrvr] INFO     -- Node 1: Node 4 Connected
2006-10-30 11:23:22 [MgmSrvr] ALERT    -- Node 3: Forced node shutdown completed. Occured during startphase 5. Initiated by signal 0. Caused by error 2308: 'Another node failed during system restart, please investigate error(s) on other node(s)(Restart error). Temporary error, restart node'.
2006-10-30 11:23:22 [MgmSrvr] ALERT    -- Node 4: Forced node shutdown completed. Occured during startphase 5. Initiated by signal 0. Caused by error 2308: 'Another node failed during system restart, please investigate error(s) on other node(s)(Restart error). Temporary error, restart node'.
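
When reading a cluster log like the one above, it can help to separate the node that reported the original failure from the nodes that shut down only because another node failed (error 2308). The following is a minimal sketch, assuming the log line format shown above. The function name triage_shutdowns and the regular expression are illustrative and are not part of any MySQL tool.

```python
import re

# Matches "Forced node shutdown" ALERT lines, assuming the format in the log above
# (including the log's own spelling "Occured").
ALERT_RE = re.compile(
    r"Node (?P<node>\d+): Forced node shutdown completed\. "
    r"Occured during startphase (?P<phase>\d+)\. .*?"
    r"Caused by error (?P<code>\d+)"
)

# Error 2308: "Another node failed during system restart".
SECONDARY_ERROR = 2308


def triage_shutdowns(log_text):
    """Return (primary, secondary) lists of (node_id, start_phase, error_code)."""
    primary, secondary = [], []
    for line in log_text.splitlines():
        match = ALERT_RE.search(line)
        if match is None:
            continue  # INFO lines and unrelated alerts are ignored
        entry = (int(match["node"]), int(match["phase"]), int(match["code"]))
        if entry[2] == SECONDARY_ERROR:
            secondary.append(entry)
        else:
            primary.append(entry)
    return primary, secondary
```

Applied to the log above, the primary list would contain only node 2 (error 2341), which is the node whose error log is worth reading first.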

I've made sure that each /var/lib/mysql-cluster directory is empty on each node (apart from the mgmd node obviously).

I've tracked down the node error log, and the master node says this:

Current byte-offset of file-pointer is: 568

Time: Monday 30 October 2006 - 11:23:21
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: Dbdict.cpp
Error object: DBDICT (Line: 4132) 0x0000000e
Program: ndbd
Pid: 21996
Trace: /var/lib/mysql-cluster/ndb_2_trace.log.1
Version: Version 5.0.26
***EOM***

The other nodes are just complaining that 'another node failed during startup'.

My config file is really simple:

[NDBD DEFAULT]
NoOfReplicas=2
#IndexMemory=32M
#DataMemory=512M
MaxNoOfAttributes=2000

[TCP DEFAULT]
portnumber=2202

[NDB_MGMD]
hostname=10.0.1.101
datadir=/var/lib/mysql-cluster
id=1

[NDBD]
hostname=10.0.1.1
datadir=/var/lib/mysql-cluster
id=2

[NDBD]
hostname=10.0.1.2
datadir=/var/lib/mysql-cluster
id=3

[NDBD]
hostname=10.0.1.3
datadir=/var/lib/mysql-cluster
id=4

[NDBD]
hostname=10.0.1.4
datadir=/var/lib/mysql-cluster
id=5

[MYSQLD]
id=7

[MYSQLD]
id=8

[MYSQLD]
id=9

[MYSQLD]
id=10

[API]
id=11
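
A configuration like the one above can be sanity-checked before starting the cluster. The following is a rough sketch, assuming the config.ini layout shown above. It checks two things: that node ids are unique, and that the number of [NDBD] sections is a multiple of NoOfReplicas, which NDB requires so that data nodes divide evenly into node groups. The function names are hypothetical. The parser is hand-written because section names such as [NDBD] repeat.

```python
def parse_cluster_config(text):
    """Parse config.ini-style text into a list of (section, options) pairs.

    A list is used instead of a dict because section names may repeat.
    """
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out settings
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1].strip().upper(), {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip().lower()] = value.strip()
    return sections


def check_cluster_config(sections):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    defaults = {}
    for name, opts in sections:
        if name == "NDBD DEFAULT":
            defaults.update(opts)

    data_nodes = [opts for name, opts in sections if name == "NDBD"]
    # Sketch-level choice: the check is skipped if NoOfReplicas is not set.
    if "noofreplicas" in defaults:
        replicas = int(defaults["noofreplicas"])
        if replicas < 1 or len(data_nodes) % replicas != 0:
            problems.append(
                f"{len(data_nodes)} data nodes is not a multiple of "
                f"NoOfReplicas={replicas}"
            )

    ids = [opts["id"] for _, opts in sections if "id" in opts]
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    if duplicates:
        problems.append("duplicate node ids: " + ", ".join(duplicates))
    return problems
```

For the configuration above (four data nodes, NoOfReplicas=2, ids 1 to 11 with no repeats), this check would report no problems, which is consistent with the failure lying elsewhere.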

I can supply traces if you want.

Regards,
Cameron.

How to repeat:
Follow the steps as above.

Thank you for the problem report. Please try to repeat this with a newer version, 5.0.27, and let us know the results.

Tried it with 5.0.27 on the ndbd nodes, and it fails with the same Dbdict.cpp error.

Cammy.

Hi

The tracefile looks really strange.

Are these binaries compiled by you?
If so, which compiler do you use (gcc --version)?

If not, can you upload your schema?

/Jonas

Hi Jonas,

As I'm using Gentoo, these binaries have been compiled by the Gentoo ebuild.

# gcc --version
gcc (GCC) 3.4.6 (Gentoo Hardened 3.4.6-r1, ssp-3.4.5-1.0, pie-8.7.9)

Do you think the hardened toolchain has anything to do with it?
The odd thing is I had 5.0.24 working on this system before without this hassle.

That'll teach me for wanting to upgrade. :)

Regards,
Cammy.

Does anyone have any thoughts on this?

Cammy.

Please try installing the official MySQL binaries (mysql-max-5.0.27-linux-x86_64-glibc23.tar.gz) and let us know the results.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

I am having this issue on a 5.0.44 Gentoo system (also running a hardened version of gcc):
# gcc --version
gcc (GCC) 3.4.4 (Gentoo Hardened 3.4.4-r1, ssp-3.4.4-1.0, pie-8.7.8)

I was able to start a cluster using 5.0.44 on a very similar machine with gcc (GCC) 4.1.2 (Gentoo 4.1.2).

So it appears this issue relates to using a hardened version of gcc.
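
To check whether a given build host uses a hardened toolchain, the gcc --version banner can be inspected. This is a minimal sketch, assuming the banner mentions "Hardened" the way the two Gentoo banners quoted in this report do; hardened toolchains on other distributions may not identify themselves this way. The function names are illustrative.

```python
import subprocess


def is_hardened_gcc(version_output):
    """Return True if the first line of `gcc --version` output mentions 'hardened'.

    Assumption: the banner identifies hardened builds by name, as in
    'gcc (GCC) 3.4.6 (Gentoo Hardened 3.4.6-r1, ssp-3.4.5-1.0, pie-8.7.9)'.
    """
    lines = version_output.strip().splitlines()
    first_line = lines[0] if lines else ""
    return "hardened" in first_line.lower()


def local_gcc_is_hardened():
    """Run `gcc --version` on this machine and apply the same check."""
    result = subprocess.run(
        ["gcc", "--version"], capture_output=True, text=True, check=True
    )
    return is_hardened_gcc(result.stdout)
```

A match only shows which toolchain is installed; it does not by itself prove that the hardened compiler causes the ndbd failure.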

Is anybody able to repeat this with recent 5.0.x versions and gcc other than Gentoo Hardened?

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".