Bug #23768 MySQL Cluster 5.0.26 - Nodes failing to start at all - DbDict problem ?
Submitted: 30 Oct 2006 11:49
Modified: 29 Feb 2008 9:37
Reporter: Cameron Logie
Status: No Feedback
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: 5.0.27, 5.0.26
OS: Any (64bit Gentoo AMD64)
Assigned to:
CPU Architecture: Any
Tags: dbdict cluster 2341 2308 initial

[30 Oct 2006 11:49] Cameron Logie
Description:
Hi Folks,

I'm having real trouble with a 1 NDB_MGMD / 4 NDBD / 4 SQL node setup.
Networking and DNS are fine between all the boxes and firewalling is not an issue.

With ndb_mgmd running and mysqld not running, when I run ndbd --initial on each node, it always gets to start phase 5 and then dies with the following error:

2006-10-30 11:23:21 [MgmSrvr] ALERT    -- Node 2: Forced node shutdown completed. Occured during startphase 5. Initiated by signal 0. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error
2006-10-30 11:23:22 [MgmSrvr] INFO     -- Node 1: Node 5 Connected
2006-10-30 11:23:22 [MgmSrvr] ALERT    -- Node 5: Forced node shutdown completed. Occured during startphase 5. Initiated by signal 0. Caused by error 2308: 'Another node failed during system restart, please investigate error(s) on other node(s)(Restart error). Temporary error, restart node'.
2006-10-30 11:23:22 [MgmSrvr] INFO     -- Node 1: Node 3 Connected
2006-10-30 11:23:22 [MgmSrvr] INFO     -- Node 1: Node 4 Connected
2006-10-30 11:23:22 [MgmSrvr] ALERT    -- Node 3: Forced node shutdown completed. Occured during startphase 5. Initiated by signal 0. Caused by error 2308: 'Another node failed during system restart, please investigate error(s) on other node(s)(Restart error). Temporary error, restart node'.
2006-10-30 11:23:22 [MgmSrvr] ALERT    -- Node 4: Forced node shutdown completed. Occured during startphase 5. Initiated by signal 0. Caused by error 2308: 'Another node failed during system restart, please investigate error(s) on other node(s)(Restart error). Temporary error, restart node'.
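
When several nodes shut down within a second of each other, it helps to separate the node that hit the original error from the nodes that only report error 2308 ("Another node failed during system restart"). The following is a minimal Python sketch of that triage. The regular expression is an assumption derived only from the ALERT lines above, and the function name `triage` is hypothetical.

```python
import re

# Pattern inferred from the ALERT lines in this report (an assumption,
# not an official log format specification). "Occured" is spelled as
# it appears in the log.
SHUTDOWN_RE = re.compile(
    r"Node (?P<node>\d+): Forced node shutdown completed\. "
    r"Occured during startphase (?P<phase>\d+)\. .*?"
    r"Caused by error (?P<error>\d+)"
)

FOLLOW_ON_ERROR = 2308  # "Another node failed during system restart"


def triage(log_lines):
    """Split forced shutdowns into likely root causes and follow-on failures.

    Returns two lists of (node_id, start_phase, error_code) tuples.
    """
    root, follow = [], []
    for line in log_lines:
        m = SHUTDOWN_RE.search(line)
        if not m:
            continue  # e.g. INFO lines such as "Node 5 Connected"
        event = (int(m["node"]), int(m["phase"]), int(m["error"]))
        (follow if event[2] == FOLLOW_ON_ERROR else root).append(event)
    return root, follow


# Shortened copies of two ALERT lines from the log above.
sample = [
    "2006-10-30 11:23:21 [MgmSrvr] ALERT    -- Node 2: Forced node shutdown "
    "completed. Occured during startphase 5. Initiated by signal 0. "
    "Caused by error 2341:",
    "2006-10-30 11:23:22 [MgmSrvr] ALERT    -- Node 5: Forced node shutdown "
    "completed. Occured during startphase 5. Initiated by signal 0. "
    "Caused by error 2308:",
]
root, follow = triage(sample)
print(root)    # [(2, 5, 2341)]
print(follow)  # [(5, 5, 2308)]
```

On the log above this points at node 2 as the one whose error log is worth reading first. The other nodes only report that a peer failed.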

I've made sure that each /var/lib/mysql-cluster directory is empty on each node (apart from the mgmd node obviously).

I've tracked down the node error log, and the master node says this:

Current byte-offset of file-pointer is: 568

Time: Monday 30 October 2006 - 11:23:21
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: Dbdict.cpp
Error object: DBDICT (Line: 4132) 0x0000000e
Program: ndbd
Pid: 21996
Trace: /var/lib/mysql-cluster/ndb_2_trace.log.1
Version: Version 5.0.26
***EOM***
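
For comparing error-log entries across nodes or versions, the entry above can be read as simple "Key: value" pairs terminated by `***EOM***`. The Python sketch below assumes exactly that layout, based only on the single entry shown here. The function names `parse_error_entry` and `failure_location` are hypothetical.

```python
import re


def parse_error_entry(text):
    """Parse the 'Key: value' lines of one error-log entry.

    Stops at the ***EOM*** marker. Lines without ': ' are ignored.
    """
    entry = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line == "***EOM***":
            break
        key, sep, value = line.partition(": ")
        if sep:
            entry[key] = value
    return entry


def failure_location(entry):
    """Return (source_file, line_number) from 'Error data' and 'Error object'."""
    m = re.search(r"\(Line: (\d+)\)", entry.get("Error object", ""))
    return entry.get("Error data"), (int(m.group(1)) if m else None)


# Fields copied from the entry above.
entry = parse_error_entry(
    "Error: 2341\n"
    "Error data: Dbdict.cpp\n"
    "Error object: DBDICT (Line: 4132) 0x0000000e\n"
    "***EOM***\n"
)
print(failure_location(entry))  # ('Dbdict.cpp', 4132)
```

The file and line number identify which ndbrequire check failed, which is the detail most useful to include when reporting against a specific version.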

The other nodes are just complaining that 'another node failed during startup'.

My config file is really simple:

[NDBD DEFAULT]
NoOfReplicas=2
#IndexMemory=32M
#DataMemory=512M
MaxNoOfAttributes=2000

[TCP DEFAULT]
portnumber=2202

[NDB_MGMD]
hostname=10.0.1.101
datadir=/var/lib/mysql-cluster
id=1

[NDBD]
hostname=10.0.1.1
datadir=/var/lib/mysql-cluster
id=2

[NDBD]
hostname=10.0.1.2
datadir=/var/lib/mysql-cluster
id=3

[NDBD]
hostname=10.0.1.3
datadir=/var/lib/mysql-cluster
id=4

[NDBD]
hostname=10.0.1.4
datadir=/var/lib/mysql-cluster
id=5

[MYSQLD]
id=7

[MYSQLD]
id=8

[MYSQLD]
id=9

[MYSQLD]
id=10

[API]
id=11
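
As a quick sanity check of a layout like the one above, the sections can be counted and the data-node count compared with NoOfReplicas, since the data nodes are divided into node groups of NoOfReplicas nodes each. This Python sketch uses a small hand-written parser because the file repeats section names such as [NDBD]. It handles only `#` comments, as used above, and the function names are hypothetical. It checks the layout only and says nothing about the cause of error 2341.

```python
from collections import Counter


def summarize_cluster_config(text):
    """Count sections by name and read NoOfReplicas from [NDBD DEFAULT]."""
    counts = Counter()
    replicas = None
    section = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop '#' comments
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().upper()
            counts[section] += 1
        elif "=" in line and section == "NDBD DEFAULT":
            key, value = (part.strip() for part in line.split("=", 1))
            if key.lower() == "noofreplicas":
                replicas = int(value)
    return counts, replicas


def node_groups(counts, replicas):
    """Return the number of node groups, or raise if the layout is uneven."""
    data_nodes = counts["NDBD"]
    if replicas is None or replicas < 1 or data_nodes % replicas:
        raise ValueError("data node count must be a multiple of NoOfReplicas")
    return data_nodes // replicas


# Usage with a hypothetical path to the management server's config file:
# with open("/var/lib/mysql-cluster/config.ini") as fh:
#     counts, replicas = summarize_cluster_config(fh.read())
#     print(counts["NDBD"], replicas, node_groups(counts, replicas))
```

For the configuration above this gives 4 data nodes, NoOfReplicas=2, and therefore 2 node groups, so the layout itself is consistent.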

I can supply traces if you want.

Regards,
Cameron.

How to repeat:
Follow the steps as above.
[30 Oct 2006 13:27] Valeriy Kravchuk
Thank you for the problem report. Please try to repeat this with a newer version, 5.0.27, and let us know the results.
[30 Oct 2006 14:11] Cameron Logie
Tried it with 5.0.27 on the ndbd nodes and it exhibits the same DbDict.cpp error.

Cammy.
[3 Nov 2006 22:41] Jonas Oreland
Hi

The tracefile looks really strange.

Are these binaries compiled by you?
If so, which compiler do you use (gcc --version)?

If not, can you upload your schema?

/Jonas
[4 Nov 2006 15:55] Cameron Logie
Hi Jonas,

As I'm using Gentoo, these binaries have been compiled by the Gentoo ebuild.

# gcc --version
gcc (GCC) 3.4.6 (Gentoo Hardened 3.4.6-r1, ssp-3.4.5-1.0, pie-8.7.9)

Do you think the hardened toolchain has anything to do with it?
The odd thing is I had 5.0.24 working on this system before without this hassle.

That'll teach me for wanting to upgrade. :)

Regards,
Cammy.
[21 Nov 2006 12:26] Cameron Logie
Does anyone have any thoughts on this?

Cammy.
[30 Dec 2006 7:57] Valeriy Kravchuk
Please try installing the MySQL binaries (mysql-max-5.0.27-linux-x86_64-glibc23.tar.gz) and let us know the results.
[31 Jan 2007 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[13 Sep 2007 17:31] Sarah M
I am having this issue on a 5.0.44 Gentoo system (also running a hardened version of gcc):
# gcc --version
gcc (GCC) 3.4.4 (Gentoo Hardened 3.4.4-r1, ssp-3.4.4-1.0, pie-8.7.8)

I was able to start a cluster using 5.0.44 on a very similar machine with gcc (GCC) 4.1.2 (Gentoo 4.1.2).

So it appears this issue relates to using a hardened version of gcc.
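
When collecting reports from several build hosts, the compiler details can be pulled out of the `gcc --version` output in a uniform way. This is a small Python sketch. The pattern is an assumption based only on the three outputs quoted in this thread and may not match other gcc builds, and the function name `describe_gcc` is hypothetical.

```python
import re


def describe_gcc(version_output):
    """Extract the gcc version and a 'hardened' flag from `gcc --version` text."""
    lines = version_output.strip().splitlines()
    first = lines[0] if lines else ""
    m = re.search(r"\(GCC\) (\d+(?:\.\d+)+)", first)
    return {
        "version": m.group(1) if m else None,
        "hardened": "hardened" in first.lower(),
    }


# Outputs quoted in this thread.
print(describe_gcc(
    "gcc (GCC) 3.4.6 (Gentoo Hardened 3.4.6-r1, ssp-3.4.5-1.0, pie-8.7.9)"
))  # {'version': '3.4.6', 'hardened': True}
print(describe_gcc("gcc (GCC) 4.1.2 (Gentoo 4.1.2)"))
# {'version': '4.1.2', 'hardened': False}
```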
[30 Jan 2008 9:37] Valeriy Kravchuk
Is anybody able to repeat this with recent 5.0.x versions and gcc other than Gentoo Hardened?
[1 Mar 2008 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".