MySQL Bugs: #41170: ndbd nodes can not start with two ndb

Bug #41170	ndbd nodes can not start with two ndb_mgmd nodes
Submitted:	2 Dec 2008 11:51	Modified:	12 Feb 2009 14:39
Reporter:	Wen Xiong
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S1 (Critical)
Version:	mysql-5.1-telco-6.4	OS:	Solaris
Assigned to:		CPU Architecture:	Any

Description:
The ndbd nodes cannot start when the cluster has two ndb_mgmd nodes.

I started two ndb_mgmd nodes and then tried to start the two ndbd nodes, one connecting to ndb_mgmd node 1 and the other to ndb_mgmd node 2. The first ndbd node I try to start fails in start phase 5, and ndb_1_cluster.log reports the following:

2008-12-02 09:30:28 [MgmSrvr] ALERT    -- Node 3: Forced node shutdown completed. Occured during startphase 5. Initiated by signal 11. Caused by error 6000: 'Error OS signal received(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

The error log for that node says:
Time: Tuesday 2 December 2008 - 09:30:27
Status: Temporary error, restart node
Message: Error OS signal received (Internal error, programming error or missing
error message, please report a bug)
Error: 6000
Error data: Signal 11 received; Segmentation Fault
Error object: main.cpp
Program: ./libexec/ndbd
Pid: 10386
Trace: /export/home/tmp/wx228566/ndbd/ndb_3_trace.log.6
Version: mysql-5.1.29 ndb-6.4.0-alpha

Then the other ndbd fails as well.

2008-12-02 09:30:28 [MgmSrvr] ALERT    -- Node 4: Forced node shutdown completed. Occured during startphase 5. Caused by error 2308: 'Another node failed during system restart, please investigate error(s) on other node(s)(Restart error). Temporary error, restart node'.

Time: Tuesday 2 December 2008 - 09:30:28
Status: Temporary error, restart node
Message: Another node failed during system restart, please investigate error(s)
on other node(s) (Restart error)
Error: 2308
Error data: Node 3 disconnected
Error object: QMGR (Line: 2867) 0x0000000e
Program: ./libexec/ndbd
Pid: 9464
Trace: /export/home/tmp/wx228566/ndbd/ndb_4_trace.log.6
Version: mysql-5.1.29 ndb-6.4.0-alpha
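
When several nodes go down in the same restart, it helps to separate the node that actually crashed from the nodes that only report error 2308 ("Another node failed during system restart"). The following is a minimal Python sketch for that, assuming the ALERT lines have exactly the shape quoted in this report. The regular expression and the helper names parse_shutdown_alert and first_failure are illustrative only and are not part of any MySQL Cluster tool.

```python
import re

# Pattern for the "Forced node shutdown" ALERT lines quoted in this report.
# "Occured" is spelled as it appears in the log. The "Initiated by signal N."
# part is optional: it is present for node 3 but absent for node 4.
ALERT_RE = re.compile(
    r"Node (?P<node>\d+): Forced node shutdown completed\. "
    r"Occured during startphase (?P<phase>\d+)\. "
    r"(?:Initiated by signal (?P<signal>\d+)\. )?"
    r"Caused by error (?P<error>\d+): '(?P<message>.*)'\."
)

# Error code reported by nodes that stopped because a different node failed.
ANOTHER_NODE_FAILED = 2308


def parse_shutdown_alert(line):
    """Return a dict describing a forced-shutdown ALERT line, or None if it does not match."""
    match = ALERT_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    for key in ("node", "phase", "error"):
        info[key] = int(info[key])
    if info["signal"] is not None:
        info["signal"] = int(info["signal"])
    return info


def first_failure(lines):
    """Return the first alert that is not error 2308, falling back to the first alert of any kind."""
    alerts = [alert for alert in map(parse_shutdown_alert, lines) if alert]
    for alert in alerts:
        if alert["error"] != ANOTHER_NODE_FAILED:
            return alert
    return alerts[0] if alerts else None
```

Feeding the two ALERT lines from ndb_1_cluster.log to first_failure would point at node 3 (signal 11, error 6000) as the node to investigate first, with node 4 treated as a follow-on failure.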

How to repeat:
This is the config file I use to start the cluster.
[NDBD DEFAULT]
NoOfReplicas= 2
DataMemory= 600G
IndexMemory= 100M
BackupMemory= 64M
DataDir=/export/home/tmp/wx228566/ndbd
FileSystemPath=/export/home/tmp/wx228566/ndbd

[MGM DEFAULT]
PortNumber=1186
DataDir=/export/home/tmp/wx228566/ndb_mgmd.2

# Management node
[NDB_MGMD]
Id= 1
HostName= nanna13
ArbitrationRank=1

[NDB_MGMD]
Id= 2
HostName= nanna14
ArbitrationRank=1

# Data Nodes
[NDBD]
Id= 3
HostName= nanna15

[NDBD]
Id= 4
HostName= nanna16

[MYSQLD]
HostName= nanna14

[MYSQLD]

[MYSQLD]

trace file for ndbd nodes

Attachment: trace_file.tar.gz (application/x-gzip, text), 96.76 KiB.

I don't see a connectstring in your configuration.

One of the following two things must be done:

1. In the [ndbd_default] section of the config.ini file:

connect-string=nanna13,nanna14

(or you can specify it in each [ndbd] section)

OR

2. Include the option --connect-string=nanna13,nanna14 on the command line when starting ndbd.

If you didn't do this when starting the cluster, please repeat your test using the connectstring. Otherwise, please show the options you used when starting the Cluster ndbd and ndb_mgmd executables. Thanks!

The commands I use to start ndb_mgmd on nanna13 and nanna14:
# config.ini.1 and config.ini.2 are identical copies.
 ./libexec/ndb_mgmd -f /export/home/tmp/wx228566/ndb_mgmd.2/config.ini.1 --datadir=/export/home/tmp/wx228566/ndb_mgmd.2

 ./libexec/ndb_mgmd -f /export/home/tmp/wx228566/ndb_mgmd.2/config.ini.2 --datadir=/export/home/tmp/wx228566/ndb_mgmd.2

The commands I use to start ndbd:
 ./libexec/ndbd --ndb-connectstring="nanna13:1186" --initial

 ./libexec/ndbd --ndb-connectstring="nanna14:1186" --initial

*Both* management servers must be referenced in the connectstrings used for *all* data nodes in the cluster.

Please repeat using 

./libexec/ndbd --ndb-connectstring=nanna13,nanna14 --initial 

for both data nodes and see if the cluster still fails to start.

(1186 is the default port, and shouldn't be necessary. IIRC, the quotes aren't needed, either.)
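
One way to avoid leaving a management server out of the connectstring is to derive it from the same config.ini that the management servers read. The sketch below is only an illustration under a few assumptions: it handles just the [NDB_MGMD] section name and the HostName key used in the configuration posted above, it treats "#" as the comment character, and the helper names management_hosts, connectstring and ndbd_command are hypothetical. The ./libexec/ndbd path and the --ndb-connectstring and --initial options are the ones already used in this thread.

```python
def management_hosts(config_text):
    """Collect the HostName value of every [NDB_MGMD] section, in file order.

    A small hand-written scan is used because config.ini repeats section
    names, which Python's configparser rejects by default.
    """
    hosts = []
    section = None
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().upper()
            continue
        if section == "NDB_MGMD" and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key.lower() == "hostname":
                hosts.append(value)
    return hosts


def connectstring(hosts):
    """Join the management hosts into one comma-separated connectstring."""
    return ",".join(hosts)


def ndbd_command(hosts, initial=False):
    """Build the ndbd argument list so that every data node lists every management server."""
    cmd = ["./libexec/ndbd", "--ndb-connectstring=" + connectstring(hosts)]
    if initial:
        cmd.append("--initial")
    return cmd
```

With the configuration from this report, management_hosts would yield nanna13 and nanna14, so both data nodes would be started with the same two-host connectstring. No port is appended because 1186 is the default.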

I have tried once again using
./libexec/ndbd --ndb-connectstring=nanna15,nanna16
for both ndbd nodes.
Unfortunately, it still does not work; I get the same error message as before.

Sorry, I wrote that incorrectly; the command I used is
./libexec/ndbd --ndb-connectstring=nanna13,nanna14

I tried once again with the ndbd nodes on nanna13 and nanna14, and this time it works.

After that, I started the cluster with the original config file (ndbd nodes on nanna15 and nanna16), and it works as well.

Thank you for the feedback.

Closing as "Not a Bug" according to the last comment.

The relevant info is in http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-connectstring.html - does there seem to be anything misleading/confusing or missing here?