Bug #48991 Forced node shutdown
Submitted: 23 Nov 2009 13:14
Modified: 11 Jan 2010 15:15
Reporter: sarang. s
Status: No Feedback
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: mysql-5.1.37 ndb-7.0.8
OS: Linux (Red Hat Enterprise Linux Server release 5.2)
Assigned to:
CPU Architecture: Any

[23 Nov 2009 13:14] sarang. s
Description:
I have a three-server setup: 192.168.1.109 acts as the management node, and 192.168.1.107 and 192.168.1.108 act as mysqld+ndbd nodes.

I started my management node, and on my mysqld+ndbd node I did the following:

[root@localhost ~]# ndbd --initial
2009-11-23 12:59:11 [ndbd] INFO     -- Configuration fetched from '192.168.1.109:1186', generation: 1
[root@localhost ~]# service mysql start

However, immediately after this the management node shows the following message:

ndb_mgm > Node:2 Forced node shutdown completed. Initiated by signal 11

The logs on my mysqld+ndbd node show:

2009-11-23 12:59:11 [ndbd] INFO     -- NDB Cluster -- DB node 2
2009-11-23 12:59:11 [ndbd] INFO     -- mysql-5.1.37 ndb-7.0.8a --
2009-11-23 12:59:11 [ndbd] INFO     -- WatchDog timer is set to 6000 ms
2009-11-23 12:59:11 [ndbd] INFO     -- Ndbd_mem_manager::init(1) min: 4004Mb initial: 4024Mb
2009-11-23 12:59:11 [ndbd] ALERT    -- Node 2: Forced node shutdown completed. Initiated by signal 11.

The ndb_1_cluster.log shows:

2009-11-23 12:44:03 [MgmtSrvr] INFO     -- Mgmt server state: nodeid 1 reserved for ip 192.168.1.109, m_reserved_nodes 0000000000000000000000000000000000000000000000000000000000000002.
2009-11-23 12:44:03 [MgmtSrvr] INFO     -- Node 1: Node 1 Connected
2009-11-23 12:44:03 [MgmtSrvr] INFO     -- Id: 1, Command port: *:1186
2009-11-23 12:47:44 [MgmtSrvr] INFO     -- Mgmt server state: nodeid 2 reserved for ip 192.168.1.107, m_reserved_nodes 0000000000000000000000000000000000000000000000000000000000000006.
2009-11-23 12:47:44 [MgmtSrvr] INFO     -- Mgmt server state: nodeid 2 freed, m_reserved_nodes 0000000000000000000000000000000000000000000000000000000000000002.
2009-11-23 12:47:44 [MgmtSrvr] ALERT    -- Node 2: Forced node shutdown completed. Initiated by signal 11.
2009-11-23 12:47:57 [MgmtSrvr] INFO     -- Mgmt server state: nodeid 4 reserved for ip 192.168.1.107, m_reserved_nodes 0000000000000000000000000000000000000000000000000000000000000012.
2009-11-23 12:47:57 [MgmtSrvr] INFO     -- Node 4: mysqld --server-id=0

192.168.1.107 is the IP of my mysqld+ndbd node, and 192.168.1.109 is the management node's IP.
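
For reference, a minimal config.ini for a layout like this would look roughly like the sketch below; the node IDs, memory sizes, and data directories are illustrative assumptions, not values taken from the actual configuration in this report.

[ndbd default]
NoOfReplicas=2
# Illustrative memory sizes; not the reporter's actual settings
DataMemory=3200M
IndexMemory=400M

[ndb_mgmd]
NodeId=1
HostName=192.168.1.109
# Assumed data directory
DataDir=/var/lib/mysql-cluster

[ndbd]
NodeId=2
HostName=192.168.1.107
DataDir=/var/lib/mysql-cluster/data

[ndbd]
NodeId=3
HostName=192.168.1.108
DataDir=/var/lib/mysql-cluster/data

[mysqld]
HostName=192.168.1.107

[mysqld]
HostName=192.168.1.108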

How to repeat:
The steps and logs are identical to those in the description above: start the management node, then on the mysqld+ndbd node run ndbd --initial followed by service mysql start; the data node immediately reports "Forced node shutdown completed. Initiated by signal 11".
[23 Nov 2009 13:28] sarang. s
Uploaded the output of ndb_error_reporter to the FTP server. Name of the log file: 48991_ndbd_error_reporter
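
For context, ndb_error_reporter is normally run on the management host against the cluster configuration file and bundles the data node logs into a single archive. A typical invocation looks like the following; the config.ini path is an assumption and depends on the installation:

[root@localhost ~]# ndb_error_reporter /var/lib/mysql-cluster/config.ini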
[23 Nov 2009 13:44] Andrew Hutchings
Are there any other log files for node 2, such as ndb_2_error.log and ndb_2_trace*? These are what we would normally use to diagnose these issues (along with the rest of the data in the upload).

If not, we will really need a core file to diagnose this.
[24 Nov 2009 12:01] sarang. s
Uploaded the log file from the second data node server to the FTP server; the name is 48991_datanode2.tar.
[25 Nov 2009 0:08] Andrew Hutchings
Unfortunately, all this tells us is that there is something wrong with the other node (and that there are connection problems with the management node).

We would need the error and trace files from node ID 2, or the core file, to diagnose this.

To get a core file in RHEL (which you indicated off the bug tracker that you are using), please follow the instructions outlined at:

http://kbase.redhat.com/faq/docs/DOC-5353
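
The core of that procedure is to remove the core-size limit in the shell that starts the data node and, optionally, direct core files to a known location. A rough sketch follows; the core_pattern path is only an example, and the KB article above remains the authoritative procedure:

# In the shell that will start the data node:
ulimit -c unlimited
# Optionally write cores to a fixed location (example path and pattern):
echo '/var/crash/core.%e.%p' > /proc/sys/kernel/core_pattern
# Restart the data node from this same shell so it inherits the limit:
ndbd --initial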
[30 Nov 2009 7:44] sarang. s
core dump

Attachment: tmp.tar.gz (application/x-gzip, text), 538 bytes.

[11 Dec 2009 15:15] Andrew Hutchings
The file you uploaded is not a core file.  Please use the instructions provided to you offline to generate the core file.
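
As a quick check before uploading, the standard file utility identifies a real core dump: running it against the dump should report an "ELF ... core file", whereas an archive like tmp.tar.gz will be reported as gzip compressed data. The path below is illustrative:

[root@localhost ~]# file /path/to/core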
[12 Jan 2010 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".