Bug #48991 Forced node shutdown
Submitted: 23 Nov 2009 13:14
Modified: 11 Jan 2010 15:15
Reporter: sarang. s
Status: No Feedback
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: mysql-5.1.37 ndb-7.0.8
OS: Linux (Red Hat Enterprise Linux Server release 5.2)
Assigned to:
CPU Architecture: Any

[23 Nov 2009 13:14] sarang. s
Description:
I have a three-server setup: 192.168.1.109 acts as the management node, and 192.168.1.107 and 192.168.1.108 act as mysqld+ndbd nodes.

I started my management node, and on my mysqld+ndbd node I did the following:

[root@localhost ~]# ndbd --initial
2009-11-23 12:59:11 [ndbd] INFO     -- Configuration fetched from '192.168.1.109:1186', generation: 1
[root@localhost ~]# service mysql start

However, immediately after this the management node shows the following message:

ndb_mgm > Node:2 Forced node shutdown completed. Initiated by signal 11

The logs on my mysqld+ndbd node show:

2009-11-23 12:59:11 [ndbd] INFO     -- NDB Cluster -- DB node 2
2009-11-23 12:59:11 [ndbd] INFO     -- mysql-5.1.37 ndb-7.0.8a --
2009-11-23 12:59:11 [ndbd] INFO     -- WatchDog timer is set to 6000 ms
2009-11-23 12:59:11 [ndbd] INFO     -- Ndbd_mem_manager::init(1) min: 4004Mb initial: 4024Mb
2009-11-23 12:59:11 [ndbd] ALERT    -- Node 2: Forced node shutdown completed. Initiated by signal 11.

The ndb_1_cluster.log shows:

2009-11-23 12:44:03 [MgmtSrvr] INFO     -- Mgmt server state: nodeid 1 reserved for ip 192.168.1.109, m_reserved_nodes 0000000000000000000000000000000000000000000000000000000000000002.
2009-11-23 12:44:03 [MgmtSrvr] INFO     -- Node 1: Node 1 Connected
2009-11-23 12:44:03 [MgmtSrvr] INFO     -- Id: 1, Command port: *:1186
2009-11-23 12:47:44 [MgmtSrvr] INFO     -- Mgmt server state: nodeid 2 reserved for ip 192.168.1.107, m_reserved_nodes 0000000000000000000000000000000000000000000000000000000000000006.
2009-11-23 12:47:44 [MgmtSrvr] INFO     -- Mgmt server state: nodeid 2 freed, m_reserved_nodes 0000000000000000000000000000000000000000000000000000000000000002.
2009-11-23 12:47:44 [MgmtSrvr] ALERT    -- Node 2: Forced node shutdown completed. Initiated by signal 11.
2009-11-23 12:47:57 [MgmtSrvr] INFO     -- Mgmt server state: nodeid 4 reserved for ip 192.168.1.107, m_reserved_nodes 0000000000000000000000000000000000000000000000000000000000000012.
2009-11-23 12:47:57 [MgmtSrvr] INFO     -- Node 4: mysqld --server-id=0

192.168.1.107 is the IP of my mysqld+ndbd node, and 192.168.1.109 is the management node's IP.
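
For reference, a minimal config.ini for a layout like this would look roughly like the sketch below; the node IDs, memory sizes, and data directories are illustrative assumptions, not values taken from the actual configuration in this report.

[ndbd default]
NoOfReplicas=2
# Illustrative memory sizes; not the reporter's actual settings
DataMemory=3200M
IndexMemory=400M

[ndb_mgmd]
NodeId=1
HostName=192.168.1.109
# Assumed data directory
DataDir=/var/lib/mysql-cluster

[ndbd]
NodeId=2
HostName=192.168.1.107
DataDir=/var/lib/mysql-cluster/data

[ndbd]
NodeId=3
HostName=192.168.1.108
DataDir=/var/lib/mysql-cluster/data

[mysqld]
HostName=192.168.1.107

[mysqld]
HostName=192.168.1.108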

How to repeat:
The steps and logs are identical to those in the description above: start the management node, then on the mysqld+ndbd node run ndbd --initial followed by service mysql start; the data node immediately reports "Forced node shutdown completed. Initiated by signal 11".
[23 Nov 2009 13:28] sarang. s
Uploaded the output of ndb_error_reporter to the FTP server. Name of the log file: 48991_ndbd_error_reporter
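
For context, ndb_error_reporter is normally run on the management host against the cluster configuration file and bundles the data node logs into a single archive. A typical invocation looks like the following; the config.ini path is an assumption and depends on the installation:

[root@localhost ~]# ndb_error_reporter /var/lib/mysql-cluster/config.ini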
[23 Nov 2009 13:44] Andrew Hutchings
Are there any other log files for node 2, such as ndb_2_error.log and ndb_2_trace*? These are what we would normally use to diagnose these issues (along with the rest of the data in the upload).

If not, we will really need a core file to diagnose this.
[24 Nov 2009 12:01] sarang. s
Uploaded the log file from the second data node server to the FTP server; the name is 48991_datanode2.tar.
[25 Nov 2009 0:08] Andrew Hutchings
Unfortunately, all this tells us is that there is something wrong with the other node (and that there are connection problems with the management node).

We would need the error and trace files from node ID 2, or the core file, to diagnose this.

To get a core file in RHEL (which you indicated off the bug tracker that you are using), please follow the instructions outlined at:

http://kbase.redhat.com/faq/docs/DOC-5353
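
The core of that procedure is to remove the core-size limit in the shell that starts the data node and, optionally, direct core files to a known location. A rough sketch follows; the core_pattern path is only an example, and the KB article above remains the authoritative procedure:

# In the shell that will start the data node:
ulimit -c unlimited
# Optionally write cores to a fixed location (example path and pattern):
echo '/var/crash/core.%e.%p' > /proc/sys/kernel/core_pattern
# Restart the data node from this same shell so it inherits the limit:
ndbd --initial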
[30 Nov 2009 7:44] sarang. s
core dump

Attachment: tmp.tar.gz (application/x-gzip, text), 538 bytes.

[11 Dec 2009 15:15] Andrew Hutchings
The file you uploaded is not a core file.  Please use the instructions provided to you offline to generate the core file.
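
As a quick check before uploading, the standard file utility identifies a real core dump: running it against the dump should report an "ELF ... core file", whereas an archive like tmp.tar.gz will be reported as gzip compressed data. The path below is illustrative:

[root@localhost ~]# file /path/to/core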
[12 Jan 2010 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".