MySQL Bugs: #23121: Node 3: Forced node shutdown completed. Occured during startphase 5.

Bug #23121	Node 3: Forced node shutdown completed. Occured during startphase 5.
Submitted:	10 Oct 2006 4:28	Modified:	10 Nov 2006 6:42
Reporter:	Wing Lap Leung
Status:	No Feedback
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	MySQL 5.1.11 Beta	OS:	Linux (Fedora 5 - 2.6.17)
Assigned to:		CPU Architecture:	Any
Tags:	cluster, MySQL Cluster, node shutdown

Description:
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2    @192.168.0.10  (Version: 5.1.11, Nodegroup: 0, Master)
id=3    @192.168.0.11  (Version: 5.1.11, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @192.168.0.12  (Version: 5.1.11)   # This is a virtual machine running Fedora 5

[mysqld(API)]   2 node(s)
id=4    @192.168.0.10  (Version: 5.1.11)   # Physical machine
id=5    @192.168.0.11  (Version: 5.1.11)   # Physical machine

my.cnf (192.168.0.10, 192.168.0.11)
------
[mysqld]
ndbcluster
ndb-connectstring=192.168.0.12 #Management node IP
[mysql_cluster]
ndb-connectstring=192.168.0.12 #Management node IP

config.ini (192.168.0.12)
----------
[NDBD DEFAULT]
NoOfReplicas=2
[MYSQLD DEFAULT]
[NDB_MGMD DEFAULT]
[TCP DEFAULT]
# Management Server
[NDB_MGMD]
HostName=192.168.0.12           #Management server's (Management node) IP
# Storage Engines
[NDBD]
HostName=192.168.0.10           #Data node 1
DataDir= /var/lib/mysql-cluster
[NDBD]
HostName=192.168.0.11           #Data node 2
DataDir=/var/lib/mysql-cluster
[MYSQLD]
[MYSQLD]

The cluster is set up with a total of 2 physical computers and 1 virtual machine. All of them run Fedora 5 with kernel 2.6.17.

Scenario:
1) Shut down node 3 from the management client: ndb_mgm> 3 stop
2) Restore a database to node 2 using MySQL Administrator
3) Back in the management client, enter ndb_mgm> 3 start and press Enter. The following message appears:

ndb_mgm> 3 start
Node 3: Forced node shutdown completed. Occured during startphase 5. Initiated by signal 0. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error

4) After that, I ran the following in ndb_mgm:

ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2    @192.168.0.10  (Version: 5.1.11, Nodegroup: 0, Master)
id=3 (not connected, accepting connect from 192.168.0.11)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @192.168.0.12  (Version: 5.1.11)

[mysqld(API)]   2 node(s)
id=4    @192.168.0.10  (Version: 5.1.11)
id=5    @192.168.0.11  (Version: 5.1.11)

ndb_mgm> 3 start
Start failed.
*    22: Error
*        No contact with the process (dead ?).

5) I went back to node 3 (192.168.0.11) and ran:

/usr/local/mysql/bin/ndbd

No message appeared.

6) I went back to ndb_mgm again, and it showed:

ndb_mgm> Node 3: Started (version 5.1.11)

ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2    @192.168.0.10  (Version: 5.1.11, Nodegroup: 0, Master)
id=3    @192.168.0.11  (Version: 5.1.11, Nodegroup: 0)  <<< It ran again!

[ndb_mgmd(MGM)] 1 node(s)
id=1    @192.168.0.12  (Version: 5.1.11)

[mysqld(API)]   2 node(s)
id=4    @192.168.0.10  (Version: 5.1.11)
id=5    @192.168.0.11  (Version: 5.1.11)
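
The `show` listings in steps 4 and 6 differ only in the line for node 3, so whether a data node has rejoined can be checked mechanically before any further `start` command is issued. The following is a minimal sketch, assuming the listing format shown above; `disconnected_data_nodes` is a hypothetical helper name and not part of any MySQL tool.

```python
import re

# Matches data-node lines such as
#   id=3 (not connected, accepting connect from 192.168.0.11)
NOT_CONNECTED = re.compile(r"^id=(\d+)\s+\(not connected")


def disconnected_data_nodes(show_output):
    """Return the ids of data nodes listed as not connected in `show` output."""
    ids = []
    in_ndbd_section = False
    for line in show_output.splitlines():
        line = line.strip()
        if line.startswith("["):
            # Section headers look like "[ndbd(NDB)]     2 node(s)".
            in_ndbd_section = line.startswith("[ndbd(NDB)]")
            continue
        if in_ndbd_section:
            match = NOT_CONNECTED.match(line)
            if match:
                ids.append(int(match.group(1)))
    return ids
```

Fed the listing from step 4, this should return a list containing only node 3; fed the listing from step 6, it should return an empty list. The text itself could be captured from something like `ndb_mgm -e show` on the management host, which is left out here so the sketch stays independent of a running cluster.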

How to repeat:
Don't know

Suggested fix:
It seems that when ndbd is started on a data node, the management node receives a message about the start and acts on it accordingly.

However, it also seems that nobody knows when the management node will receive this message, so if a novice like me explicitly starts the node from ndb_mgm, an error occurs.

How to repeat:

1) Restore a database to a server whose data node is shut down
2) Start ndbd on that server
3) ndb_mgm then reports: Forced node shutdown completed. Occured during startphase 5. Initiated by signal 0. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error
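
When comparing reports like this one, it can help to pull the start phase, signal and error code out of the shutdown message rather than reading it by eye. The following is a rough sketch that assumes the message wording quoted in this report (including the spelling "Occured" as the server prints it); `parse_forced_shutdown` is a hypothetical name.

```python
import re

# Every part after "Forced node shutdown completed." is optional, because
# not all messages in this report carry a start phase or an error code.
SHUTDOWN = re.compile(
    r"(?:Node (?P<node>\d+): )?Forced node shutdown completed\."
    r"(?: Occured during startphase (?P<phase>\d+)\.)?"
    r"(?: Initiated by signal (?P<signal>\d+)\.)?"
    r"(?: Caused by error (?P<error>\d+):)?"
)


def parse_forced_shutdown(message):
    """Return the numeric fields found in a forced-shutdown message, or None."""
    match = SHUTDOWN.search(message)
    if match is None:
        return None
    # Keep only the fields that were actually present in the message.
    return {k: int(v) for k, v in match.groupdict().items() if v is not None}
```

For the message in step 3 above, the result should contain phase 5, signal 0 and error 2341; the free-text part after the error code is deliberately ignored.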

Please upload cluster log + error/trace files
  and config.ini

/Jonas

Updating Category to Cluster.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

Since I'm experiencing the same problem, I'll try to provide additional information. I actually had it running on Fedora Core 5, but now have the problem on Debian Etch.

When I run "2 restart" from ndb_mgm, the node goes down, but it doesn't become available again unless I start it manually. On the node, the following can be seen in /var/lib/mysql-cluster/ndb_2_out.log:

2006-11-29 10:37:16 [ndbd] INFO     -- Restarting system
2006-11-29 10:37:17 [ndbd] INFO     -- Node 2: Node shutdown completed, restarting, no start.
2006-11-29 10:37:17 [ndbd] INFO     -- Ndb has terminated (pid 3223) restarting
2006-11-29 10:37:20 [ndbd] INFO     -- Angel pid: 3222 ndb pid: 9979
2006-11-29 10:37:20 [ndbd] INFO     -- NDB Cluster -- DB node 2
2006-11-29 10:37:20 [ndbd] INFO     -- Version 5.1.11 (beta) --
2006-11-29 10:37:20 [ndbd] INFO     -- Configuration fetched at ds-lvs02 port 1186
2006-11-29 10:37:20 [ndbd] INFO     -- WatchDog timer is set to 6000 ms
2006-11-29 10:37:21 [ndbd] INFO     -- Ndbd_mem_manager::init(1) min: 20Mb initial: 20Mb
WOPool::init(61, 9)
RWPool::init(82, 13)
RWPool::init(a2, 18)
RWPool::init(c2, 13)
RWPool::init(122, 17)
RWPool::init(142, 15)
WOPool::init(41, 8)
RWPool::init(e2, 12)
RWPool::init(102, 51)
WOPool::init(21, 6)
2006-11-29 10:37:22 [ndbd] ALERT    -- Node 2: Forced node shutdown completed. Initiated by signal 11.

My configuration files look like this:
--------------------------------------------------------------
config.ini :
[NDBD DEFAULT]
NoOfReplicas= 2
DataDir= /var/lib/mysql-cluster

[NDB_MGMD]
Hostname= ds-lvs02.int.sifira.dk
DataDir= /var/lib/mysql-cluster

[NDBD]
HostName= bart.int.sifira.dk

[NDBD]
HostName= lisa.int.sifira.dk

[MYSQLD]
HostName= 192.168.1.90

[MYSQLD]
HostName= 192.168.200.105

[MYSQLD]
HostName= 192.168.1.62

[MYSQLD]
HostName= 192.168.1.12

[MYSQLD]
HostName= 192.168.1.7

[MYSQLD]
HostName= 192.168.1.91

--------------------------------------------------------------

my.cnf :

[MYSQLD]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
old_passwords=1 # Default to using old password format for
                # compatibility with mysql 3.x clients
                # (those using the mysqlclient10 compatibility package).
# Cluster settings:
ndbcluster                      # run NDB engine
ndb-connectstring=ds-lvs02      # location of MGM node

# Options for ndbd process:
[MYSQL_CLUSTER]
ndb-connectstring=ds-lvs02      # location of MGM node

[mysql.server]
user=mysql
basedir=/opt/sifira/mysql

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

ndbcluster                      # run NDB engine
ndb-connectstring=ds-lvs02      # location of MGM node

[ndbd]
connect-string=ds-lvs02
--------------------------------------------------------------