Bug #23121 Node 3: Forced node shutdown completed. Occured during startphase 5.
Submitted: 10 Oct 2006 4:28
Modified: 10 Nov 2006 6:42
Reporter: Wing Lap Leung
Status: No Feedback
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: MySQL 5.1.11 Beta
OS: Linux (Fedora 5 - 2.6.17)
Assigned to:
CPU Architecture: Any
Tags: cluster, MySQL Cluster, node shutdown

[10 Oct 2006 4:28] Wing Lap Leung
Description:
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2    @192.168.0.10  (Version: 5.1.11, Nodegroup: 0, Master)
id=3    @192.168.0.11  (Version: 5.1.11, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @192.168.0.12  (Version: 5.1.11)   # Virtual machine running Fedora 5

[mysqld(API)]   2 node(s)
id=4    @192.168.0.10  (Version: 5.1.11)   # Physical machine
id=5    @192.168.0.11  (Version: 5.1.11)   # Physical machine

my.cnf (192.168.0.10, 192.168.0.11)
------
[mysqld]
ndbcluster
ndb-connectstring=192.168.0.12 #Management node IP
[mysql_cluster]
ndb-connectstring=192.168.0.12 #Management node IP

config.ini (192.168.0.12)
----------
[NDBD DEFAULT]
NoOfReplicas=2
[MYSQLD DEFAULT]
[NDB_MGMD DEFAULT]
[TCP DEFAULT]
# Management Server
[NDB_MGMD]
HostName=192.168.0.12           #Management server's (Management node) IP
# Storage Engines
[NDBD]
HostName=192.168.0.10           #Data node 1
DataDir= /var/lib/mysql-cluster
[NDBD]
HostName=192.168.0.11           #Data node 2
DataDir=/var/lib/mysql-cluster
[MYSQLD]
[MYSQLD]

The cluster is set up on a total of 2 physical computers and 1 virtual machine. All of them run Fedora 5 with kernel 2.6.17.

Scenario:
1) Shut down node 3 from the management client: ndb_mgm> 3 stop
2) Restore a database to node 2 using MySQL Administrator
3) Back in ndb_mgm, enter "3 start" and press Enter. The following message appears:

ndb_mgm> 3 start
Node 3: Forced node shutdown completed. Occured during startphase 5. Initiated by signal 0. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error

4) After that, I ran the following in ndb_mgm:

ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2    @192.168.0.10  (Version: 5.1.11, Nodegroup: 0, Master)
id=3 (not connected, accepting connect from 192.168.0.11)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @192.168.0.12  (Version: 5.1.11)

[mysqld(API)]   2 node(s)
id=4    @192.168.0.10  (Version: 5.1.11)
id=5    @192.168.0.11  (Version: 5.1.11)

ndb_mgm> 3 start
Start failed.
*    22: Error
*        No contact with the process (dead ?).

5) I went back to node 3 (192.168.0.11) and ran:

/usr/local/mysql/bin/ndbd

No message appeared.

6) I went back to ndb_mgm again, and it showed:

ndb_mgm> Node 3: Started (version 5.1.11)

ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2    @192.168.0.10  (Version: 5.1.11, Nodegroup: 0, Master)
id=3    @192.168.0.11  (Version: 5.1.11, Nodegroup: 0)  <<< It ran again!

[ndb_mgmd(MGM)] 1 node(s)
id=1    @192.168.0.12  (Version: 5.1.11)

[mysqld(API)]   2 node(s)
id=4    @192.168.0.10  (Version: 5.1.11)
id=5    @192.168.0.11  (Version: 5.1.11)
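
The difference between the "show" outputs in steps 4 and 6 (node 3 "not connected", then connected again) can be checked mechanically. The following is a minimal Python sketch that assumes the output format quoted in this report. The function name node_status and the regular expression are illustrative and not part of any MySQL tool.

```python
import re

# Matches node lines as they appear in the "show" output quoted above, e.g.
#   id=2    @192.168.0.10  (Version: 5.1.11, Nodegroup: 0, Master)
#   id=3 (not connected, accepting connect from 192.168.0.11)
NODE_LINE = re.compile(r"^id=(\d+)\s+(@(\S+)|\(not connected)")


def node_status(show_output):
    """Return {node_id: host}; host is None when the node is not connected."""
    status = {}
    for line in show_output.splitlines():
        m = NODE_LINE.match(line.strip())
        if m:
            status[int(m.group(1))] = m.group(3)  # None for "not connected"
    return status


SAMPLE = """\
[ndbd(NDB)]     2 node(s)
id=2    @192.168.0.10  (Version: 5.1.11, Nodegroup: 0, Master)
id=3 (not connected, accepting connect from 192.168.0.11)
"""

print(node_status(SAMPLE))  # {2: '192.168.0.10', 3: None}
```

A node that maps to None here is one whose ndbd process still has to be started on its own host, as was done in step 5.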

How to repeat:
Don't know

Suggested fix:
It seems that when ndbd is started on a data node, the management node receives a message about the start and acts accordingly.

However, it is not clear when the management node will receive this message, and if a novice like me explicitly starts the node from ndb_mgm in the meantime, an error occurs.
[10 Oct 2006 4:36] Wing Lap Leung
How to repeat:

1) Restore a database to a server whose data node is shut down
2) Start ndbd on that server
3) ndb_mgm reports: Forced node shutdown completed. Occured during startphase 5. Initiated by signal 0. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error
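
For comparing occurrences of this failure, the start phase, signal and error code can be pulled out of the message text. This is a minimal Python sketch based only on the message wording quoted in this report, including its "Occured" spelling. The name parse_shutdown and the pattern are hypothetical helpers, not an NDB API.

```python
import re

# Each part after "completed." is optional, because some reports of this
# message carry only the signal number.
SHUTDOWN = re.compile(
    r"Forced node shutdown completed\."
    r"(?: Occured during startphase (?P<phase>\d+)\.)?"
    r"(?: Initiated by signal (?P<signal>\d+)\.)?"
    r"(?: Caused by error (?P<error>\d+):)?"
)


def parse_shutdown(message):
    """Return the numeric fields found in a forced-shutdown message, or None."""
    m = SHUTDOWN.search(message)
    if m is None:
        return None
    return {k: int(v) for k, v in m.groupdict().items() if v is not None}


msg = (
    "Node 3: Forced node shutdown completed. Occured during startphase 5. "
    "Initiated by signal 0. Caused by error 2341: 'Internal program error "
    "(failed ndbrequire)"
)
print(parse_shutdown(msg))  # {'phase': 5, 'signal': 0, 'error': 2341}
```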
[10 Oct 2006 6:42] Jonas Oreland
Please upload cluster log + error/trace files
  and config.ini

/Jonas
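
One way to gather the requested files is to pack them into a single archive before uploading. The Python sketch below assumes that the logs live in the DataDir from config.ini (/var/lib/mysql-cluster in this report) and that they follow the ndb_<nodeid>_cluster.log, ndb_<nodeid>_error.log, ndb_<nodeid>_trace.log.<n> and ndb_<nodeid>_out.log naming. Those patterns and the function name collect_logs are assumptions to adjust for the actual installation. config.ini is not included and has to be attached separately.

```python
import glob
import os
import tarfile

# Assumed file name patterns; adjust them if the installation differs.
PATTERNS = (
    "ndb_*_cluster.log*",
    "ndb_*_error.log",
    "ndb_*_trace.log*",
    "ndb_*_out.log",
)


def collect_logs(data_dir, archive_path):
    """Pack matching files from data_dir into a gzip tarball; return their names."""
    found = []
    for pattern in PATTERNS:
        found.extend(sorted(glob.glob(os.path.join(data_dir, pattern))))
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in found:
            # Store only the base name so the archive has no absolute paths.
            tar.add(path, arcname=os.path.basename(path))
    return [os.path.basename(p) for p in found]
```

It would be run once on each data node host and once on the management node host, for example collect_logs("/var/lib/mysql-cluster", "/tmp/node3-logs.tar.gz"), where the archive path is an arbitrary choice.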
[10 Oct 2006 11:07] Miguel Solorzano
Updating Category to Cluster.
[11 Nov 2006 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[29 Nov 2006 14:24] Lars Bo Svenningsen
Since I'm experiencing the same problem, I'll try to provide additional information. I actually had it running on Fedora Core 5, but now have the problem on Debian Etch.

When I do "2 restart" from ndb_mgm the server goes down, but it doesn't become available again unless I manually start it up. On the node the following can be seen in /var/lib/mysql-cluster/ndb_2_out.log:

2006-11-29 10:37:16 [ndbd] INFO     -- Restarting system
2006-11-29 10:37:17 [ndbd] INFO     -- Node 2: Node shutdown completed, restarting, no start.
2006-11-29 10:37:17 [ndbd] INFO     -- Ndb has terminated (pid 3223) restarting
2006-11-29 10:37:20 [ndbd] INFO     -- Angel pid: 3222 ndb pid: 9979
2006-11-29 10:37:20 [ndbd] INFO     -- NDB Cluster -- DB node 2
2006-11-29 10:37:20 [ndbd] INFO     -- Version 5.1.11 (beta) --
2006-11-29 10:37:20 [ndbd] INFO     -- Configuration fetched at ds-lvs02 port 1186
2006-11-29 10:37:20 [ndbd] INFO     -- WatchDog timer is set to 6000 ms
2006-11-29 10:37:21 [ndbd] INFO     -- Ndbd_mem_manager::init(1) min: 20Mb initial: 20Mb
WOPool::init(61, 9)
RWPool::init(82, 13)
RWPool::init(a2, 18)
RWPool::init(c2, 13)
RWPool::init(122, 17)
RWPool::init(142, 15)
WOPool::init(41, 8)
RWPool::init(e2, 12)
RWPool::init(102, 51)
WOPool::init(21, 6)
2006-11-29 10:37:22 [ndbd] ALERT    -- Node 2: Forced node shutdown completed. Initiated by signal 11.
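
The relevant entry in this log is the final ALERT line. On Linux, signal 11 is SIGSEGV, so the restarted ndbd process crashed with a segmentation fault. To pick such entries out of a longer ndb_<nodeid>_out.log, a minimal Python sketch could look like the following. It assumes the timestamp, "[ndbd]" and level layout seen in the excerpt above. The function name alerts is illustrative, and the level names other than ALERT are guesses.

```python
import re

# Layout taken from the excerpt: "<date> <time> [ndbd] <LEVEL>  -- <text>"
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[ndbd\] "
    r"(?P<level>[A-Z]+)\s+-- (?P<text>.*)$"
)


def alerts(log_text, levels=("ALERT", "ERROR", "CRITICAL")):
    """Return (time, level, text) for each log line whose level is in levels."""
    hits = []
    for line in log_text.splitlines():
        m = LOG_LINE.match(line)
        if m and m.group("level") in levels:
            hits.append((m.group("time"), m.group("level"), m.group("text")))
    return hits


SAMPLE = """\
2006-11-29 10:37:20 [ndbd] INFO     -- WatchDog timer is set to 6000 ms
WOPool::init(21, 6)
2006-11-29 10:37:22 [ndbd] ALERT    -- Node 2: Forced node shutdown completed. Initiated by signal 11.
"""

for entry in alerts(SAMPLE):
    print(entry)
```

Lines without a timestamp, such as the WOPool::init and RWPool::init output, are skipped.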

My configuration files look like this:
--------------------------------------------------------------
config.ini :
[NDBD DEFAULT]
NoOfReplicas= 2
DataDir= /var/lib/mysql-cluster

[NDB_MGMD]
Hostname= ds-lvs02.int.sifira.dk
DataDir= /var/lib/mysql-cluster

[NDBD]
HostName= bart.int.sifira.dk

[NDBD]
HostName= lisa.int.sifira.dk

[MYSQLD]
HostName= 192.168.1.90

[MYSQLD]
HostName= 192.168.200.105

[MYSQLD]
HostName= 192.168.1.62

[MYSQLD]
HostName= 192.168.1.12

[MYSQLD]
HostName= 192.168.1.7

[MYSQLD]
HostName= 192.168.1.91

--------------------------------------------------------------

my.cnf :

[MYSQLD]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
old_passwords=1 # Default to using old password format for
                # compatibility with mysql 3.x clients
                # (those using the mysqlclient10 compatibility package).
# Cluster settings:
ndbcluster                      # run NDB engine
ndb-connectstring=ds-lvs02      # location of MGM node

# Options for ndbd process:
[MYSQL_CLUSTER]
ndb-connectstring=ds-lvs02      # location of MGM node

[mysql.server]
user=mysql
basedir=/opt/sifira/mysql

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

ndbcluster                      # run NDB engine
ndb-connectstring=ds-lvs02      # location of MGM node

[ndbd]
connect-string=ds-lvs02
--------------------------------------------------------------