Description:
Running on Windows with a simple Cluster setup of 2 data nodes and 1 management node, restarting a data node with the no-start flag (<nodeid> RESTART -n) causes errors, but not consistently. Sometimes it works, but mostly you get a whole bunch of errors from ndb_mgmd. Sometimes the cluster starts up at the same time as errors are flowing from ndb_mgmd. If you then START the node (<nodeid> START), it starts, but the errors keep coming. Sometimes the data node gets stuck in the "starting" phase. All sorts of problems.
How to repeat:
Set up a cluster configured as below (my setup):
<config>
[ndbd default]
NoOfReplicas=2
[mysqld default]
[ndb_mgmd default]
[tcp default]
[ndb_mgmd]
PortNumber=1186
HostName=127.0.0.1
[ndbd]
HostName=127.0.0.1
DataDir=C:/MySQL714b/node1/data
[ndbd]
HostName=127.0.0.1
DataDir=C:/MySQL714b/node2/data
[mysqld]
[mysqld]
[mysqld]
</config>
Start the mgm and cluster nodes from 3 different DOS windows, so you can see the output from each and every one of them.
Go into ndb_mgm and do a show:
<command>
ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=2 @127.0.0.1 (mysql-5.1.44 ndb-7.1.4, Nodegroup: 0, Master)
id=3 @127.0.0.1 (mysql-5.1.44 ndb-7.1.4, Nodegroup: 0)
[ndb_mgmd(MGM)] 1 node(s)
id=1 @127.0.0.1 (mysql-5.1.44 ndb-7.1.4)
[mysqld(API)] 3 node(s)
id=4 (not connected, accepting connect from any host)
id=5 (not connected, accepting connect from any host)
id=6 (not connected, accepting connect from any host)
</command>
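To check data node state between steps without reading the listing by eye, the `show` output can be parsed. This is a minimal Python sketch that assumes the line format shown above (it may differ between NDB versions). `parse_ndbd_status` is a hypothetical helper, not part of any MySQL tool, and treating a node that reports a node group as "started" is an assumption.

```python
import re
import sys

# Matches connected node lines such as:
#   id=2 @127.0.0.1 (mysql-5.1.44 ndb-7.1.4, Nodegroup: 0, Master)
#   id=3 @127.0.0.1 (mysql-5.1.44 ndb-7.1.4, not started)
# Lines without "@host" (not connected) are skipped.
NODE_LINE = re.compile(r"^id=(\d+)\s+@(\S+)\s+\((.*)\)$")


def parse_ndbd_status(show_output: str) -> dict:
    """Return {node_id: status} for the [ndbd(NDB)] section of `show` output."""
    statuses = {}
    in_ndbd = False
    for raw in show_output.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Section headers look like "[ndbd(NDB)] 2 node(s)"
            in_ndbd = line.startswith("[ndbd(NDB)]")
            continue
        match = NODE_LINE.match(line) if in_ndbd else None
        if match is None:
            continue
        node_id = int(match.group(1))
        # First field is the version string; the remaining fields describe state
        extra = [f.strip() for f in match.group(3).split(",")][1:]
        if "not started" in extra:
            statuses[node_id] = "not started"
        elif any(f.startswith("Nodegroup:") for f in extra):
            # Assumption: a node that reports a node group is treated as started
            statuses[node_id] = "started"
        else:
            statuses[node_id] = ", ".join(extra) or "unknown"
    return statuses


if __name__ == "__main__":
    # Usage: ndb_mgm -e show | python parse_show.py
    for nid, status in sorted(parse_ndbd_status(sys.stdin.read()).items()):
        print(f"node {nid}: {status}")
```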
Now, restart node 3 and show status:
<command>
ndb_mgm> 3 restart -n
Node 3: Node shutdown initiated
Node 3: Node shutdown completed, restarting, no start.
Node 3 is being restarted
ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=2 @127.0.0.1 (mysql-5.1.44 ndb-7.1.4, Nodegroup: 0, Master)
id=3 @127.0.0.1 (mysql-5.1.44 ndb-7.1.4, not started)
[ndb_mgmd(MGM)] 1 node(s)
id=1 @127.0.0.1 (mysql-5.1.44 ndb-7.1.4)
[mysqld(API)] 3 node(s)
id=4 (not connected, accepting connect from any host)
id=5 (not connected, accepting connect from any host)
id=6 (not connected, accepting connect from any host)
</command>
Now, start node 3:
<command>
ndb_mgm> 3 start
</command>
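Because the failure is intermittent, it may help to repeat the restart/start cycle many times. Below is a minimal Python sketch that drives the same commands through `ndb_mgm -c <connectstring> -e "<command>"`, assuming `ndb_mgm` is on the PATH and the management server listens on 127.0.0.1:1186 as in the config above. The function names and the fixed 10-second pause are hypothetical choices for illustration; a real run may need a longer wait, or polling of `show`, before issuing START.

```python
import subprocess
import time
from typing import Callable, List


def mgm_argv(command: str, connectstring: str = "127.0.0.1:1186") -> List[str]:
    """Build an ndb_mgm invocation that executes one command and exits."""
    return ["ndb_mgm", "-c", connectstring, "-e", command]


def run_mgm(argv: List[str]) -> str:
    """Run ndb_mgm and return its combined stdout and stderr."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.stdout + result.stderr


def restart_cycle(node_id: int, iterations: int,
                  runner: Callable[[List[str]], str] = run_mgm,
                  pause: float = 10.0) -> int:
    """Repeat 'restart -n' then 'start' on one node; return how many starts failed."""
    failures = 0
    for i in range(iterations):
        runner(mgm_argv(f"{node_id} restart -n"))
        time.sleep(pause)  # assumption: fixed wait for the node to reach "not started"
        out = runner(mgm_argv(f"{node_id} start"))
        if "Start failed" in out:
            failures += 1
            print(f"iteration {i}: start failed:\n{out}")
        time.sleep(pause)
    return failures


if __name__ == "__main__":
    print("failed starts:", restart_cycle(node_id=3, iterations=10))
```

The `runner` parameter exists only so the loop can be exercised without a live cluster; the ndb_mgmd console still has to be watched separately for the warnings.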
Now, many things can happen here. Sometimes node 3 starts and all is fine. Sometimes, node 3 just dies:
<command>
ndb_mgm> 3 start
Start failed.
* 22: Error
* No contact with the process (dead ?).: Permanent error: Application error
</command>
Often, the restart works as expected:
<command>
ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=2 @127.0.0.1 (mysql-5.1.44 ndb-7.1.4, Nodegroup: 0, Master)
id=3 @127.0.0.1 (mysql-5.1.44 ndb-7.1.4, not started)
[ndb_mgmd(MGM)] 1 node(s)
id=1 @127.0.0.1 (mysql-5.1.44 ndb-7.1.4)
[mysqld(API)] 3 node(s)
id=4 (not connected, accepting connect from any host)
id=5 (not connected, accepting connect from any host)
id=6 (not connected, accepting connect from any host)
</command>
But ndb_mgmd throws out errors like crazy:
<output>
2010-08-06 16:32:07 [MgmtSrvr] WARNING -- Failed to convert connection from '127.0.0.1:4342' to transporter
Failed to report event to event log, error: 1502
2010-08-06 16:32:07 [MgmtSrvr] WARNING -- Failed to convert connection from '127.0.0.1:4343' to transporter
Failed to report event to event log, error: 1502
2010-08-06 16:32:07 [MgmtSrvr] WARNING -- Failed to convert connection from '127.0.0.1:4344' to transporter
Failed to report event to event log, error: 1502
2010-08-06 16:32:07 [MgmtSrvr] WARNING -- Failed to convert connection from '127.0.0.1:4345' to transporter
Failed to report event to event log, error: 1502
2010-08-06 16:32:08 [MgmtSrvr] WARNING -- Failed to convert connection from '127.0.0.1:4346' to transporter
Failed to report event to event log, error: 1502
</output>
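To quantify how fast these warnings arrive, the ndb_mgmd console output can be saved to a file and summarized. This is a minimal Python sketch that assumes only the two line shapes shown above; `summarize_mgmd_output` is a hypothetical helper. It collects the peer address of every failed conversion, counts warnings per one-second timestamp, and tallies the event log error codes.

```python
import re
import sys
from collections import Counter

# Line shapes taken from the ndb_mgmd output above
CONVERT_FAIL = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[MgmtSrvr\] WARNING -- "
    r"Failed to convert connection from '([^']+)' to transporter"
)
EVENTLOG_FAIL = re.compile(r"^Failed to report event to event log, error: (\d+)")


def summarize_mgmd_output(text: str):
    """Return (peers, per_second, eventlog_errors) for the warning pattern above."""
    peers = []                   # 'host:port' of each connection that failed to convert
    per_second = Counter()       # warning count per timestamp (1-second resolution)
    eventlog_errors = Counter()  # event log error code -> occurrences
    for line in text.splitlines():
        m = CONVERT_FAIL.match(line)
        if m:
            per_second[m.group(1)] += 1
            peers.append(m.group(2))
            continue
        m = EVENTLOG_FAIL.match(line)
        if m:
            eventlog_errors[int(m.group(1))] += 1
    return peers, per_second, eventlog_errors


if __name__ == "__main__":
    # Usage: python summarize_mgmd.py < mgmd_console.txt
    peers, per_second, codes = summarize_mgmd_output(sys.stdin.read())
    print(f"{len(peers)} failed conversions from {len(set(peers))} distinct peers")
    print("busiest second:", per_second.most_common(1))
    print("event log errors:", dict(codes))
```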
At this point, starting node 3 sometimes works and sometimes does not. But when it DOES work, ndb_mgmd just keeps throwing even more errors.