MySQL Bugs: #65020: Restarting failed ndb node seems to succeed in mcm but doesn't in real world

Bug #65020	Restarting failed ndb node seems to succeed in mcm but doesn't in real world
Submitted:	18 Apr 2012 9:13	Modified:	29 May 2012 18:33
Reporter:	Mario Beck
Status:	Closed
Category:	MySQL Cluster Manager: CLI	Severity:	S3 (Non-critical)
Version:	1.1.5	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	failed node, mcm, restart

Description:
I killed an ndbd process with kill -9. Its status is now "failed", and restarting it does not work.
However, "start process 1 mycluster" returns "Process started successfully",
and the operation is listed as "finished".
See the transcript below.
A proper error message, with a hint to run "stop process" first to clear the failed status, would serve the administrator much better.
The current behavior is misleading.

How to repeat:
Set up a cluster with "mcmd --bootstrap".
Once the cluster is up and running, kill one of the ndbd processes:
kill -9 <pid>
Run the following mcm session:

mcm> show status --process mycluster;
+--------+----------+--------+---------+-----------+-----------+
| NodeId | Process  | Host   | Status  | Nodegroup | Package   |
+--------+----------+--------+---------+-----------+-----------+
| 49     | ndb_mgmd | olga64 | running |           | mypackage |
| 1      | ndbd     | olga64 | failed  | 0         | mypackage |
| 2      | ndbd     | olga64 | running | 0         | mypackage |
| 50     | mysqld   | olga64 | running |           | mypackage |
| 51     | mysqld   | olga64 | running |           | mypackage |
| 52     | ndbapi   | *      | added   |           |           |
+--------+----------+--------+---------+-----------+-----------+
6 rows in set (0.03 sec)

mcm> start process 1 mycluster;
+------------------------------+
| Command result               |
+------------------------------+
| Process started successfully |
+------------------------------+
1 row in set (0.23 sec)

mcm> show status --operation mycluster;
+---------------+----------+--------------+
| Command       | Status   | Description  |
+---------------+----------+--------------+
| start process | finished | <no message> |
+---------------+----------+--------------+
1 row in set (0.03 sec)

mcm> show status --process mycluster;
+--------+----------+--------+---------+-----------+-----------+
| NodeId | Process  | Host   | Status  | Nodegroup | Package   |
+--------+----------+--------+---------+-----------+-----------+
| 49     | ndb_mgmd | olga64 | running |           | mypackage |
| 1      | ndbd     | olga64 | failed  | 0         | mypackage |
| 2      | ndbd     | olga64 | running | 0         | mypackage |
| 50     | mysqld   | olga64 | running |           | mypackage |
| 51     | mysqld   | olga64 | running |           | mypackage |
| 52     | ndbapi   | *      | added   |           |           |
+--------+----------+--------+---------+-----------+-----------+
6 rows in set (0.11 sec)

Suggested fix:
Instead of reporting "Process started successfully", display a message stating that the node is in the "failed" state
and that "stop process <nodeid> <clustername>" must be run first to clear that status, after which "start process <nodeid> <clustername>" can be retried.

It would be even more convenient to clear the failed status automatically on restart:
START PROCESS should check whether the status is "failed" and, if so, issue STOP first before continuing with the normal START procedure.
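The two behaviors proposed above (refuse with a hint, or clear the failed status with an implicit STOP) can be sketched as a toy model. This is a minimal, hypothetical sketch in Python: `ToyCluster`, `Status`, `StartRefused` and the `auto_clear` flag are illustrative names that are not part of MySQL Cluster Manager, and the model only tracks per-node status values rather than controlling real ndbd processes.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical subset of the states shown by "show status --process"
    RUNNING = "running"
    STOPPED = "stopped"
    FAILED = "failed"


class StartRefused(Exception):
    """Raised instead of reporting success while a node is still 'failed'."""


class ToyCluster:
    """Hypothetical in-memory stand-in for the node status mcm tracks."""

    def __init__(self, statuses):
        self.statuses = dict(statuses)  # node id -> Status
        self.log = []  # commands issued, in order

    def stop_process(self, node_id):
        # Stopping a failed node clears its "failed" status.
        self.log.append(f"stop process {node_id}")
        self.statuses[node_id] = Status.STOPPED

    def _start(self, node_id):
        # Stand-in for the normal START procedure: just flip the status.
        self.log.append(f"start process {node_id}")
        self.statuses[node_id] = Status.RUNNING

    def start_process(self, node_id, auto_clear=False):
        if self.statuses[node_id] is Status.FAILED:
            if not auto_clear:
                # Option 1: refuse with an actionable hint instead of
                # claiming success.
                raise StartRefused(
                    f"Node {node_id} is in 'failed' state. Run "
                    f"'stop process {node_id} <clustername>' first, "
                    f"then retry 'start process {node_id} <clustername>'."
                )
            # Option 2: clear the failed status with an implicit STOP.
            self.stop_process(node_id)
        self._start(node_id)
        return "Process started successfully"


if __name__ == "__main__":
    cluster = ToyCluster({1: Status.FAILED, 2: Status.RUNNING})
    try:
        cluster.start_process(1)
    except StartRefused as err:
        print(err)  # the hint; node 1 is still "failed" and nothing was run
    print(cluster.start_process(1, auto_clear=True))
    print(cluster.log)  # ['stop process 1', 'start process 1']
```

Keeping the refusal as the default in this sketch means the operator is never told a start succeeded while the node is still marked failed; the automatic STOP is opt-in here purely to show both suggestions side by side.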

This bug is fixed in MySQL Cluster Manager 1.1.6.

Thank you for your bug report. This issue has already been fixed in the latest released version of that product, which you can download at

  http://www.mysql.com/downloads/