Bug #65020 Restarting failed ndb node seems to succeed in mcm but doesn't in real world
Submitted: 18 Apr 2012 9:13
Modified: 29 May 2012 18:33
Reporter: Mario Beck
Status: Closed
Category: MySQL Cluster Manager: CLI
Severity: S3 (Non-critical)
Version: 1.1.5
OS: Any
Assigned to:
CPU Architecture: Any
Tags: failed node, mcm, restart

[18 Apr 2012 9:13] Mario Beck
Description:
I killed an ndbd process with kill -9. Its status is now "failed", and restarting it does not work.
But "start process 1 mycluster" returns "Process started successfully".
And the operation is listed as "finished".
See transcript below.
A proper error message, with a hint to run "stop process" first to clear the failed status, would better support the poor admin.
The current behavior is misleading: the command claims success while the node stays in the "failed" state.

How to repeat:
Set up a cluster with "mcmd --bootstrap".
Once the cluster is up and running kill one of the ndbd processes:
kill -9 <pid>
Run the following mcm session:

mcm> show status --process mycluster;
+--------+----------+--------+---------+-----------+-----------+
| NodeId | Process  | Host   | Status  | Nodegroup | Package   |
+--------+----------+--------+---------+-----------+-----------+
| 49     | ndb_mgmd | olga64 | running |           | mypackage |
| 1      | ndbd     | olga64 | failed  | 0         | mypackage |
| 2      | ndbd     | olga64 | running | 0         | mypackage |
| 50     | mysqld   | olga64 | running |           | mypackage |
| 51     | mysqld   | olga64 | running |           | mypackage |
| 52     | ndbapi   | *      | added   |           |           |
+--------+----------+--------+---------+-----------+-----------+
6 rows in set (0.03 sec)

mcm> start process 1 mycluster;
+------------------------------+
| Command result               |
+------------------------------+
| Process started successfully |
+------------------------------+
1 row in set (0.23 sec)

mcm> show status --operation mycluster;
+---------------+----------+--------------+
| Command       | Status   | Description  |
+---------------+----------+--------------+
| start process | finished | <no message> |
+---------------+----------+--------------+
1 row in set (0.03 sec)

mcm> show status --process mycluster;
+--------+----------+--------+---------+-----------+-----------+
| NodeId | Process  | Host   | Status  | Nodegroup | Package   |
+--------+----------+--------+---------+-----------+-----------+
| 49     | ndb_mgmd | olga64 | running |           | mypackage |
| 1      | ndbd     | olga64 | failed  | 0         | mypackage |
| 2      | ndbd     | olga64 | running | 0         | mypackage |
| 50     | mysqld   | olga64 | running |           | mypackage |
| 51     | mysqld   | olga64 | running |           | mypackage |
| 52     | ndbapi   | *      | added   |           |           |
+--------+----------+--------+---------+-----------+-----------+
6 rows in set (0.11 sec)
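The mismatch in the transcript above (the CLI reports success while node 1 stays "failed") can be detected by re-reading the status table after the start. Below is a minimal Python sketch, assuming the output of "show status --process" has been captured as plain text; parse_status_table, node_status and start_really_succeeded are hypothetical helpers, not part of MySQL Cluster Manager.

```python
# Output of "show status --process mycluster" after "start process 1 mycluster",
# copied from the transcript above.
SAMPLE = """\
+--------+----------+--------+---------+-----------+-----------+
| NodeId | Process  | Host   | Status  | Nodegroup | Package   |
+--------+----------+--------+---------+-----------+-----------+
| 49     | ndb_mgmd | olga64 | running |           | mypackage |
| 1      | ndbd     | olga64 | failed  | 0         | mypackage |
| 2      | ndbd     | olga64 | running | 0         | mypackage |
| 50     | mysqld   | olga64 | running |           | mypackage |
| 51     | mysqld   | olga64 | running |           | mypackage |
| 52     | ndbapi   | *      | added   |           |           |
+--------+----------+--------+---------+-----------+-----------+
6 rows in set (0.11 sec)
"""


def parse_status_table(text):
    """Turn an mcm ASCII table into a list of {column: value} dicts (hypothetical helper)."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders, row counts and prompts
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first "|" line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def node_status(text, node_id):
    """Return the Status column for node_id, or None if the node is not listed."""
    for row in parse_status_table(text):
        if row["NodeId"] == str(node_id):
            return row["Status"]
    return None


def start_really_succeeded(text, node_id):
    """Trust the status table, not the "Process started successfully" message."""
    return node_status(text, node_id) == "running"


print(node_status(SAMPLE, 1), start_really_succeeded(SAMPLE, 1))
```

Run against the captured table, this reports node 1 as still "failed", so a wrapper script would flag the start as not having taken effect despite the success message.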

Suggested fix:
Do not report "Process started successfully". Instead, display a message
saying that the node is in the "failed" state and that you have to run "stop process <nodeid> <clustername>" first to clear the "failed" status, and then try "start process <nodeid> <clustername>" again.

Even more convenient would be to clear the failed status automatically when restarting.
START PROCESS should check if status is "failed". If yes, issue STOP first, then go to the normal START procedure.
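The automatic variant proposed above (check for "failed", stop first, then start, and only report success if the node really comes up) can be sketched as follows. This is a minimal Python sketch of the suggested logic, not the actual MCM implementation; get_status and run are hypothetical callables standing in for querying a node's status and issuing an mcm command.

```python
from typing import Callable


def start_process(node_id: int, cluster: str,
                  get_status: Callable[[int, str], str],
                  run: Callable[[str], None]) -> str:
    """Sketch of the suggested START PROCESS behaviour (hypothetical helpers)."""
    if get_status(node_id, cluster) == "failed":
        # Clear the "failed" state first, as the report suggests.
        run(f"stop process {node_id} {cluster}")
    run(f"start process {node_id} {cluster}")
    # Only claim success if the node actually came up.
    status = get_status(node_id, cluster)
    if status != "running":
        raise RuntimeError(
            f"node {node_id} is '{status}' after start; run "
            f"'stop process {node_id} {cluster}' and then start it again")
    return "Process started successfully"
```

Injecting get_status and run keeps the decision logic separate from how the status is read, so the same sketch can be exercised against a fake cluster model that mimics the reported behaviour (a "failed" node ignores a plain start).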
[29 May 2012 11:44] Kari Juul Wedde
This bug is fixed in MCM 1.1.6.
[29 May 2012 18:33] Magnus Blåudd
Thank you for your bug report. This issue has already been fixed in the latest released version of that product, which you can download at

  http://www.mysql.com/downloads/