Description:
I killed an ndbd process with "kill -9". Its status is now "failed", and a restart does not work.
However, "start process 1 mycluster" returns "Process started successfully",
and the operation is listed as "finished".
See the transcript below.
A proper error message, with a hint to run "stop process" first to clear the failed status, would be far more helpful to the admin.
The current behavior is misleading.
How to repeat:
Set up a cluster with "mcmd --bootstrap".
Once the cluster is up and running, kill one of the ndbd processes:
kill -9 <pid>
Run the following mcm session:
mcm> show status --process mycluster;
+--------+----------+--------+---------+-----------+-----------+
| NodeId | Process  | Host   | Status  | Nodegroup | Package   |
+--------+----------+--------+---------+-----------+-----------+
| 49     | ndb_mgmd | olga64 | running |           | mypackage |
| 1      | ndbd     | olga64 | failed  | 0         | mypackage |
| 2      | ndbd     | olga64 | running | 0         | mypackage |
| 50     | mysqld   | olga64 | running |           | mypackage |
| 51     | mysqld   | olga64 | running |           | mypackage |
| 52     | ndbapi   | *      | added   |           |           |
+--------+----------+--------+---------+-----------+-----------+
6 rows in set (0.03 sec)
mcm> start process 1 mycluster;
+------------------------------+
| Command result               |
+------------------------------+
| Process started successfully |
+------------------------------+
1 row in set (0.23 sec)
mcm> show status --operation mycluster;
+---------------+----------+--------------+
| Command       | Status   | Description  |
+---------------+----------+--------------+
| start process | finished | <no message> |
+---------------+----------+--------------+
1 row in set (0.03 sec)
mcm> show status --process mycluster;
+--------+----------+--------+---------+-----------+-----------+
| NodeId | Process  | Host   | Status  | Nodegroup | Package   |
+--------+----------+--------+---------+-----------+-----------+
| 49     | ndb_mgmd | olga64 | running |           | mypackage |
| 1      | ndbd     | olga64 | failed  | 0         | mypackage |
| 2      | ndbd     | olga64 | running | 0         | mypackage |
| 50     | mysqld   | olga64 | running |           | mypackage |
| 51     | mysqld   | olga64 | running |           | mypackage |
| 52     | ndbapi   | *      | added   |           |           |
+--------+----------+--------+---------+-----------+-----------+
6 rows in set (0.11 sec)
Suggested fix:
Do not report "Process started successfully". Instead, display a message stating
that the node is in the "failed" state and that "stop process <nodeid> <clustername>" must be run first to clear the "failed" status, after which "start process <nodeid> <clustername>" can be tried again.
Even more convenient would be to clear the failed status automatically when restarting:
START PROCESS should check whether the status is "failed". If it is, issue STOP first, then continue with the normal START procedure.
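
To make the suggestion concrete, here is a rough sketch of the proposed logic in Python. It is not MCM code: Process, FakeAgent, start_process, and the auto_clear flag are all hypothetical names invented for illustration, and the error text is only an example of the kind of hint I have in mind.

```python
from dataclasses import dataclass


@dataclass
class Process:
    node_id: int
    status: str  # e.g. "running", "stopped", "failed"


class FakeAgent:
    """Hypothetical stand-in for the management agent (not MCM's real API)."""

    def __init__(self, processes):
        self.processes = {p.node_id: p for p in processes}
        self.log = []  # records the low-level actions that were issued

    def stop(self, node_id):
        # In this sketch, STOP simply clears whatever state the process was in.
        self.log.append(("stop", node_id))
        self.processes[node_id].status = "stopped"

    def start(self, node_id):
        self.log.append(("start", node_id))
        self.processes[node_id].status = "running"


def start_process(agent, node_id, cluster="mycluster", auto_clear=True):
    """Proposed START PROCESS behavior: handle the "failed" status explicitly."""
    proc = agent.processes[node_id]
    if proc.status == "failed":
        if not auto_clear:
            # Option 1: refuse, and tell the admin how to recover.
            raise RuntimeError(
                f'Node {node_id} is in "failed" state; run '
                f'"stop process {node_id} {cluster}" first, then retry '
                f'"start process {node_id} {cluster}".'
            )
        # Option 2: clear the failed status automatically by issuing STOP first.
        agent.stop(node_id)
    agent.start(node_id)
    return "Process started successfully"
```

With auto_clear=True this corresponds to the "more convenient" variant (STOP, then the normal START); with auto_clear=False it corresponds to the minimal fix, where the command fails with a hint instead of claiming success.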