Bug #24606 ndb_mgm does not return to command prompt immediately after a node shutdown
Submitted: 27 Nov 2006 4:42 Modified: 29 Nov 2006 10:35
Reporter: Anatoly Pidruchny (Candidate Quality Contributor) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S4 (Feature request)
Version:mysql-5.1 OS:Linux (Linux)
Assigned to: CPU Architecture:Any
Tags: 5.1.12
Triage: Triaged: D5 (Feature request)

[27 Nov 2006 4:42] Anatoly Pidruchny
Description:
When you run ndb_mgm and execute the command "n stop", the operation is processed fine, but ndb_mgm does not return to the command prompt. Here is how it looks:

ndb_mgm> 2 stop
Node 2: Node shutdown initiated
Node 2: Node shutdown completed.

After that the "ndb_mgm>" command prompt does not appear and you have to exit using Ctrl+C and run ngb_mgm again to be able to execute some other commands.

May be there are other commands, besides "n stop" that cause the same problem.

How to repeat:
In a running Cluster with mgm node 1, data nodes 2 and 3 and API nodes 4 and 5, run ndb_mgm and execute the command "2 stop".
[27 Nov 2006 4:43] Anatoly Pidruchny
Changed category to Cluster.
[27 Nov 2006 10:53] Hartmut Holzgraefe
Can't reproduce that one.

Can you attach the cluster config file and logs to this bug report,
e.g. by using ndb_error_reporter to extract them?
[28 Nov 2006 14:32] Anatoly Pidruchny
Hi, Hartmut,

well, I just changed the synopsis of this bug a little bit. My mistake was that I did not wait long enough for the command prompt to appear. Today I tried the "3 stop" command and waited longer and ndb_mgm finally returned to the command prompt:

=============== ndb_mgm output ===============
ndb_mgm> 3 stop
Node 3: Node shutdown initiated
Node 3: Node shutdown completed.
Node 3 has shutdown.

ndb_mgm> 
=============== ndb_mgm output ===============

Here is what was written into the cluster log:

=============== cluster log ===============
2006-11-28 09:02:39 [MgmSrvr] INFO     -- Node 3: Node shutdown initiated
2006-11-28 09:02:49 [MgmSrvr] ALERT    -- Node 2: Node 3 Disconnected
2006-11-28 09:02:49 [MgmSrvr] INFO     -- Node 2: Communication to Node 3 closed
2006-11-28 09:02:49 [MgmSrvr] ALERT    -- Node 2: Network partitioning - arbitration required
2006-11-28 09:02:49 [MgmSrvr] INFO     -- Node 2: President restarts arbitration thread [state=7]
2006-11-28 09:02:49 [MgmSrvr] INFO     -- Node 1: Node 3 Connected
2006-11-28 09:02:49 [MgmSrvr] ALERT    -- Node 2: Arbitration won - positive reply from node 1
2006-11-28 09:02:49 [MgmSrvr] INFO     -- Node 2: DICT: lock bs: 0 ops: 0 poll: 0 cnt: 0 queue:
2006-11-28 09:02:49 [MgmSrvr] INFO     -- Node 3: Node shutdown completed.
2006-11-28 09:02:50 [MgmSrvr] INFO     -- Node 2: Started arbitrator node 1 [ticket=1f9700042ee002b7]
2006-11-28 09:03:21 [MgmSrvr] INFO     -- Node 2: Communication to Node 3 opened
=============== cluster log ===============

As you can see, note shutdown was initiated at 09:02:39.

First thing, I do not understand why at 09:02:49 there is message "Node 1: Node 3 Connected". Is not it supposed to be "Node 1: Node 3 Disconnected"? Why Node 3 connected again to Node 1? Is not it supposed to be already connected before the shutdown?

Second thing, at 09:02:49 there was the message "Node 3: Node shutdown completed.", at that time ndb_mgm also printed the message "Node 3: Node shutdown completed." Well, shutdown is completed, but ndb_mgm does not return to the command prompt at that time. Only at 09:03:21, that is after 33 seconds, a strange message "Node 2: Communication to Node 3 opened" appeared in the cluster log and ndb_mgm returned to the command prompt. What does it mean "Communication to Node 3 opened"? Is not node 3 shut down at that time? How could communication be opened to node 3 when it is down? What is ndb_mgm waiting for these 33 seconds and not returning to the command prompt?

/Anatoly
[28 Nov 2006 14:35] Anatoly Pidruchny
Cluster configuration file

Attachment: config.ini (application/octet-stream, text), 861 bytes.

[28 Nov 2006 14:39] Anatoly Pidruchny
Here are the lines from ndb_3_out.log:

2006-11-28 09:02:48 [ndbd] INFO     -- Shutdown initiated
2006-11-28 09:02:49 [ndbd] INFO     -- Shutdown completed - exiting
2006-11-28 09:02:49 [ndbd] INFO     -- Angel shutting down
2006-11-28 09:02:49 [ndbd] INFO     -- Node 3: Node shutdown completed.
[28 Nov 2006 15:00] Jonas Oreland
>As you can see, note shutdown was initiated at 09:02:39.

> First thing, I do not understand why at 09:02:49 there is message 
> "Node 1:Node 3 Connected". 
> Is not it supposed to be "Node 1: Node 3 Disconnected"? 
> Why Node 3 connected again to Node 1? Is not it supposed to be already
> connected before the shutdown?

There is bug report for this very confusing message,
(hmm...cant find it now)

> Second thing, at 09:02:49 there was the message "Node 3: Node shutdown
> completed.", at that time ndb_mgm also printed the message "Node 3: Node
> shutdown completed." Well, shutdown is completed, but ndb_mgm does not 
> return to the command prompt at that time. Only at 09:03:21, 
> that is after 33 seconds, a strange message
> "Node 2: Communication to Node 3 opened" appeared in the cluster
> log and ndb_mgm returned to the command prompt. 
> What does it mean "Communication to Node 3 opened"? 
> Is not node 3 shut down at that time? How could communication be opened
> to node 3 when it is down? What is ndb_mgm waiting for these 33
> seconds and not returning to the command prompt?

The prompt is returned when the rest of the db-nodes has concluded failure 
handling of the shutdown node.

When this happens, they also "open" the port for the node to reconnect,
  this = "Communication to node X opened"...

A better meesage would be "Node X is now allowed to connect"

---

How long it takes for nodes to conclude failure-handling depends on many things

/Jonas
[28 Nov 2006 15:35] Anatoly Pidruchny
Jonas,

thanks for the explanation. I am glad that you agree that those two messages are very confusing. I hope they will be fixed.

Well, I agree that it is OK that ndb_mgm waits and returns command prompt only when the rest of the db-nodes has concluded failure handling of the shutdown node. But it is confusing that ndb_mgm prints the message "Node 3 has shutdown." and still waits for something. From a user point of view the operation is completed. Could it at least print a message like "Waiting for the rest of the db-nodes to conclude failure handling of the shutdown node."?

The status of this bug is "Need Feedback". Do you need any more information from me?

/Anatoly
[6 Dec 2006 12:14] Jonas Oreland
FYI: confusing message *was* http://bugs.mysql.com/bug.php?id=22773

more info, no dont think so...