Bug #24606 | ndb_mgm does not return to command prompt immediately after a node shutdown | ||
---|---|---|---|
Submitted: | 27 Nov 2006 4:42 | Modified: | 29 Nov 2006 10:35 |
Reporter: | Anatoly Pidruchny (Candidate Quality Contributor) | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S4 (Feature request) |
Version: | mysql-5.1 | OS: | Linux (Linux) |
Assigned to: | CPU Architecture: | Any | |
Tags: | 5.1.12 |
[27 Nov 2006 4:42]
Anatoly Pidruchny
[27 Nov 2006 4:43]
Anatoly Pidruchny
Changed category to Cluster.
[27 Nov 2006 10:53]
Hartmut Holzgraefe
Can't reproduce that one. Can you attach the cluster config file and logs to this bug report, e.g. by using ndb_error_reporter to extract them?
[28 Nov 2006 14:32]
Anatoly Pidruchny
Hi, Hartmut, well, I just changed the synopsis of this bug a little bit. My mistake was that I did not wait long enough for the command prompt to appear. Today I tried the "3 stop" command and waited longer and ndb_mgm finally returned to the command prompt: =============== ndb_mgm output =============== ndb_mgm> 3 stop Node 3: Node shutdown initiated Node 3: Node shutdown completed. Node 3 has shutdown. ndb_mgm> =============== ndb_mgm output =============== Here is what was written into the cluster log: =============== cluster log =============== 2006-11-28 09:02:39 [MgmSrvr] INFO -- Node 3: Node shutdown initiated 2006-11-28 09:02:49 [MgmSrvr] ALERT -- Node 2: Node 3 Disconnected 2006-11-28 09:02:49 [MgmSrvr] INFO -- Node 2: Communication to Node 3 closed 2006-11-28 09:02:49 [MgmSrvr] ALERT -- Node 2: Network partitioning - arbitration required 2006-11-28 09:02:49 [MgmSrvr] INFO -- Node 2: President restarts arbitration thread [state=7] 2006-11-28 09:02:49 [MgmSrvr] INFO -- Node 1: Node 3 Connected 2006-11-28 09:02:49 [MgmSrvr] ALERT -- Node 2: Arbitration won - positive reply from node 1 2006-11-28 09:02:49 [MgmSrvr] INFO -- Node 2: DICT: lock bs: 0 ops: 0 poll: 0 cnt: 0 queue: 2006-11-28 09:02:49 [MgmSrvr] INFO -- Node 3: Node shutdown completed. 2006-11-28 09:02:50 [MgmSrvr] INFO -- Node 2: Started arbitrator node 1 [ticket=1f9700042ee002b7] 2006-11-28 09:03:21 [MgmSrvr] INFO -- Node 2: Communication to Node 3 opened =============== cluster log =============== As you can see, note shutdown was initiated at 09:02:39. First thing, I do not understand why at 09:02:49 there is message "Node 1: Node 3 Connected". Is not it supposed to be "Node 1: Node 3 Disconnected"? Why Node 3 connected again to Node 1? Is not it supposed to be already connected before the shutdown? Second thing, at 09:02:49 there was the message "Node 3: Node shutdown completed.", at that time ndb_mgm also printed the message "Node 3: Node shutdown completed." Well, shutdown is completed, but ndb_mgm does not return to the command prompt at that time. Only at 09:03:21, that is after 33 seconds, a strange message "Node 2: Communication to Node 3 opened" appeared in the cluster log and ndb_mgm returned to the command prompt. What does it mean "Communication to Node 3 opened"? Is not node 3 shut down at that time? How could communication be opened to node 3 when it is down? What is ndb_mgm waiting for these 33 seconds and not returning to the command prompt? /Anatoly
[28 Nov 2006 14:35]
Anatoly Pidruchny
Cluster configuration file
Attachment: config.ini (application/octet-stream, text), 861 bytes.
[28 Nov 2006 14:39]
Anatoly Pidruchny
Here are the lines from ndb_3_out.log: 2006-11-28 09:02:48 [ndbd] INFO -- Shutdown initiated 2006-11-28 09:02:49 [ndbd] INFO -- Shutdown completed - exiting 2006-11-28 09:02:49 [ndbd] INFO -- Angel shutting down 2006-11-28 09:02:49 [ndbd] INFO -- Node 3: Node shutdown completed.
[28 Nov 2006 15:00]
Jonas Oreland
>As you can see, note shutdown was initiated at 09:02:39. > First thing, I do not understand why at 09:02:49 there is message > "Node 1:Node 3 Connected". > Is not it supposed to be "Node 1: Node 3 Disconnected"? > Why Node 3 connected again to Node 1? Is not it supposed to be already > connected before the shutdown? There is bug report for this very confusing message, (hmm...cant find it now) > Second thing, at 09:02:49 there was the message "Node 3: Node shutdown > completed.", at that time ndb_mgm also printed the message "Node 3: Node > shutdown completed." Well, shutdown is completed, but ndb_mgm does not > return to the command prompt at that time. Only at 09:03:21, > that is after 33 seconds, a strange message > "Node 2: Communication to Node 3 opened" appeared in the cluster > log and ndb_mgm returned to the command prompt. > What does it mean "Communication to Node 3 opened"? > Is not node 3 shut down at that time? How could communication be opened > to node 3 when it is down? What is ndb_mgm waiting for these 33 > seconds and not returning to the command prompt? The prompt is returned when the rest of the db-nodes has concluded failure handling of the shutdown node. When this happens, they also "open" the port for the node to reconnect, this = "Communication to node X opened"... A better meesage would be "Node X is now allowed to connect" --- How long it takes for nodes to conclude failure-handling depends on many things /Jonas
[28 Nov 2006 15:35]
Anatoly Pidruchny
Jonas, thanks for the explanation. I am glad that you agree that those two messages are very confusing. I hope they will be fixed. Well, I agree that it is OK that ndb_mgm waits and returns command prompt only when the rest of the db-nodes has concluded failure handling of the shutdown node. But it is confusing that ndb_mgm prints the message "Node 3 has shutdown." and still waits for something. From a user point of view the operation is completed. Could it at least print a message like "Waiting for the rest of the db-nodes to conclude failure handling of the shutdown node."? The status of this bug is "Need Feedback". Do you need any more information from me? /Anatoly
[6 Dec 2006 12:14]
Jonas Oreland
FYI: confusing message *was* http://bugs.mysql.com/bug.php?id=22773 more info, no dont think so...