Bug #23354 ndb_mgm shows "no start" on node restart even if not a no start start
Submitted: 17 Oct 2006 2:14
Modified: 15 Sep 2007 13:06
Reporter: Stewart Smith
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: 5.0, 5.1
OS: Any
Assigned to: li zhou
CPU Architecture: Any

[17 Oct 2006 2:14] Stewart Smith
Description:
<jeb> ndb_mgm> 2 restart
<jeb> Connected to Management Server at: ndb08:14000
<jeb> Node 2: Node shutdown initiated
<jeb> Node 2: Node shutdown completed, restarting, no start.
<jeb> Node 2 is being restarted
<jeb> stewart: why does it say "no start"?
<stewart> hrrm.. that "no start" doesn't look right.
<jeb> that has always bothered me.
<stewart> hrrm... has it always been there?
<jeb> yes
<stewart> i am totally blind, never noticed
<jeb> even if I do -i
<stewart> i think this deserves a bug report.

How to repeat:
see above

Suggested fix:
don't say "no start" when we're not going for a "no start" start.
[15 Nov 2006 1:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/15323

ChangeSet@1.2259, 2006-11-15 09:33:01+00:00, lzhou@dev3-138.dev.cn.tlan +1 -0
  BUG#23354 Change the message shown for a restart operation. Don't show 'no start' on node restart if the node is not in a nostart state
[29 Jan 2007 6:54] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/18930

ChangeSet@1.2320, 2007-01-29 14:49:16+00:00, lzhou@dev3-63.(none) +1 -0
  Bug#23354 Add explanation when ndb_mgm does a restart
[29 Jan 2007 7:22] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/18931

ChangeSet@1.2320, 2007-01-29 15:17:40+00:00, lzhou@dev3-63.(none) +1 -0
  Bug#23354 Add explanation when ndb_mgm does a restart
[29 Jan 2007 7:24] li zhou
The new output is :

ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)]     4 node(s)
id=2    @172.16.70.63  (Version: 5.0.32, Nodegroup: 0, Master)
id=3    @172.16.70.63  (Version: 5.0.32, Nodegroup: 0)
id=4    @172.16.70.63  (Version: 5.0.32, Nodegroup: 1)
id=5    @172.16.70.63  (Version: 5.0.32, Nodegroup: 1)

ndb_mgm> 3 restart
Shutting down nodes with "-n, no start" option, to subsequently start the nodes.
Node 3: Node shutdown initiated
Node 3: Node shutdown completed, restarting, no start.
Node 3 is being restarted
ndb_mgm> Node 3: Started (version 5.0.32)

ndb_mgm> 3 restart -n
Node 3: Node shutdown initiated
Node 3: Node shutdown completed, restarting, no start.
Node 3 is being restarted
[29 Mar 2007 8:11] Stewart Smith
looks good to me.
[7 Apr 2007 7:00] Bugs System
Pushed into 5.0.40
[7 Apr 2007 7:00] Bugs System
Pushed into 5.1.18-beta
[7 Apr 2007 7:58] Tomas Ulin
5.0.40, 5.1.18, telco 6.2.1
[10 Apr 2007 4:10] Jon Stephens
Has it ever occurred to anybody just how completely bizarre and confusing it is to speak of a "no start start"?

The --nostart option has *nothing* to do with whether or not the ndbd process is actually *starting*, does it? It has to do with whether or not the node is joining the cluster, right? Then why don't we say so?

1. The ndbd -n option's long form should be --nojoin or --noattach.

2. The MGM client command should be <node_id> JOIN or <node_id> ATTACH (rather than <node_id> START): this command doesn't (and can't) actually start an ndbd process, does it?

3. Does <node_id> RESTART actually stop and restart the ndbd process, or does it just cause the node to detach from the cluster and then reattach (or wait to be reattached manually)? Assuming the former, then, after issuing a RESTART command, the output in the MGM client should look something like this:

ndb_mgm> 3 RESTART
Shutting down Node 3.
Node 3: Node shutdown initiated
Node 3: Node shutdown completed, restarting...
Node 3 is being restarted.
Node 3 has restarted.
Node 3 has rejoined the cluster (version x.y.zz)
ndb_mgm>

ndb_mgm> 3 RESTART -n
Node 3: Node shutdown initiated
Node 3: Node shutdown completed, restarting (no auto-rejoin)...
Node 3 is being restarted.
Node 3 has restarted.
ndb_mgm>

We really should stop playing Humpty-Dumpty and use words that mean what the rest of the world thinks they mean.
[11 Apr 2007 19:43] Jonathan Miller
Hi,

1) I agree with Jon 

2) This looks backwards to me

ndb_mgm> 3 restart
Shutting down nodes with "-n, no start" option, to subsequently start the
nodes.
Node 3: Node shutdown initiated
Node 3: Node shutdown completed, restarting, no start.
Node 3 is being restarted
ndb_mgm> Node 3: Started (version 5.0.32)

ndb_mgm> 3 restart -n
Node 3: Node shutdown initiated
Node 3: Node shutdown completed, restarting, no start.
Node 3 is being restarted

Seems like it should be:

ndb_mgm> 3 restart
Node 3: Node shutdown initiated
Node 3: Node shutdown completed, restarting, no start.
Node 3 is being restarted
ndb_mgm> Node 3: Started (version 5.0.32)

ndb_mgm> 3 restart -n
Shutting down nodes with "-n, no start" option, to subsequently start the
nodes.
Node 3: Node shutdown initiated
Node 3: Node shutdown completed, restarting, no start.
Node 3 is being restarted
ndb_mgm> Node 3: Started (version 5.0.32)

3) I am using the latest mysql-5.1-telco clone and I am getting the following:

ndb_mgm> 3 restart -i
Shutting down nodes with "-n, no start" option, to subsequently start the nodes.
Node 3: Node shutdown completed, restarting, no start, initial.
Node 3 is being restarted

ndb_mgm> 3 restart
Shutting down nodes with "-n, no start" option, to subsequently start the nodes.
Node 3: Node shutdown completed, restarting, no start.
Node 3 is being restarted

In addition, with the latest clone I am not getting:
ndb_mgm> Node 3: started (mysql-5.1.18 ndb-6.2.1)
[12 Apr 2007 1:31] Jon Stephens
Tomas Ulin wrote:
> Jon,
> 
> I would not mind doing a total rework of the naming of the switches... 
> names are historical and bad... but I'm worried about backwards
> compatibility...

Hi Tomas!

Thanks for your response.

Sorry about my little rant - every once in a while, I see something that makes my eyes cross and I get a bit excited, I guess. ;)

The switches really ought to be fixed, though the 5.1 beta probably isn't the place to do it; 5.2 or 6.0 would be better.

Maybe this is something we could plan out and write a WL for during my "Internship" period in Stockholm?

In the meantime, now that I have your attention, I'll document the behaviour change and close the bug.
[12 Apr 2007 2:03] Jon Stephens
Um, I'm just looking at this again...

ndb_mgm> 5 restart
Shutting down nodes with "-n, no start" option, to subsequently start the nodes.
Node 5: Node shutdown initiated
Node 5: Node shutdown completed, restarting, no start.
Node 5 is being restarted
ndb_mgm> Node 5: Started (version 5.1.18)

One additional problem I see here is that starting a data node with the -n switch means that you must issue a START command before the node joins the cluster, right?

ndb_mgm> 5 stop
Node 5: Node shutdown initiated
Node 5: Node shutdown completed.
Node 5 has shutdown.

ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     4 node(s)
id=5 (not connected, accepting connect from 192.168.0.103)
id=6    @192.168.0.102  (Version: 5.1.18, Nodegroup: 0)
id=7    @192.168.0.103  (Version: 5.1.18, Nodegroup: 0, Master)
id=8    @192.168.0.102  (Version: 5.1.18, Nodegroup: 0)

[ndb_mgmd(MGM)] 2 node(s)
id=1    @192.168.0.103  (Version: 5.1.18)
id=2    @192.168.0.102  (Version: 5.1.18)

[mysqld(API)]   6 node(s)
id=10   @192.168.0.103  (Version: 5.1.18)
id=11   @192.168.0.102  (Version: 5.1.18)
id=12 (not connected, accepting connect from any host)
id=13 (not connected, accepting connect from any host)
id=14 (not connected, accepting connect from any host)
id=15 (not connected, accepting connect from any host)

ndb_mgm> exit

jon@gigan:/usr/local/mysql/bin> su
Password:
gigan:/usr/local/mysql/bin # ../libexec/ndbd -c 192.168.0.102,192.168.0.103 -n
gigan:/usr/local/mysql/bin # exit

jon@gigan:/usr/local/mysql/bin> ./ndb_mgm

-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     4 node(s)
id=5    @192.168.0.103  (Version: 5.1.18, not started)
id=6    @192.168.0.102  (Version: 5.1.18, Nodegroup: 0)
id=7    @192.168.0.103  (Version: 5.1.18, Nodegroup: 0)
id=8    @192.168.0.102  (Version: 5.1.18, Nodegroup: 0)

[ndb_mgmd(MGM)] 2 node(s)
id=1    @192.168.0.103  (Version: 5.1.18)
id=2    @192.168.0.102  (Version: 5.1.18)

[mysqld(API)]   6 node(s)
id=10   @192.168.0.103  (Version: 5.1.18)
id=11   @192.168.0.102  (Version: 5.1.18)
id=12 (not connected, accepting connect from any host)
id=13 (not connected, accepting connect from any host)
id=14 (not connected, accepting connect from any host)
id=15 (not connected, accepting connect from any host)

ndb_mgm> 5 start
Database node 5 is being started.

ndb_mgm> Node 5: Start initiated (version 5.1.18)
Node 5: Started (version 5.1.18)

ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)]     4 node(s)
id=5    @192.168.0.103  (Version: 5.1.18, Nodegroup: 0)
id=6    @192.168.0.102  (Version: 5.1.18, Nodegroup: 0)
id=7    @192.168.0.103  (Version: 5.1.18, Nodegroup: 0, Master)
id=8    @192.168.0.102  (Version: 5.1.18, Nodegroup: 0)

[ndb_mgmd(MGM)] 2 node(s)
id=1    @192.168.0.103  (Version: 5.1.18)
id=2    @192.168.0.102  (Version: 5.1.18)

[mysqld(API)]   6 node(s)
id=10   @192.168.0.103  (Version: 5.1.18)
id=11   @192.168.0.102  (Version: 5.1.18)
id=12 (not connected, accepting connect from any host)
id=13 (not connected, accepting connect from any host)
id=14 (not connected, accepting connect from any host)
id=15 (not connected, accepting connect from any host)

But this is not the case when using RESTART, as shown above.

It's true only for RESTART -n:

ndb_mgm> 5 restart -n
Node 5: Node shutdown initiated
Node 5: Node shutdown completed, restarting, no start.
Node 5 is being restarted

ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)]     4 node(s)
id=5    @192.168.0.103  (Version: 5.1.18, not started)
id=6    @192.168.0.102  (Version: 5.1.18, Nodegroup: 0)
id=7    @192.168.0.103  (Version: 5.1.18, Nodegroup: 0)
id=8    @192.168.0.102  (Version: 5.1.18, Nodegroup: 0)

[ndb_mgmd(MGM)] 2 node(s)
id=1    @192.168.0.103  (Version: 5.1.18)
id=2    @192.168.0.102  (Version: 5.1.18)

[mysqld(API)]   6 node(s)
id=10   @192.168.0.103  (Version: 5.1.18)
id=11   @192.168.0.102  (Version: 5.1.18)
id=12 (not connected, accepting connect from any host)
id=13 (not connected, accepting connect from any host)
id=14 (not connected, accepting connect from any host)
id=15 (not connected, accepting connect from any host)

ndb_mgm> 5 start
Database node 5 is being started.

ndb_mgm> Node 5: Start initiated (version 5.1.18)
Node 5: Started (version 5.1.18)

ndb_mgm>               
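
As an aside, the sessions above show that a node waiting for an explicit START appears in SHOW output with the status "not started". Here is a minimal sketch, in Python, of how a script could pick such nodes out of captured SHOW output. The function name nodes_awaiting_start and the regular expression are my own assumptions, based only on the output format pasted in this report; this is not part of ndb_mgm.

```python
import re

# Matches connected-node lines as they appear in the SHOW output above, e.g.
#   id=5    @192.168.0.103  (Version: 5.1.18, not started)
# The pattern is an assumption derived from the pasted output only.
NODE_LINE = re.compile(r"^id=(\d+)\s+@(\S+)\s+\((.*)\)\s*$")


def nodes_awaiting_start(show_output):
    """Return IDs of nodes whose SHOW status includes 'not started'."""
    waiting = []
    for line in show_output.splitlines():
        match = NODE_LINE.match(line.strip())
        if match and "not started" in match.group(3):
            waiting.append(int(match.group(1)))
    return waiting


# Sample taken from the SHOW output after "5 restart -n" above.
SAMPLE = """\
[ndbd(NDB)]     4 node(s)
id=5    @192.168.0.103  (Version: 5.1.18, not started)
id=6    @192.168.0.102  (Version: 5.1.18, Nodegroup: 0)
id=7    @192.168.0.103  (Version: 5.1.18, Nodegroup: 0)
id=8    @192.168.0.102  (Version: 5.1.18, Nodegroup: 0)
"""

print(nodes_awaiting_start(SAMPLE))  # [5]
```

Lines for nodes that are not connected at all (no "@host" part) are deliberately skipped, since those nodes have no running process to send START to.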

Let's accept for now that

"[re]start -n | ndbd {-n|--nostart}" = "start the node but don't let it join the cluster until I tell it to join using <node_id> START"

Even so, the 'Shutting down nodes with "-n, no start" option, to subsequently start the nodes' message is still wrong, *unless the node was actually started with -n|--nostart*. 

Summary:

1. If the -n|--nostart switch is used, then the client should report that the nostart option is being used -> the user must command the node to join the cluster using START.

2. If the -n|--nostart switch is not used, then the client should not say that it is being used. This, I believe, was the original problem observed by Jeb.
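
The two rules in this summary can be sketched as a small message-selection function. This is only an illustration in Python, not the ndb_mgm source: the function name restart_messages is hypothetical, and so is the final "waiting for ... START" line, which I made up to show where a hint to the user could go. The other strings are modelled on the client output quoted in this report.

```python
def restart_messages(node_id, nostart=False, initial=False):
    """Return the lines a client could print for '<node_id> RESTART [-n] [-i]'.

    Hypothetical helper for illustration; not taken from ndb_mgm.
    """
    detail = "restarting"
    if nostart:
        # Mention "no start" only when the user actually asked for it.
        detail += ", no start"
    if initial:
        detail += ", initial"

    lines = [
        f"Node {node_id}: Node shutdown initiated",
        f"Node {node_id}: Node shutdown completed, {detail}.",
        f"Node {node_id} is being restarted",
    ]
    if nostart:
        # Assumed wording: tell the user that START is required to rejoin.
        lines.append(
            f"Node {node_id}: waiting for '{node_id} START' before joining the cluster"
        )
    return lines


for line in restart_messages(5):
    print(line)
print()
for line in restart_messages(5, nostart=True):
    print(line)
```

With a plain RESTART the words "no start" never appear; with RESTART -n they appear together with a reminder that the node will not join until START is issued.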
[8 Jun 2007 4:08] Jonathan Miller
Hi,

why can't:

ndb_mgm> 2 restart
Node 2: Node shutdown initiated
Node 2: Node shutdown completed, restarting, no start.
Node 2 is being restarted. Rejoining the cluster

Be....

ndb_mgm> 2 restart
Node 2: Node shutdown initiated
Node 2: Node shutdown completed, restarting data node.
Node 2: Is rejoining the cluster
Node 2: Restart has been completed

What the heck does:

ndb_mgm> 2 restart
Node 2: Node shutdown initiated
Node 2: Node shutdown completed, restarting, (no start) <--- This mean?????
Node 2 is being restarted. Rejoining the cluster

Best wishes,
/Jeb
[8 Jun 2007 11:50] Jonathan Miller
You do not have to change the kernel; it is all semantics. I really don't care that you are doing a "no start" and a start under the covers. If I do:

ndb_mgm> 2 restart

I really should not see the words "no start".

It is confusing, and the customer does not need to see it.

> If we want to remove "no start", we must change the work flow...
> or the kernel interface?

Why can't we just change the print statement for a restart without -n?
[13 Jun 2007 2:52] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/28622

ChangeSet@1.2476, 2007-06-13 10:42:21+00:00, lzhou@dev3-63.(none) +2 -0
  BUG#23354 Clear "no start" when nodes restart and add new prompting string when nodes restart
[13 Jun 2007 2:55] li zhou
New output:

ndb_mgm> 2 restart
Node 2: Node shutdown initiated
Node 2: Node shutdown completed, restarting.
Node 2: Is being restarted
Node 2: Is rejoining the cluster

ndb_mgm> Node 2: Started (version 5.0.42)

ndb_mgm> 2 restart -n
Shutting down nodes with "-n, no start" option, to subsequently start the nodes.
Node 2: Node shutdown initiated
Node 2: Node shutdown completed, restarting.
Node 2: Is being restarted
Node 2: No start
[30 Jul 2007 3:09] li zhou
Pushed into the 5.1.19 ndb-bj tree and the 5.0.44 ndb-bj tree.
[12 Sep 2007 11:44] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/34085

ChangeSet@1.2479, 2007-09-12 13:53:32+02:00, tomas@whalegate.ndb.mysql.com +2 -0
  BUG#23354 revert
[14 Sep 2007 16:25] Bugs System
Pushed into 5.0.50
[14 Sep 2007 16:25] Bugs System
Pushed into 5.1.23-beta
[15 Sep 2007 13:06] Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html

Documented bugfix in 5.0.50 and 5.1.23 changelogs.