Bug #34201 Unable to stop a node when a node in a different group is in "not started" state
Submitted: 31 Jan 2008 17:59
Modified: 2 Apr 2008 19:31
Reporter: David Shrewsbury
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: 5.0, 5.1
OS: Linux
Assigned to: Tomas Ulin
CPU Architecture: Any

[31 Jan 2008 17:59] David Shrewsbury
Description:
In a Cluster with 2 node groups (0 and 1), if a data node in group 0 is placed in the "not started" state (RESTART -n), then you are not allowed to STOP another data node in group 1. You can, however, RESTART a group 1 node.

Tested on versions 5.1.22 and 5.0.54a.

How to repeat:
shell# ndb_mgm -e show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     4 node(s)
id=2    @127.0.0.1  (Version: 5.1.22, Nodegroup: 0, Master)
id=3    @127.0.0.1  (Version: 5.1.22, Nodegroup: 0)
id=4    @127.0.0.1  (Version: 5.1.22, Nodegroup: 1)
id=5    @127.0.0.1  (Version: 5.1.22, Nodegroup: 1)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @127.0.0.1  (Version: 5.1.22)

[mysqld(API)]   3 node(s)
id=20 (not connected, accepting connect from 10.0.1.20)
id=21 (not connected, accepting connect from 10.0.1.30)
id=22 (not connected, accepting connect from any host)

shell# ndb_mgm -e "2 restart -n"
Connected to Management Server at: localhost:1186
Node 2 is being restarted

shell# ndb_mgm -e show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     4 node(s)
id=2    @127.0.0.1  (Version: 5.1.22, not started)
id=3    @127.0.0.1  (Version: 5.1.22, Nodegroup: 0)
id=4    @127.0.0.1  (Version: 5.1.22, Nodegroup: 1)
id=5    @127.0.0.1  (Version: 5.1.22, Nodegroup: 1)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @127.0.0.1  (Version: 5.1.22)

[mysqld(API)]   3 node(s)
id=20 (not connected, accepting connect from 10.0.1.20)
id=21 (not connected, accepting connect from 10.0.1.30)
id=22 (not connected, accepting connect from any host)

shell# ndb_mgm -e "4 stop"
Connected to Management Server at: localhost:1186
Shutdown failed.
*  2002: Stop failed
*        Operation not allowed while nodes are starting or stopping.: Permanent error: Application error

shell# ndb_mgm -e "4 restart"
Connected to Management Server at: localhost:1186
Shutting down nodes with "-n, no start" option, to subsequently start the nodes.
Node 4 is being restarted
[13 Mar 2008 14:25] Tomas Ulin
1. This is not a regression; it has been like this since 2005.
2. There is a workaround: use "4 stop -a".
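The workaround can be wrapped in a small helper that adds `-a` only when another data node is in the "not started" state. This is a hypothetical sketch: `stop_argv` is an illustrative name, and it only builds the argument list (for example, to pass to `subprocess.run`) rather than running anything. Because `-a` changes how the node is stopped, check the ndb_mgm documentation for your version before using it routinely.

```python
def stop_argv(target: int, not_started_ids: list[int]) -> list[str]:
    """Build an ndb_mgm command line that stops data node `target`.

    If some *other* data node is in "not started" state (the situation in
    this bug report), append -a, the workaround suggested by Tomas Ulin.
    """
    blocked = any(node != target for node in not_started_ids)
    command = f"{target} stop -a" if blocked else f"{target} stop"
    return ["ndb_mgm", "-e", command]
```

For the reproduction above, `stop_argv(4, [2])` yields the equivalent of `ndb_mgm -e "4 stop -a"`, while `stop_argv(4, [])` falls back to a plain `4 stop`.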
[13 Mar 2008 15:46] Tomas Ulin
patch

Attachment: tmp.patch (text/x-patch), 3.14 KiB.

[13 Mar 2008 15:49] Tomas Ulin
Patch:

1. Make sure you can stop when a node is in SL_CMVMI (addresses the bug as such).
2. This, however, increases the probability of hitting Bug #13461 (Slave Cluster crashed on restart of two data nodes in separate groups).
3. So, add code in restart node to "make sure" the node is not stopping while restarting, and to wait for any stopping nodes before starting them again.
4. Bug #13461 was also present in restart node, so that bugfix was added there as well.
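Step 1 of the patch can be pictured as a change to the precondition checked before a graceful STOP. The model below is a simplified, hypothetical illustration, not the NDB kernel code: only the name SL_CMVMI comes from the patch note, the other levels are stand-ins, and it ignores the node-group checks a real stop must also perform. Before the fix, a node parked in SL_CMVMI (presumably what ndb_mgm reports as "not started") counts as starting and blocks the stop; after the fix it does not, while nodes that are genuinely starting or stopping still do.

```python
from enum import Enum


class Level(Enum):
    # Simplified, hypothetical start levels. SL_CMVMI is the name used in
    # the patch note; the other members are illustrative stand-ins.
    SL_CMVMI = "not started"
    STARTING = "starting"
    STARTED = "started"
    STOPPING = "stopping"


def stop_allowed(target: int, levels: dict[int, Level], patched: bool) -> bool:
    """Decide whether a graceful STOP of `target` may proceed (model only).

    Any other node that is not fully started blocks the stop, except that
    the patched behaviour no longer treats SL_CMVMI as "starting".
    Node-group constraints are deliberately not modelled.
    """
    for node, level in levels.items():
        if node == target or level is Level.STARTED:
            continue
        if patched and level is Level.SL_CMVMI:
            continue
        return False
    return True
```

With node 2 at SL_CMVMI and nodes 3 to 5 started, `stop_allowed(4, levels, patched=False)` is False, matching the "Stop failed" output in the report, and `patched=True` gives True.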
[13 Mar 2008 19:39] Jonas Oreland
Comments on patch:
1) Why don't you put the loop-check into a (static) subroutine?
   (It's non-trivial and repeated in 3 places.)
2) Should you really retry *forever* (in start)?

Comment on triage: OK, since this behavior dates back to 2005 (not a regression), the impact decreases to I4.
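Both review points can be addressed by one helper: the loop-check that the patch repeats in three places lives in a single function, and it gives up after a bounded number of polls instead of retrying forever. This is a hypothetical Python sketch of the idea, not the patch's C++ code; `wait_until_none_stopping`, the `is_stopping` predicate, and the retry and delay defaults are all illustrative assumptions. The sleep function is injectable so the loop can be exercised without real delays.

```python
import time
from typing import Callable, Iterable


def wait_until_none_stopping(
    nodes: Iterable[int],
    is_stopping: Callable[[int], bool],
    max_attempts: int = 50,
    delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until no node in `nodes` is stopping, with a bounded retry count.

    Returns True once a poll sees no stopping node, or False after
    `max_attempts` polls that all saw at least one stopping node.
    """
    pending = list(nodes)
    for attempt in range(max_attempts):
        if not any(is_stopping(n) for n in pending):
            return True
        # Sleep only if another poll will follow.
        if attempt + 1 < max_attempts:
            sleep(delay_s)
    return False
```

Returning False on timeout leaves the policy to the caller, which can fail the restart with an error instead of hanging indefinitely.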
[14 Mar 2008 13:02] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/43998

ChangeSet@1.2539, 2008-03-14 14:02:27+01:00, tomas@poseidon.ndb.mysql.com +2 -0
  Bug #34201 Unable stop a node when a node in a different group is in "not started" state
[2 Apr 2008 19:31] Jon Stephens
Documented in the 5.1.23-ndb-6.3.11 changelog as follows:

        If a data node in one node group was placed in the not started state
        (using node_id RESTART -n), it was not possible to stop a data node in
        a different node group.

Left in Patch Pending state pending further merges.
[4 Apr 2008 22:34] Jon Stephens
Also noted in the 5.1.23-ndb-6.2.15 changelog.
[24 Jun 2008 13:30] Jon Stephens
For MySQL Cluster NDB 6.2, the fix actually first appears in 6.2.16, not 6.2.15.
[12 Dec 2008 23:29] Bugs System
Pushed into 6.0.6-alpha  (revid:sp1r-tomas@poseidon.ndb.mysql.com-20080314130227-25803) (version source revid:sp1r-tomas@poseidon.ndb.mysql.com-20080516085603-30848) (pib:5)