Bug #34201 Unable stop a node when a node in a different group is in "not started" state
Submitted: 31 Jan 2008 18:59 Modified: 2 Apr 2008 21:31
Reporter: David Shrewsbury
Status: Closed
Category: Server: Cluster    Severity: S2 (Serious)
Version: 5.0, 5.1    OS: Linux
Assigned to: Tomas Ulin    Target Version: 5.0+
Triage: D2 (Serious) / R2 (Low) / E2 (Low)

[31 Jan 2008 18:59] David Shrewsbury
Description:
In a Cluster with 2 node groups (0 and 1), if a data node in group 0 is placed in the "not
started" state (RESTART -n), then you are not allowed to STOP another data node in group
1. You can, however, RESTART a group 1 node.

Tested on versions 5.1.22 and 5.0.54a.

How to repeat:
shell# ndb_mgm -e show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     4 node(s)
id=2    @127.0.0.1  (Version: 5.1.22, Nodegroup: 0, Master)
id=3    @127.0.0.1  (Version: 5.1.22, Nodegroup: 0)
id=4    @127.0.0.1  (Version: 5.1.22, Nodegroup: 1)
id=5    @127.0.0.1  (Version: 5.1.22, Nodegroup: 1)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @127.0.0.1  (Version: 5.1.22)

[mysqld(API)]   3 node(s)
id=20 (not connected, accepting connect from 10.0.1.20)
id=21 (not connected, accepting connect from 10.0.1.30)
id=22 (not connected, accepting connect from any host)

shell# ndb_mgm -e "2 restart -n"
Connected to Management Server at: localhost:1186
Node 2 is being restarted

shell# ndb_mgm -e show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     4 node(s)
id=2    @127.0.0.1  (Version: 5.1.22, not started)
id=3    @127.0.0.1  (Version: 5.1.22, Nodegroup: 0)
id=4    @127.0.0.1  (Version: 5.1.22, Nodegroup: 1)
id=5    @127.0.0.1  (Version: 5.1.22, Nodegroup: 1)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @127.0.0.1  (Version: 5.1.22)

[mysqld(API)]   3 node(s)
id=20 (not connected, accepting connect from 10.0.1.20)
id=21 (not connected, accepting connect from 10.0.1.30)
id=22 (not connected, accepting connect from any host)

shell# ndb_mgm -e "4 stop"
Connected to Management Server at: localhost:1186
Shutdown failed.
*  2002: Stop failed
*        Operation not allowed while nodes are starting or stopping.: Permanent error:
Application error

shell# ndb_mgm -e "4 restart"
Connected to Management Server at: localhost:1186
Shutting down nodes with "-n, no start" option, to subsequently start the nodes.
Node 4 is being restarted
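
The precondition for this bug (some data node in the "not started" state) can be checked from the SHOW output before issuing a STOP. A minimal sketch, assuming the output format shown above; not_started_nodes is an illustrative helper name, not part of ndb_mgm:

```shell
#!/bin/sh
# Sketch only: not_started_nodes is a hypothetical helper.
# It reads "ndb_mgm -e show" output on stdin and prints the id of every
# node whose status line ends in "not started)", one id per line.
not_started_nodes() {
    awk '/^id=[0-9]+/ && /not started\)/ {
        sub(/^id=/, "", $1)   # "id=2" -> "2"
        print $1
    }'
}
```

Possible use:

    ndb_mgm -e show | not_started_nodes
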
[13 Mar 2008 15:25] Tomas Ulin
1. This is not a regression; it has been like this since 2005.
2. There is a workaround: use "4 stop -a".
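
The workaround above could be scripted along these lines. This is only a sketch: stop_node and the NDB_MGM variable are illustrative names, and the refusal is detected by matching the "Shutdown failed." text from the report, on the assumption that the exit code of ndb_mgm cannot be relied on here. Since the -a form is an abort-style stop, a real script would probably want an explicit opt-in before falling back to it.

```shell
#!/bin/sh
# Sketch only. NDB_MGM and stop_node are illustrative names, not part of
# MySQL Cluster. NDB_MGM lets the client binary be overridden.
NDB_MGM=${NDB_MGM:-ndb_mgm}

# stop_node NODE_ID
# Try a plain STOP first; if the management client reports
# "Shutdown failed." (as in the report), retry with "STOP -a".
stop_node() {
    node_id=$1
    out=$("$NDB_MGM" -e "$node_id stop" 2>&1)
    case $out in
        *"Shutdown failed."*)
            echo "plain stop of node $node_id refused, retrying with -a" >&2
            "$NDB_MGM" -e "$node_id stop -a"
            ;;
        *)
            printf '%s\n' "$out"
            ;;
    esac
}
```

Possible use:

    stop_node 4
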
[13 Mar 2008 16:46] Tomas Ulin
patch

Attachment: tmp.patch (text/x-patch), 3.14 KiB.

[13 Mar 2008 16:49] Tomas Ulin
Patch:

1. Make sure you can stop a node when a node is in SL_CMVMI (this addresses the bug as such).
2. This, however, increases the probability of hitting Bug #13461 (Slave Cluster crashed
on restart of two data nodes in separate groups).
3. So code was added in restart node to make sure a node is not stopping while restarting,
and to wait for any stopping nodes before starting them again.
4. Bug #13461 was also present in restart node, so that bug fix was added there as
well.
[13 Mar 2008 20:39] Jonas Oreland
comments on patch:
1) Why don't you put the loop check into a (static) subroutine?
   (It's non-trivial and repeated in 3 places.)
2) Should you really retry *forever* (in start)?

comment on triage: ok regression since 2005 decreases impact to I4
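
The two review points above (factor the repeated check into one subroutine, and bound the retries) can be illustrated generically. The sketch below is shell, not the code of the actual patch, and retry_until is a hypothetical helper; it only shows the shape of a bounded retry loop.

```shell
#!/bin/sh
# Sketch only: retry_until is a hypothetical helper, not code from the patch.
#
# retry_until MAX_TRIES DELAY_SECONDS CMD [ARGS...]
# Run CMD until it succeeds, at most MAX_TRIES times, sleeping
# DELAY_SECONDS between attempts. Returns 0 on the first success and
# 1 once the attempts are exhausted.
retry_until() {
    max_tries=$1
    delay=$2
    shift 2
    try=1
    while [ "$try" -le "$max_tries" ]; do
        if "$@"; then
            return 0
        fi
        try=$((try + 1))
        # Only sleep if another attempt will follow.
        if [ "$try" -le "$max_tries" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

Possible use, where no_node_stopping stands for whatever check is being repeated (a placeholder name):

    retry_until 10 1 no_node_stopping || echo "gave up waiting" >&2
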
[14 Mar 2008 14:02] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/43998

ChangeSet@1.2539, 2008-03-14 14:02:27+01:00, tomas@poseidon.ndb.mysql.com +2 -0
  Bug #34201 Unable stop a node when a node in a different group is in "not started" state
[2 Apr 2008 21:31] Jon Stephens
Documented in the 5.1.23-ndb-6.3.11 changelog as follows:

        If a data node in one node group was placed in the not started state
        (using node_id RESTART -n), it was not possible to stop a data node in
        a different node group.

Left in Patch Pending state pending further merges.
[5 Apr 2008 0:34] Jon Stephens
Also noted in the 5.1.23-ndb-6.2.15 changelog.
[24 Jun 2008 15:30] Jon Stephens
For MySQL Cluster NDB 6.2, the fix actually first appears in 6.2.16, not 6.2.15.
[13 Dec 2008 0:29] Bugs System
Pushed into 6.0.6-alpha  (revid:sp1r-tomas@poseidon.ndb.mysql.com-20080314130227-25803)
(version source revid:sp1r-tomas@poseidon.ndb.mysql.com-20080516085603-30848) (pib:5)