Bug #19645 Data Node hangs in phase 100
Submitted: 9 May 2006 17:26
Modified: 25 Jan 2007 4:07
Reporter: Jonathan Miller
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: 5.1.11
OS: Linux (Linux 32 Bit OS)
Assigned to: Jonas Oreland
CPU Architecture: Any

[9 May 2006 17:26] Jonathan Miller
Description:
I had run DBT2 on a 4-data-node cluster and decided to lower the value of TransactionDeadlockDetectionTimeout from 2500 to 900. I updated the config.ini file, logged into the management client, and performed the following steps:

ndb_mgm> 1 stop
Connected to Management Server at: HOST:14000
Node 1 has shutdown.

$ ndb_mgmd -f ./config.ini

ndb_mgm> 2 restart
Connected to Management Server at: XXXX:14000
Node 2: Node shutdown initiated
Node 2: Node shutdown completed, restarting, no start.
Node 2 is being restarted

ndb_mgm> Node 2: Start initiated (version 5.1.10)
Node 2: Started (version 5.1.10)

ndb_mgm> 3 restart
Node 3: Node shutdown initiated
Node 3: Node shutdown completed, restarting, no start.
Node 3 is being restarted

ndb_mgm> Node 3: Started (version 5.1.10)

ndb_mgm> 4 restart
Node 4: Node shutdown initiated
Node 4: Node shutdown completed, restarting, no start.
Node 4 is being restarted

ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)]     4 node(s)
id=2    @XX.CCCC.1.94  (Version: 5.1.10, Nodegroup: 0)
id=3    @XX.CCCC.1.96  (Version: 5.1.10, Nodegroup: 0)
id=4    @XX.CCCC.1.92  (Version: 5.1.10, starting, Nodegroup: 1)
id=5    @XX.CCCC.1.97  (Version: 5.1.10, Nodegroup: 1, Master)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @XX.CCCC.1.92  (Version: 5.1.10)

[mysqld(API)]   21 node(s)
id=6    @XX.CCCC.1.92  (Version: 5.1.10)
id=7    @XX.CCCC.1.93  (Version: 5.1.10)
id=8 (not connected, accepting connect from any host)
id=9 (not connected, accepting connect from any host)
id=10 (not connected, accepting connect from any host)
id=11 (not connected, accepting connect from any host)
id=12 (not connected, accepting connect from any host)
id=13 (not connected, accepting connect from any host)
id=14 (not connected, accepting connect from any host)
id=15 (not connected, accepting connect from any host)
id=16 (not connected, accepting connect from any host)
id=17 (not connected, accepting connect from any host)
id=18 (not connected, accepting connect from any host)
id=19 (not connected, accepting connect from any host)
id=20 (not connected, accepting connect from any host)
id=21 (not connected, accepting connect from any host)
id=22 (not connected, accepting connect from any host)
id=23 (not connected, accepting connect from any host)
id=24 (not connected, accepting connect from any host)
id=25 (not connected, accepting connect from any host)
id=26 (not connected, accepting connect from any host)

ndb_mgm> 4 status
Node 4: starting (Phase 100) (Version 5.1.10)

ndb_mgm> 4 status
Node 4: starting (Phase 100) (Version 5.1.10)

ndb_mgm> 4 status
Node 4: starting (Phase 100) (Version 5.1.10)

ndb_mgm> 4 status
Node 4: starting (Phase 100) (Version 5.1.10)

ndb_mgm> 4 stop
Shutdown failed.
*  2002: Stop failed
*        Operation not allowed while nodes are starting or stopping.

ndb_mgm> 4 status
Node 4: starting (Phase 100) (Version 5.1.10)

ndb_mgm> 4 restart
Node 4: Node shutdown completed, restarting, no start.
Node 4 is being restarted

ndb_mgm> Node 4: Start initiated (version 5.1.10)

ndb_mgm> 4 status
Node 4: starting (Phase 4) (Version 5.1.10)

ndb_mgm> 4 status
Node 4: starting (Phase 4) (Version 5.1.10)

ndb_mgm> 4 status
Node 4: starting (Phase 100) (Version 5.1.10)

Killed the node with kill -6 and restarted it; it still gets stuck in phase 100.
Killed it again and restarted it with --initial; the data node still gets stuck in phase 100.
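The repeated `4 status` polls in the session above can be scripted. The following is a minimal Python sketch, not part of the original report: it assumes status lines look exactly like the ones captured here (`Node 4: starting (Phase 100) (Version 5.1.10)`), and the names `parse_status` and `is_stuck`, as well as the four-poll threshold, are illustrative choices. It flags a node that reports the same start phase on several consecutive polls.

```python
import re
from typing import Iterable, Optional, Tuple

# Status lines as printed by "<id> status" in this report, e.g.
#   "Node 4: starting (Phase 100) (Version 5.1.10)"
# Only single-word lowercase states are recognized; event lines such as
# "Node 4: Start initiated (version 5.1.10)" deliberately do not match.
STATUS_RE = re.compile(
    r"^Node\s+(?P<node>\d+):\s+(?P<state>[a-z]+)"
    r"(?:\s+\(Phase\s+(?P<phase>\d+)\))?"
    r"\s+\(Version\s+[\d.]+\)$"
)


def parse_status(line: str) -> Optional[Tuple[int, str, Optional[int]]]:
    """Return (node_id, state, start_phase) for a status line, else None."""
    m = STATUS_RE.match(line.strip())
    if m is None:
        return None
    phase = int(m.group("phase")) if m.group("phase") is not None else None
    return int(m.group("node")), m.group("state"), phase


def is_stuck(lines: Iterable[str], threshold: int = 4) -> bool:
    """True if the last `threshold` recognized polls all show the same node
    still 'starting' in the same start phase (hypothetical heuristic)."""
    polls = [p for p in map(parse_status, lines) if p is not None]
    if len(polls) < threshold:
        return False
    tail = polls[-threshold:]
    same_node_and_phase = len({(node, phase) for node, _, phase in tail}) == 1
    return same_node_and_phase and all(state == "starting" for _, state, _ in tail)


if __name__ == "__main__":
    session = ["Node 4: starting (Phase 100) (Version 5.1.10)"] * 4
    print(is_stuck(session))  # True: four identical phase-100 polls
```

Polls need to be taken at a fixed interval for the threshold to mean a time budget; a start phase that is merely slow would also trip this check, so the threshold has to be tuned.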

How to repeat:
See above.
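The data-node part of the sequence in the description (restarting the data nodes one after another and polling their status) can be scripted as below. This is a hedged Python sketch, not a procedure from the report. It assumes that `ndb_mgm` accepts `-c <connectstring>` and `-e "<command>"` for one-shot commands, that `<id> restart` returns only after the node has shut down (as the output above suggests), and that a fully started node reports a status line containing `started`. The connect string `HOST:14000`, the 600-second default timeout, and the function names are placeholders. Stopping the management node and restarting `ndb_mgmd` with the new `config.ini` is left out.

```python
import re
import subprocess
import time
from typing import Callable, Iterable, Optional

# A runner executes one management-client command and returns its output.
Runner = Callable[[str], str]

# Matches "Node 4: started ..." or "Node 4: starting (Phase 100) ...".
STATE_RE = re.compile(r"Node\s+(\d+):\s+(started|starting)\b")


def ndb_mgm_runner(connectstring: str) -> Runner:
    """Run commands through `ndb_mgm -c <connectstring> -e <command>` (assumed CLI usage)."""
    def run(command: str) -> str:
        proc = subprocess.run(
            ["ndb_mgm", "-c", connectstring, "-e", command],
            capture_output=True, text=True, check=False,
        )
        return proc.stdout
    return run


def wait_until_started(run: Runner, node: int, timeout: float, interval: float,
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll "<node> status" until it reports 'started' or `timeout` expires."""
    deadline = clock() + timeout
    while True:
        m = STATE_RE.search(run(f"{node} status"))
        if m is not None and int(m.group(1)) == node and m.group(2) == "started":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def rolling_restart(run: Runner, nodes: Iterable[int], timeout: float = 600.0,
                    interval: float = 5.0,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> Optional[int]:
    """Restart data nodes one at a time. Return the first node that does not
    reach 'started' within `timeout` seconds, or None if all of them do.
    The 600 s default is an arbitrary placeholder."""
    for node in nodes:
        run(f"{node} restart")
        if not wait_until_started(run, node, timeout, interval, clock, sleep):
            return node  # do not touch the remaining nodes
    return None


if __name__ == "__main__":
    stalled = rolling_restart(ndb_mgm_runner("HOST:14000"), [2, 3, 4, 5])
    print("all nodes started" if stalled is None else f"node {stalled} did not start")
```

Returning at the first stalled node, rather than continuing, means node 5 (in the same node group as node 4) is not restarted while its peer is still down. As the session above shows, `4 stop` is rejected while the node is starting, so the script only reports the stall and leaves recovery to the operator.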

Suggested fix:
1) The node should not get stuck in phase 100.
2) It should be possible to run "ID stop" regardless of the node's state.
[20 Jun 2006 15:51] Tomas Ulin
Changed the title, as the hang has nothing to do with the configuration change as such.
[5 Jul 2006 13:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/8758
[5 Jul 2006 13:36] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/8764
[8 Jul 2006 7:29] Jon Stephens
Documented bugfix in 5.1.12 changelog.
[16 Jan 2007 21:03] Jonas Oreland
Interestingly enough, while retesting this I found that the bugfix isn't working 100%.
Reopening and hijacking.
[19 Jan 2007 16:02] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/18448

ChangeSet@1.2370, 2007-01-19 17:01:52+01:00, jonas@perch.ndb.mysql.com +2 -0
  ndb - bug#19645
    fix some more sp100 hang cases
[19 Jan 2007 17:36] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/18459

ChangeSet@1.2091, 2007-01-19 18:35:27+01:00, jonas@perch.ndb.mysql.com +5 -0
  ndb - bug#19645
    fix bug with hanging node in sp100
    (mysql-5.1-wl2325-5.0)
[23 Jan 2007 15:59] Jonas Oreland
pushed to 5.1-ndb and 5.1-telco
[24 Jan 2007 1:57] Tomas Ulin
pushed to 5.1.15
[25 Jan 2007 4:07] Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html

Moved bugfix report from 5.1.12 changelog to 5.1.15 changelog.