Bug #50717 ndb_mgmd shutdown never completes
Submitted: 29 Jan 2010 8:52 Modified: 29 Jan 2010 10:45
Reporter: Kari Juul Wedde
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S1 (Critical)
Version: 7.0.11 OS: Any
Assigned to: Jonas Oreland CPU Architecture: Any

[29 Jan 2010 8:52] Kari Juul Wedde
Description:
The Cluster Manager does not work well with Cluster 7.0.11. For instance, you cannot stop a running cluster: the data nodes are stopped, but ndb_mgmd is not, because its shutdown never completes.

The problem is easily reproducible without the Cluster Manager. Start ndb_mgmd and issue a shutdown from the management client (ndb_mgm). The shutdown never completes. See the example below.

The problem was reproduced with 1249741.mysql-7.0.11-solaris10-x86_64.tar.gz, downloaded from pushbuild.

Example
-------
kw136773@nanna14:~/<1>cluster/mysql-7.0.11-solaris10-x86_64> libexec/ndb_mgmd -f config.ini --ndb-connectstring "nodeid=1;host=nanna14:45000" --configdir=/export/home/tmp/kw136773/ndb2 --initial &
[1] 1171
2010-01-29 09:08:37 [MgmtSrvr] INFO     -- NDB Cluster Management Server. mysql-5.1.41 ndb-7.0.11
2010-01-29 09:08:37 [MgmtSrvr] INFO     -- Reading cluster configuration from 'config.ini'

kw136773@nanna14:~/<1>cluster/mysql-7.0.11-solaris10-x86_64>bin/ndb_mgm --ndb-connectstring "nodeid=1;host=nanna14:45000"
-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: nanna14:45000
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2 (not connected, accepting connect from nanna14)
id=3 (not connected, accepting connect from nanna14)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @nanna14  (mysql-5.1.41 ndb-7.0.11)

[mysqld(API)]   1 node(s)
id=4 (not connected, accepting connect from nanna14)

ndb_mgm> shutdown
0 NDB Cluster node(s) have shutdown.
Disconnecting to allow management server to shutdown.
ndb_mgm> exit

[1]  + Done                          libexec/ndb_mgmd -f config.ini --ndb-connectstring nodeid=1;host=nanna14:45000  ...

kw136773@nanna14:~/<1>cluster/mysql-7.0.11-solaris10-x86_64> ps -ukw136773
   PID TTY         TIME CMD
..
  1172 ?           0:00 ndb_mgmd
..

How to repeat:
See description
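
The check at the end of the example (is ndb_mgmd still listed by ps after the shutdown command?) can be automated. The following is a minimal sketch, not part of the original report: the helper names pid_alive and wait_for_exit and the timeout values are assumptions chosen for illustration, and it relies on POSIX semantics for signal 0.

```python
import os
import time


def pid_alive(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def wait_for_exit(pid, timeout, poll=0.1):
    """Poll until the process is gone; return False if it outlives the timeout."""
    deadline = time.monotonic() + timeout
    while pid_alive(pid):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    return True
```

With the PID from the transcript above, wait_for_exit(1172, 30) would be expected to return False on an affected build, since the ndb_mgmd process never exits. The 30-second limit is an arbitrary choice.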
[29 Jan 2010 9:39] Magnus Blåudd
Running on my local machine, I can see that it prints the messages below in the cluster log, and then nothing more happens.

2010-01-29 10:36:56 [MgmtSrvr] INFO     -- Id: 3, Command port: *:13000
==CONFIRMED==
2010-01-29 10:37:19 [MgmtSrvr] INFO     -- Shutting down server...
[29 Jan 2010 10:33] Jonas Oreland
The problem was that a thread owning a lock exited without releasing it.
It was introduced by me, and was never part of any "released" version.
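
For readers unfamiliar with this class of bug, here is a minimal illustration in Python. It is not the NDB code, and all names in it (worker_leaks, worker_releases, shutdown_can_proceed) are hypothetical. It only shows the general pattern: a lock taken by a thread that exits without releasing it stays held, so any later code that needs the lock, such as a shutdown path, waits forever.

```python
import threading


def worker_leaks(lock):
    # Bug pattern: the thread acquires the lock and returns without releasing it.
    lock.acquire()


def worker_releases(lock):
    # Fixed pattern: the with-block releases the lock on every exit path.
    with lock:
        pass


def shutdown_can_proceed(worker, timeout=0.5):
    """Run the worker to completion, then try to take the lock as a shutdown path would."""
    lock = threading.Lock()
    t = threading.Thread(target=worker, args=(lock,))
    t.start()
    t.join()  # the worker thread has exited at this point
    acquired = lock.acquire(timeout=timeout)
    if acquired:
        lock.release()
    return acquired
```

Here shutdown_can_proceed(worker_leaks) times out and returns False, while shutdown_can_proceed(worker_releases) returns True. A real shutdown path that waits without a timeout would hang, as observed in this report.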
[29 Jan 2010 10:39] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/98576

3383 Jonas Oreland	2010-01-29
      ndb - bug#50717 - release TTFM before existing ConfigManager (by deleting SignalSender) to avoid hanging shutdowns
[29 Jan 2010 10:45] Jonas Oreland
Not released anywhere; closing.