MySQL Bugs: #50717: ndb_mgmd shutdown never completes

Bug #50717	ndb_mgmd shutdown never completes
Submitted:	29 Jan 2010 8:52	Modified:	29 Jan 2010 10:45
Reporter:	Kari Juul Wedde
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S1 (Critical)
Version:	7.0.11	OS:	Any
Assigned to:	Jonas Oreland	CPU Architecture:	Any

Description:
The Cluster Manager does not work well with Cluster 7.0.11. For instance, you cannot stop a running cluster: the data nodes are stopped, but ndb_mgmd is not, because the ndb_mgmd shutdown never completes.

The problem is easily reproducible without the Cluster Manager. Just start the ndb_mgmd and issue a shutdown from the Management Client. The shutdown never completes. See example below.

The problem was reproduced with 1249741.mysql-7.0.11-solaris10-x86_64.tar.gz downloaded from pushbuild.

Example
-------
kw136773@nanna14:~/<1>cluster/mysql-7.0.11-solaris10-x86_64> libexec/ndb_mgmd -f config.ini --ndb-connectstring "nodeid=1;host=nanna14:45000" --configdir=/export/home/tmp/kw136773/ndb2 --initial &
[1] 1171
2010-01-29 09:08:37 [MgmtSrvr] INFO     -- NDB Cluster Management Server. mysql-5.1.41 ndb-7.0.11
2010-01-29 09:08:37 [MgmtSrvr] INFO     -- Reading cluster configuration from 'config.ini'

kw136773@nanna14:~/<1>cluster/mysql-7.0.11-solaris10-x86_64>bin/ndb_mgm --ndb-connectstring "nodeid=1;host=nanna14:45000"
-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: nanna14:45000
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2 (not connected, accepting connect from nanna14)
id=3 (not connected, accepting connect from nanna14)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @nanna14  (mysql-5.1.41 ndb-7.0.11)

[mysqld(API)]   1 node(s)
id=4 (not connected, accepting connect from nanna14)

ndb_mgm> shutdown
0 NDB Cluster node(s) have shutdown.
Disconnecting to allow management server to shutdown.
ndb_mgm> exit

[1]  + Done                          libexec/ndb_mgmd -f config.ini --ndb-connectstring nodeid=1;host=nanna14:45000  ...

kw136773@nanna14:~/<1>cluster/mysql-7.0.11-solaris10-x86_64> ps -ukw136773
   PID TTY         TIME CMD
..
  1172 ?           0:00 ndb_mgmd
..
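
The final check in this session (looking for a surviving ndb_mgmd in the `ps` listing) can be scripted. This is a minimal sketch in Python, assuming `ps` output shaped like the rows shown here (PID, TTY, TIME, CMD). The helper name `lingering_pids` is hypothetical and not part of any MySQL tool.

```python
def lingering_pids(ps_output, command="ndb_mgmd"):
    """Return the PIDs of rows in `ps` output whose CMD column equals `command`.

    Assumes the four-column layout seen in the report: PID TTY TIME CMD.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the header, the ".." elision markers, and unrelated processes.
        if len(fields) >= 4 and fields[0].isdigit() and fields[-1] == command:
            pids.append(int(fields[0]))
    return pids


# Sample input copied from the listing in this report.
sample = """\
   PID TTY         TIME CMD
  1172 ?           0:00 ndb_mgmd
"""

print(lingering_pids(sample))
```

A non-empty result after the management client has reported the shutdown suggests the daemon is still alive, which is the symptom described here.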

How to repeat:
See description

Running on my local machine, I can see it print the messages below in the cluster log, and then nothing happens.

2010-01-29 10:36:56 [MgmtSrvr] INFO     -- Id: 3, Command port: *:13000
==CONFIRMED==
2010-01-29 10:37:19 [MgmtSrvr] INFO     -- Shutting down server...

The problem was that a thread owning a lock exited without releasing it.
It was introduced by me, and was never part of any "released" version.
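
The failure mode can be illustrated with a small, self-contained sketch. This is not the NDB code: it uses Python's `threading.Lock` as a stand-in, and the names `buggy_worker`, `fixed_worker`, and `shutdown_completes` are hypothetical. A timeout is used only so the demonstration terminates. In the real bug the shutdown path waited indefinitely.

```python
import threading


def buggy_worker(lock):
    # Takes the lock and returns without releasing it,
    # mirroring a thread that exits while still owning a lock.
    lock.acquire()
    return


def fixed_worker(lock):
    # The context manager releases the lock on every exit path.
    with lock:
        pass


def shutdown_completes(worker, timeout=0.2):
    """Run `worker` to completion, then try to take the same lock,
    as a shutdown path would. Returns True if the lock was obtained."""
    lock = threading.Lock()
    t = threading.Thread(target=worker, args=(lock,))
    t.start()
    t.join()
    acquired = lock.acquire(timeout=timeout)
    if acquired:
        lock.release()
    return acquired


print(shutdown_completes(buggy_worker))  # False: the lock was never released
print(shutdown_completes(fixed_worker))  # True: the lock was released before exit
```

The general remedy is to release the lock on every exit path of the thread, which is what the commit below does in its own way by deleting the SignalSender before the thread exits.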

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/98576

3383 Jonas Oreland	2010-01-29
      ndb - bug#50717 - release TTFM before exiting ConfigManager (by deleting SignalSender) to avoid hanging shutdowns

not released anywhere, closing