Bug #42056 ndb_mgmd hang on STOPing the management node
Submitted: 12 Jan 2009 16:42 Modified: 16 Jan 2009 20:12
Reporter: Hartmut Holzgraefe
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S3 (Non-critical)
Version: 6.3.19, 6.4.0 OS: Linux
Assigned to: Magnus Blåudd CPU Architecture: Any

[12 Jan 2009 16:42] Hartmut Holzgraefe
Description:
Starting with ndb-6.3.19, a management node can no longer be stopped via the ndb_mgm client's STOP command.

In ndb-6.3.19 and 6.3.20, both the client and the management server hang,
but after the client is terminated with CTRL-C the management server
terminates within 15 seconds at most (the same delay as with STOPs
before 6.3.19).

In ndb-6.4.0 the management server process stays around even after
stopping the client with CTRL-C, so the situation is even more severe here.

How to repeat:
Try to stop the management server using "id STOP" from within the management client.

Suggested fix:
?
[14 Jan 2009 17:36] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/63225

3215 Magnus Svensson	2009-01-14
      Bug#42056 ndb_mgmd hang on STOPing the management node
       - Calculate the difference between start and current time in an unsigned-safe
         way so that the wait loop terminates properly if not all nodes have reached
         the given state in time.
       - Improve printouts and send them to the log instead of stdout
       - Check whether the session thread should exit because it has been stopped
         also in the case when, for example, a read timeout occurs. The default
         value of the read timeout is 30 seconds, so it should be expected that a
         shutdown of ndb_mgmd may take 30 seconds to complete.
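
The first item of the commit message can be illustrated with a minimal sketch. It is not the actual patch: `elapsed_ms` and `wait_until` are hypothetical names, and the clock is passed in as a callable so the idea can be shown without any NDB code. The assumption is that the time values are unsigned tick counts. Subtracting the start from the current reading gives the elapsed time even if the counter wraps, whereas computing a "remaining" value such as `timeout - elapsed` in an unsigned type wraps to a huge number instead of going negative, so a `remaining > 0` check would never end the loop.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical helper: elapsed time between two readings of an unsigned
// millisecond counter. Unsigned subtraction is modulo 2^64, so the result
// is correct across a counter wraparound as long as `now` was sampled
// after `start`.
inline std::uint64_t elapsed_ms(std::uint64_t start, std::uint64_t now) {
    return now - start;
}

// Hypothetical wait loop: polls `done` until it returns true or until
// `timeout_ms` has elapsed according to `clock_ms`.
// Returns true if the condition was met, false on timeout.
inline bool wait_until(const std::function<bool()>& done,
                       const std::function<std::uint64_t()>& clock_ms,
                       std::uint64_t timeout_ms) {
    const std::uint64_t start = clock_ms();
    while (!done()) {
        // Compare elapsed time against the limit; never compute
        // "timeout_ms - elapsed", which cannot become negative.
        if (elapsed_ms(start, clock_ms()) >= timeout_ms) {
            return false;
        }
    }
    return true;
}
```

A real loop would also sleep between polls; that is left out here to keep the sketch focused on the comparison.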
[14 Jan 2009 17:47] Magnus Blåudd
The ndb_mgm problem is a duplicate of BUG#40922, which has just been fixed. With that patch, ndb_mgmd will shut down properly in 6.3.

6.4 needs the patch for this bug to shut down properly.

ndb_mgmd waits for all of its threads to notice that it wants to shut down before continuing with the shutdown. Currently nothing forces ndb_mgmd down until that has occurred. Normally each thread is reading from its client and wakes up at least every 30 seconds, when the read timeout occurs. It then checks whether it should return and disconnect any connected client (for example another ndb_mgm, a mysqld, etc.).

Maybe we should change the loop that waits for all threads to exit so that it continues anyway after 2 * read timeout.
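
The session behaviour described in this comment could look roughly like the sketch below. All names are hypothetical (`ReadResult`, `run_session`, `read_with_timeout`); the callable stands in for a blocking socket read that gives up after the read timeout. The point is only that a timeout is not an error: it returns control to the loop condition, where the stop flag is checked again.

```cpp
#include <atomic>
#include <functional>

// Hypothetical outcome of one read attempt on the client socket.
enum class ReadResult { Data, Timeout, Closed };

// Hypothetical session loop. `read_with_timeout` stands in for a blocking
// read that returns Timeout once the read timeout expires.
// Returns the number of requests handled before the session exited.
inline int run_session(const std::atomic<bool>& stop_requested,
                       const std::function<ReadResult()>& read_with_timeout) {
    int handled = 0;
    // The stop flag is re-checked after every read attempt, including
    // attempts that ended in a timeout.
    while (!stop_requested.load()) {
        switch (read_with_timeout()) {
            case ReadResult::Data:
                ++handled;   // a real session would parse and answer here
                break;
            case ReadResult::Timeout:
                break;       // nothing to do; go back and check the flag
            case ReadResult::Closed:
                return handled;  // client went away
        }
    }
    return handled;
}
```

With this shape, a session that is idle when shutdown is requested notices the request after at most one read timeout.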
[14 Jan 2009 22:03] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/63265

3216 Magnus Svensson	2009-01-14
      Bug#42056 ndb_mgmd hang on STOPing the management node - part2
       - Continue shutdown of ndb_mgmd even if not all threads/sessions
         have stopped within 2*MgmApiSession::SOCKET_TIMEOUT (currently 30 seconds)
       - Print a message if not all sessions have stopped and shutdown continues anyway
       - Add a constant SOCKET_TIMEOUT to use when a MgmApiSession reads from or writes
         to the client, as well as when shutting down all sessions.
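
A bounded wait of the kind this commit describes might be sketched as follows. This is an illustration, not the committed code: `kSocketTimeout`, `wait_for_sessions` and the `active_sessions` counter are hypothetical, and the 30-second value is taken from the commit message above. The caller would pass `2 * kSocketTimeout` as the limit and log a message when the function returns false before continuing the shutdown.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical counterpart of the shared timeout constant; 30 seconds is
// the value quoted in the commit message.
constexpr std::chrono::seconds kSocketTimeout{30};

// Hypothetical helper: waits until `active_sessions` drops to zero or
// until `limit` has passed. Returns true if every session stopped in
// time, false if the caller should continue the shutdown anyway.
inline bool wait_for_sessions(
        const std::atomic<int>& active_sessions,
        std::chrono::milliseconds limit,
        std::chrono::milliseconds poll = std::chrono::milliseconds(10)) {
    // steady_clock is monotonic, so the deadline is unaffected by
    // wall-clock adjustments.
    const auto deadline = std::chrono::steady_clock::now() + limit;
    while (active_sessions.load() > 0) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;
        }
        std::this_thread::sleep_for(poll);
    }
    return true;
}
```

Tying the limit to the read timeout means every idle session gets at least one chance to wake up and see the stop request before the shutdown proceeds without it.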
[15 Jan 2009 9:28] Bugs System
Pushed into 5.1.30-ndb-6.4.1 (revid:msvensson@mysql.com-20090115092607-p44knf2evmkfncj7) (version source revid:msvensson@mysql.com-20090115092607-p44knf2evmkfncj7) (merge vers: 5.1.30-ndb-6.4.1) (pib:6)
[15 Jan 2009 15:43] Jon Stephens
Documented in the NDB-6.4.1 changelog as follows:

        The management server would hang after attempting to halt it
        with the STOP command in the management client.

Set status back to PQ awaiting telco-6.3 merge.
[16 Jan 2009 14:34] Magnus Blåudd
There is no patch for 6.3; that was fixed by the bug I mentioned previously.
[16 Jan 2009 20:12] Jon Stephens
Noted in changelog entry that this bug is related to Bug #40922. Closed.