Bug #42015 ndb_mgmd --initial + no config file = hang
Submitted: 10 Jan 2009 15:35 Modified: 28 Sep 2009 14:29
Reporter: Jon Stephens
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S3 (Non-critical)
Version: mysql-5.1-telco-7.0 OS: Any
Assigned to: Magnus Blåudd CPU Architecture: Any
Tags: --config-file, --initial, 5.1.30-ndb-6.4.1-bzr, ndb_mgmd

[10 Jan 2009 15:35] Jon Stephens
Description:
If you start the management server with the --initial option and it cannot find a configuration file, it deletes all cache files and then hangs.

Same thing happens if there's no cache and no config.ini; the management server just waits forever AFAICT.

How to repeat:
jon@tonfisk:~/bin/mysql-5.1-telco-6.4/libexec> ./ndb_mgmd --initial
2009-01-10 15:57:39 [MgmSrvr] INFO     -- NDB Cluster Management Server. mysql-5.1.30 ndb-6.4.1-beta
2009-01-10 15:57:39 [MgmSrvr] INFO     -- Trying to get configuration from other mgmd(s) using 'nodeid=0,localhost:1186'...

(Ten minutes later...)
<^C>
jon@tonfisk:~/bin/mysql-5.1-telco-6.4/libexec>

Suggested fix:
If the management server isn't going to check for a config.ini before deleting the cache files, then there should be a timeout - IMO ten seconds is more than enough time - after which it should exit with a "No configuration found" error message.

However, this is sort of a nasty trick to play on users, and a better solution might be for ndb_mgmd to check for a config.ini *before* wiping out the cache and, if none is found, to exit instead.

In any case, it shouldn't just do the sit-and-spin bit forever...
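
To make the suggested order of operations concrete, here is a minimal Python sketch. It is not ndb_mgmd code: `initial_start`, `fetch_remote`, `NoConfigurationFound`, and all parameters are hypothetical names, and the ten-second default is just the figure proposed above. It combines both ideas: the cache is only wiped once a local config file is known to exist, and the wait for another management server is bounded.

```python
import os
import time


class NoConfigurationFound(Exception):
    """Hypothetical error raised when no configuration source is available."""


def initial_start(config_path, cache_dir, fetch_remote,
                  timeout_s=10.0, poll_s=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Sketch of a startup order that checks for a config before wiping the cache.

    fetch_remote is an assumed callable returning True once a configuration
    has been obtained from another management server.
    """
    if os.path.isfile(config_path):
        # A local config exists, so discarding the cached copies is safe.
        for name in os.listdir(cache_dir):
            os.remove(os.path.join(cache_dir, name))
        return "local"

    # No local config: leave the cache untouched and ask other management
    # servers, but only for a bounded time instead of waiting forever.
    deadline = clock() + timeout_s
    while clock() < deadline:
        if fetch_remote():
            return "remote"
        sleep(poll_s)
    raise NoConfigurationFound("No configuration found")
```

The `clock` and `sleep` parameters exist only so the timeout can be exercised without real waiting.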
[23 Sep 2009 15:13] Magnus Blåudd
Wrote a patch that makes an ndb_mgmd that is trying to connect to another ndb_mgmd print out a message about this approx. every 30 seconds.
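
For readers who want a picture of the behavior, the following Python sketch shows one way a retry loop can report progress roughly every 30 seconds. It is an illustration only, not the committed patch: `wait_for_config`, `try_fetch`, `log`, and the parameters are hypothetical, and the message text simply echoes the log line from the original report.

```python
import time


def wait_for_config(try_fetch, log, retry_s=1.0, report_s=30.0,
                    max_attempts=None,
                    clock=time.monotonic, sleep=time.sleep):
    """Retry try_fetch() until it returns a config, logging periodically.

    try_fetch is an assumed callable returning a configuration object, or
    None if no other management server answered. max_attempts=None means
    retry indefinitely.
    """
    next_report = clock()  # report immediately on the first pass
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        now = clock()
        if now >= next_report:
            # Tell the user we are still waiting, so it does not look hung.
            log("Trying to get configuration from other mgmd(s)...")
            next_report = now + report_s
        config = try_fetch()
        if config is not None:
            return config
        attempts += 1
        sleep(retry_s)
    return None
```

Note that the loop still waits indefinitely by default; the periodic message only makes the wait visible, which matches the description of the patch rather than the timeout proposed in the original report.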
[25 Sep 2009 8:03] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/84586
[28 Sep 2009 12:31] Magnus Blåudd
Pushed to 7.0 and 7.1
[28 Sep 2009 14:29] Jon Stephens
Documented bugfix in the NDB-7.0.8 changelog as follows:

        When started with the --initial and --reload options, if
        ndb_mgmd could not find a configuration file or connect to
        another management server, it appeared to hang. Now, when trying
        to fetch its configuration from another management node,
        ndb_mgmd checks and signals (Trying to get configuration from
        other mgmd(s)) each 30 seconds that it has not yet done so.

Closed.
[30 Sep 2009 8:14] Bugs System
Pushed into 5.1.37-ndb-7.0.9 (revid:jonas@mysql.com-20090930075942-1q6asjcp0gaeynmj) (version source revid:magnus.blaudd@sun.com-20090925104247-ozlmf4vu1f3936am) (merge vers: 5.1.37-ndb-7.0.8) (pib:11)
[30 Sep 2009 8:15] Bugs System
Pushed into 5.1.35-ndb-7.1.0 (revid:jonas@mysql.com-20090930080049-1c8a8cio9qgvhq35) (version source revid:jonas@mysql.com-20090925143824-3i5kcvsf8v3yf79j) (merge vers: 5.1.35-ndb-7.1.0) (pib:11)