Bug #41713 Using two management servers does not work
Submitted: 23 Dec 2008 12:57
Modified: 19 Feb 2009 14:09
Reporter: Johan Andersson
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: 6.4
OS: Any
Assigned to: Magnus Blåudd
CPU Architecture: Any
Tags: ndb_mgmd

[23 Dec 2008 12:57] Johan Andersson
Description:
I cannot use two management servers (in a truly distributed setup):
config.ini

[ndb_mgmd]
id=1
hostname=A

[ndb_mgmd]
id=2
hostname=B

[root@ps-ndb01 scripts]# /usr/local/mysql//mysql/bin/ndb_mgmd  --configdir=/tmp/ --config-file=/etc/mysql/config.ini 
2008-12-23 13:55:44 [MgmSrvr] INFO     -- NDB Cluster Management Server. mysql-5.1.30 ndb-6.4.0-beta
2008-12-23 13:55:44 [MgmSrvr] INFO     -- Reading cluster configuration from '/etc/mysql/config.ini'
[root@ps-ndb01 scripts]# ls /tmp/

NO config bin file has been written! The management server stays up, but no nodes can connect (not even ndb_mgm).

Change config and only use one management server:

[ndb_mgmd]
id=1
hostname=A

/usr/local/mysql//mysql/bin/ndb_mgmd  --configdir=/tmp/ --config-file=/etc/mysql/config.ini 
2008-12-23 13:53:51 [MgmSrvr] INFO     -- NDB Cluster Management Server. mysql-5.1.30 ndb-6.4.0-beta
2008-12-23 13:53:51 [MgmSrvr] INFO     -- Reading cluster configuration from '/etc/mysql/config.ini'
# ls /tmp/
ndb_1_config.bin.1 

So it only works with a single management server.

How to repeat:
see above
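
For reference, a minimal complete config.ini for a two-management-server setup like the one above could look roughly as follows (the [ndbd default], [ndbd] and [mysqld] sections and all hostnames/directories are illustrative placeholders, not taken from this report):

[ndbd default]
NoOfReplicas=2

[ndb_mgmd]
id=1
hostname=A
DataDir=/var/lib/mysql-cluster

[ndb_mgmd]
id=2
hostname=B
DataDir=/var/lib/mysql-cluster

[ndbd]
id=3
hostname=C

[ndbd]
id=4
hostname=D

[mysqld]
id=5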
[30 Dec 2008 11:44] Magnus Blåudd
Until the two ndb_mgmd processes have completed an "initial config change", no node can fetch the configuration. It seems like the ndb_mgmd processes are hanging in state=INITIAL, waiting for something.
[30 Dec 2008 12:51] Magnus Blåudd
Two possible workarounds:
1. Wait more than 60 seconds and the ndb_mgmd will start the initial config change.
2. Start the ndb_mgmd with the lowest node ID last. This seems to trigger the config change immediately (see the example below).

Check the logs from ndb_mgmd; they will print which state the ndb_mgmd processes are in.
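
As a rough sketch of workaround 2, using the hosts and paths from the report (node IDs, binaries being in PATH, and the exact options are illustrative):

# on host B: start the management server with the higher node ID first
ndb_mgmd --ndb-nodeid=2 --configdir=/tmp/ --config-file=/etc/mysql/config.ini

# on host A: start the management server with the lowest node ID last;
# this seems to trigger the initial config change immediately
ndb_mgmd --ndb-nodeid=1 --configdir=/tmp/ --config-file=/etc/mysql/config.ini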
[8 Jan 2009 15:19] Magnus Blåudd
Filed Bug#41965 to fix this with a small patch for 6.2; will commit some additional fixes for 6.4.
[8 Jan 2009 15:32] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/62707

3199 Magnus Svensson	2009-01-08
      Bug#41713 Using two management servers does not work
[8 Jan 2009 15:58] Magnus Blåudd
Will try to squeeze in a patch so that the error "Failed to convert connection from '<hostname>:<port>' to transporter" never occurs.
[9 Jan 2009 9:34] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/62777

3202 Magnus Svensson	2009-01-09
      Bug#41713 Using two management servers does not work
       - Additional patch that opens up the MGM-to-MGM transporters as soon as they have been created.
[9 Jan 2009 12:30] Bugs System
Pushed into 5.1.30-ndb-6.4.1 (revid:msvensson@mysql.com-20090109121748-t0gy78i6su7ntseu) (version source revid:msvensson@mysql.com-20090109121748-t0gy78i6su7ntseu) (merge vers: 5.1.30-ndb-6.4.1) (pib:6)
[19 Feb 2009 14:09] Jon Stephens
Documented bugfix in the NDB-6.4.1 changelog as follows:

        When a data node connects to the management server, the node
        sends its node ID and transporter type; the management server
        then verifies that there is a transporter set up for that node
        and that it is in the correct state, and then sends back an
        acknowledgement to the connecting node. If the transporter was
        not in the correct state, no reply was sent back to the
        connecting node, which would then hang until a read timeout
        occurred (60 seconds). Now, if the transporter is not in the
        correct state, the management server acknowledges this promptly,
        and the node immediately disconnects.
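
The behaviour described in the changelog entry can be summarized with a rough C-style sketch (purely illustrative: all types and names are made up for this sketch and this is not the actual NDB source; the real handshake also carries the transporter type, which is omitted here):

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the management server's internal state. */
typedef struct { int node_id; bool connecting; } Transporter;
typedef struct { const char *peer; } Session;

static void send_reply(Session *s, const char *reply)
{
  printf("reply to %s: %s\n", s->peer, reply);
}

static Transporter *lookup_transporter(Transporter *table, int n, int node_id)
{
  for (int i = 0; i < n; i++)
    if (table[i].node_id == node_id)
      return &table[i];
  return NULL;
}

/* Sketch of the connect handshake described above. */
static void handle_connect(Session *s, Transporter *table, int n, int node_id)
{
  Transporter *trp = lookup_transporter(table, n, node_id);
  if (trp == NULL || !trp->connecting)
  {
    /* Before the fix: no reply was sent here, and the connecting node
       hung until its 60-second read timeout expired.
       After the fix: a prompt negative reply is sent, so the node can
       disconnect (and retry) immediately. */
    send_reply(s, "transporter not ready");
    return;
  }
  /* Transporter exists and is in the expected state: acknowledge. */
  send_reply(s, "ok");
}

int main(void)
{
  Transporter table[] = { { 3, true }, { 4, false } };
  Session s = { "node 4" };
  handle_connect(&s, table, 2, 4);   /* gets a prompt negative reply */
  return 0;
}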