Bug #41713 Using two management servers does not work
Submitted: 23 Dec 2008 12:57
Modified: 19 Feb 2009 14:09
Reporter: Johan Andersson
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: 6.4
OS: Any
Assigned to: Magnus Blåudd
CPU Architecture: Any
Tags: ndb_mgmd

[23 Dec 2008 12:57] Johan Andersson
Description:
I cannot use two management servers (in a truly distributed setup):
config.ini

[ndb_mgmd]
id=1
hostname=A

[ndb_mgmd]
id=2
hostname=B

[root@ps-ndb01 scripts]# /usr/local/mysql//mysql/bin/ndb_mgmd  --configdir=/tmp/ --config-file=/etc/mysql/config.ini 
2008-12-23 13:55:44 [MgmSrvr] INFO     -- NDB Cluster Management Server. mysql-5.1.30 ndb-6.4.0-beta
2008-12-23 13:55:44 [MgmSrvr] INFO     -- Reading cluster configuration from '/etc/mysql/config.ini'
[root@ps-ndb01 scripts]# ls /tmp/

NO config bin file has been written! The management server stays up, but no nodes can connect (not even ndb_mgm).

Change config and only use one management server:

[ndb_mgmd]
id=1
hostname=A

/usr/local/mysql//mysql/bin/ndb_mgmd  --configdir=/tmp/ --config-file=/etc/mysql/config.ini 
2008-12-23 13:53:51 [MgmSrvr] INFO     -- NDB Cluster Management Server. mysql-5.1.30 ndb-6.4.0-beta
2008-12-23 13:53:51 [MgmSrvr] INFO     -- Reading cluster configuration from '/etc/mysql/config.ini'
# ls /tmp/
ndb_1_config.bin.1 

So it only works with a single management server.

How to repeat:
see above
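
For reference, a minimal complete config.ini for a two-management-server setup like the one above could look roughly as follows (the [ndbd default], [ndbd] and [mysqld] sections and all hostnames/directories are illustrative placeholders, not taken from this report):

[ndbd default]
NoOfReplicas=2

[ndb_mgmd]
id=1
hostname=A
DataDir=/var/lib/mysql-cluster

[ndb_mgmd]
id=2
hostname=B
DataDir=/var/lib/mysql-cluster

[ndbd]
id=3
hostname=C

[ndbd]
id=4
hostname=D

[mysqld]
id=5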
[30 Dec 2008 11:44] Magnus Blåudd
Until the two ndb_mgmd processes have completed an "initial config change", no node can fetch the configuration. It seems like the ndb_mgmd processes are hanging in state=INITIAL, waiting for something.
[30 Dec 2008 12:51] Magnus Blåudd
Two possible workarounds:
1. Wait more than 60 seconds and the ndb_mgmd will start the initial config change.
2. Start the ndb_mgmd with the lowest node ID last. This seems to trigger the config change immediately (see the example below).

Check the logs from ndb_mgmd; they will print which state the ndb_mgmd processes are in.
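
As a rough sketch of workaround 2, using the hosts and paths from the report (node IDs, binaries being in PATH, and the exact options are illustrative):

# on host B: start the management server with the higher node ID first
ndb_mgmd --ndb-nodeid=2 --configdir=/tmp/ --config-file=/etc/mysql/config.ini

# on host A: start the management server with the lowest node ID last;
# this seems to trigger the initial config change immediately
ndb_mgmd --ndb-nodeid=1 --configdir=/tmp/ --config-file=/etc/mysql/config.ini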
[8 Jan 2009 15:19] Magnus Blåudd
Filed Bug#41965 to fix this with a small patch for 6.2; will commit some additional fixes for 6.4.
[8 Jan 2009 15:32] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/62707

3199 Magnus Svensson	2009-01-08
      Bug#41713 Using two management servers does not work
[8 Jan 2009 15:58] Magnus Blåudd
Will try to squeeze in a patch so that the error "Failed to convert connection from '<hostname>:<port>' to transporter" never occurs.
[9 Jan 2009 9:34] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/62777

3202 Magnus Svensson	2009-01-09
      Bug#41713 Using two management servers does not work
       - Additional patch that opens up the MGM-to-MGM transporters as soon as they have been created.
[9 Jan 2009 12:30] Bugs System
Pushed into 5.1.30-ndb-6.4.1 (revid:msvensson@mysql.com-20090109121748-t0gy78i6su7ntseu) (version source revid:msvensson@mysql.com-20090109121748-t0gy78i6su7ntseu) (merge vers: 5.1.30-ndb-6.4.1) (pib:6)
[19 Feb 2009 14:09] Jon Stephens
Documented bugfix in the NDB-6.4.1 changelog as follows:

        When a data node connects to the management server, the node
        sends its node ID and transporter type; the management server
        then verifies that there is a transporter set up for that node
        and that it is in the correct state, and then sends back an
        acknowledgement to the connecting node. If the transporter was
        not in the correct state, no reply was sent back to the
        connecting node, which would then hang until a read timeout
        occurred (60 seconds). Now, if the transporter is not in the
        correct state, the management server acknowledges this promptly,
        and the node immediately disconnects.
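
The behaviour described in the changelog entry can be summarized with a rough C-style sketch (purely illustrative: all types and names are made up for this sketch and this is not the actual NDB source; the real handshake also carries the transporter type, which is omitted here):

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the management server's internal state. */
typedef struct { int node_id; bool connecting; } Transporter;
typedef struct { const char *peer; } Session;

static void send_reply(Session *s, const char *reply)
{
  printf("reply to %s: %s\n", s->peer, reply);
}

static Transporter *lookup_transporter(Transporter *table, int n, int node_id)
{
  for (int i = 0; i < n; i++)
    if (table[i].node_id == node_id)
      return &table[i];
  return NULL;
}

/* Sketch of the connect handshake described above. */
static void handle_connect(Session *s, Transporter *table, int n, int node_id)
{
  Transporter *trp = lookup_transporter(table, n, node_id);
  if (trp == NULL || !trp->connecting)
  {
    /* Before the fix: no reply was sent here, and the connecting node
       hung until its 60-second read timeout expired.
       After the fix: a prompt negative reply is sent, so the node can
       disconnect (and retry) immediately. */
    send_reply(s, "transporter not ready");
    return;
  }
  /* Transporter exists and is in the expected state: acknowledge. */
  send_reply(s, "ok");
}

int main(void)
{
  Transporter table[] = { { 3, true }, { 4, false } };
  Session s = { "node 4" };
  handle_connect(&s, table, 2, 4);   /* gets a prompt negative reply */
  return 0;
}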