MySQL Bugs: #13070: Multiple management servers do not list other connected management servers

Bug #13070	Multiple management servers do not list other connected management servers
Submitted:	8 Sep 2005 17:27	Modified:	29 Sep 2005 22:35
Reporter:	Peter Hsu
Status:	Closed
Category:	MySQL Server: Documentation	Severity:	S3 (Non-critical)
Version:	4.1.14,5.0.12	OS:	Linux (Redhat 9.0, Debian)
Assigned to:	Jon Stephens	CPU Architecture:	Any

Description:
When setting up multiple management nodes in MySQL Cluster, the management nodes can't seem to "connect" to each other.

When I set up two management nodes, I have the exact same config.ini used by both servers.  In the ini, I have two entries for [NDB_MGMD], one for each management server.  One is running at 10.0.0.51 and one at 10.0.0.52.  When I start the ndb_mgmd daemon, the management servers start up, but they don't seem to "see" each other.

When I do a "show" from the management client console, it shows the following:

on 10.0.0.51:

[ndb_mgmd(MGM)] 2 node(s)

id=1    @10.0.0.51  (Version: 4.1.12)
id=2 (not connected, accepting connect from 10.0.0.52)

on 10.0.0.52:

[ndb_mgmd(MGM)] 2 node(s)
id=1 (not connected, accepting connect from 10.0.0.51)
id=2    @10.0.0.52  (Version: 4.1.12)

I then tried setting the connectstring when starting each management server.  So I set the connectstrings to be:
nodeid=1,10.0.0.51:1186,10.0.0.52:1186
and
nodeid=2,10.0.0.51:1186,10.0.0.52:1186
respectively.
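
The two connect strings above follow one pattern: the node's own ID, then the host:port of every management server. A minimal Python sketch of that pattern follows. `build_connectstring` is a hypothetical helper written for illustration, not part of MySQL Cluster, and the port 1186 is simply the one used in the strings above.

```python
def build_connectstring(node_id, mgm_hosts, port=1186):
    """Return 'nodeid=N,host1:port,host2:port,...' for one node.

    Hypothetical helper for illustration; not a MySQL Cluster API.
    """
    endpoints = [f"{host}:{port}" for host in mgm_hosts]
    return ",".join([f"nodeid={node_id}"] + endpoints)


mgm_hosts = ["10.0.0.51", "10.0.0.52"]
for node_id in (1, 2):
    # One connect string per management server, differing only in nodeid.
    print(build_connectstring(node_id, mgm_hosts))
```

Every node gets the same list of management servers; only the `nodeid` part changes.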

When I do this, I still get the same output when I do a "show" in each console.  

The management servers are able to reach each other over the network.  There are no firewalls in between.  The management client can connect to the other management server without any problems.

How to repeat:
Set up a MySQL Cluster configuration with two or more management servers.

config.ini:

[NDBD DEFAULT]    # Options affecting ndbd processes on all data nodes
NoOfReplicas=2    # Number of replicas
DataDir=/var/log/mysql-cluster
MaxNoOfAttributes = 2000
MaxNoOfOrderedIndexes = 5000
MaxNoOfUniqueHashIndexes = 5000
MaxNoOfOpenFiles = 150

[TCP DEFAULT]     # TCP/IP options:

[NDB_MGMD]
HostName=10.0.0.51              # Hostname or IP address of MGM node
DataDir=/var/log/mysql-cluster  # Directory for MGM node logfiles

[NDB_MGMD]
HostName=10.0.0.52              # Hostname or IP address of MGM node
DataDir=/var/log/mysql-cluster  # Directory for MGM node logfiles

[NDBD]
Id=3                            # Node ID
HostName=10.0.0.11              # Hostname or IP address
DataDir=/usr/local/mysql/data   # Directory for this data node's datafiles

[NDBD]
Id=4                            # Node ID
HostName=10.0.0.12              # Hostname or IP address
DataDir=/usr/local/mysql/data   # Directory for this data node's datafiles

[MYSQLD]
Id=5                            # Node ID
HostName=10.0.0.21              # Hostname or IP address

[MYSQLD]
Id=6                            # Node ID
HostName=10.0.0.22              # Hostname or IP address
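
Note that the config.ini above repeats section names such as [NDB_MGMD], which Python's standard `configparser` rejects by default. As a rough sketch only, the following shows one way to read such a file and list the management hosts it defines. The function names are hypothetical and the parsing rules (`#` starts a comment, `key=value` lines) are assumptions based on the file shown here, not on the full MySQL Cluster config grammar.

```python
def parse_cluster_config(text):
    """Parse config.ini text into a list of (section, {key: value}) pairs.

    Section names such as [NDB_MGMD] may repeat, so a list is used
    instead of a dict keyed by section name.
    """
    sections = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1].strip().upper(), {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections


def management_hosts(sections):
    """Return the HostName of every [NDB_MGMD] section, in file order."""
    return [opts["HostName"] for name, opts in sections
            if name == "NDB_MGMD" and "HostName" in opts]


sample = """
[NDB_MGMD]
HostName=10.0.0.51   # first management node
[NDB_MGMD]
HostName=10.0.0.52   # second management node
[NDBD]
Id=3
HostName=10.0.0.11
"""
print(management_hosts(parse_cluster_config(sample)))
```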

Not enough information was provided for us to be able
to handle this bug. Please re-read the instructions at
http://bugs.mysql.com/how-to-report.php

If you can provide more information, feel free to add it
to this bug and change the status back to 'Open'.

Thank you for your interest in MySQL.

Additional info:

Hi Peter,

thank you for your bug report. Please send us the exact commands you use for starting the management servers.

The management servers were started both without any extra command line parameters and with the connect-strings explicitly set.

on both 10.0.0.51 and 10.0.0.52
/usr/sbin/ndb_mgmd

AND

on 10.0.0.51:
/usr/sbin/ndb_mgmd --connect-string=nodeid=1,10.0.0.51:1186,10.0.0.52:1186

on 10.0.0.52:
/usr/sbin/ndb_mgmd --connect-string=nodeid=2,10.0.0.51:1186,10.0.0.52:1186

Both methods resulted in the same error in the management client display.

The bug seems to be only with the display in the management client.  It doesn't seem to affect the cluster itself.  When the cluster is started, the management server logs indicate that all cluster nodes are connecting to both servers.

Peter, you were passing the file location right? (i.e. ndb_mgmd -f ./PathToConfigFile/config.ini)

Some tests with MySQL Cluster 5.0.11 on Debian have shown that:
1. all management servers need to have the same config.ini
2. all nodes need to use a connect string that includes both mgm nodes
3. all nodes have to be restarted (ndbd nodes can be restarted one after another)

The show command does not show both management servers if you have not restarted the data nodes.

Have you restarted the data nodes in your environment?

Sorry, yes, I did include the path to the config.ini file with -f.  

They both use the same config.ini file listed.

I'm not sure what the data nodes have to do with the bug.  The management nodes will only display each other after the data nodes are connected as well?  

I guess I'll have to check the client console after I start the data nodes.

You are right that the management nodes should know each other as soon as they both have been started with the new connect string that includes both nodes.
Please provide us the information about what the "show" statement shows when you have restarted the storage nodes. We need to know if you see the same behavior to have an exact description for this bug. I will change the status to verified then.
You need to restart the data nodes anyway. They need to connect to the new management server so that this management server knows about them and their status.

After the data nodes are started, the management console shows the connected management servers. 

On 10.0.0.51:

Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=3    @10.0.0.11  (Version: 4.1.14, Nodegroup: 0, Master)
id=4    @10.0.0.12  (Version: 4.1.14, Nodegroup: 0)

[ndb_mgmd(MGM)] 2 node(s)
id=1    @10.0.0.52  (Version: 4.1.14)
id=2   (Version: 4.1.14)

[mysqld(API)]   2 node(s)
id=5    @10.0.0.11  (Version: 4.1.14)
id=6    @10.0.0.12  (Version: 4.1.14)

The display error only occurs before the data nodes are started, as you suggested.

Changed to verified.

A management server should show all management servers which are defined in the config.ini file as soon as they are also started with the new configuration files and with a connect string including all management servers of the config.ini.

It should not be necessary to restart the data nodes here. The documentation should include a note that the real status of a data node can only be displayed by a management server once that data node has connected to this management server.

The output of the show command shows which nodes are connected to the db nodes (ndbd).

The way it works is this:
foreach node
  ask a db node if it is connected and its status

(the heartbeat mechanism keeps all this information up to date)

When you have started just the management servers, all the secondary ones (those not first in the connect string) ask the first one to allocate them a node id. They may also fetch the configuration. At this point there are no heartbeats (and certainly no db nodes to talk to).

So there is no reliable way to get information about which nodes are alive. We could give back information on who we've allocated node ids to, but since node id allocation requests only go to one mgm server, querying any of the secondaries would give an incorrect result. We may change this in 5.1 (as there may be support for synchronous updates to configuration across multiple management servers).
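
The behaviour described above can be illustrated with a toy model in Python. This is only a sketch of the explanation in this comment, not the actual ndb_mgmd implementation: the `show` function, its parameters, and the use of sets for each data node's view are all assumptions made for illustration.

```python
def show(local_id, node_ids, data_node_views):
    """Toy model of SHOW as described above (not the real implementation).

    local_id        -- node id of the management server answering the query
    node_ids        -- every node id defined in config.ini
    data_node_views -- list of sets; each set holds the node ids that one
                       connected data node currently sees as alive
    """
    status = {}
    for node_id in node_ids:
        if node_id == local_id:
            status[node_id] = "connected"        # a server always knows itself
        elif any(node_id in view for view in data_node_views):
            status[node_id] = "connected"        # a data node vouches for it
        else:
            status[node_id] = "not connected"    # nobody to ask, or not seen
    return status


nodes = [1, 2, 3, 4]  # 1-2 management, 3-4 data, as in the report

# Only the management servers are running: no data node can be asked.
print(show(1, nodes, []))

# Both data nodes are up and see every management and data node.
view = {1, 2, 3, 4}
print(show(1, nodes, [view, view]))
```

In the first call the other management server is reported as not connected even though it is running, which matches the output in the original report. Once the data nodes are up, both management servers appear.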

We should make it clear in the manual that the 'show' command lists nodes that are connected to the cluster (where cluster means db nodes).

Also see BUG#11595 and BUG#12037.

Thank you for your bug report. This issue has been addressed in the
documentation. The updated documentation will appear on our website
shortly, and will be included in the next release of the relevant
product(s).

Additional info:

Updated Cluster Limitations section, added note to SHOW command description.