Bug #36188 Multiple Management Nodes with same Node ID cause different issues.
Submitted: 17 Apr 2008 22:12
Modified: 17 Apr 2008 22:13
Reporter: Jonathan Miller
Status: Verified
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: mysql-5.1-telco-7.0
OS: Linux
Assigned to:
CPU Architecture: Any

[17 Apr 2008 22:12] Jonathan Miller
Description:
When you start more than one ndb_mgmd and each one's config.ini assigns it the same node ID, the following happens.

The first one started actually acts as the manager and receives the data nodes and all APIs that attach. It does not show the other manager as expected.

On the second manager, nothing shows as connected, yet its log shows that the nodes have connected:

2008-04-17 23:24:20 [MgmSrvr] INFO     -- Id: 1, Command port: 14000
2008-04-17 23:26:12 [MgmSrvr] INFO     -- Node 1: Node 2 Connected
2008-04-17 23:26:18 [MgmSrvr] INFO     -- Node 1: Node 3 Connected
2008-04-17 23:29:00 [MgmSrvr] ALERT    -- Node 1: Node 2 Disconnected

Now, if you issue a shutdown from the second manager, it attempts to shut down but fails, and you are then no longer able to talk to that manager without exiting and restarting the management client.

out.log shows:
Id: 1, Command port: 14000
asked to stop 9
asked to stop 9
asked to stop 9

ndb_mgm> shutdown
Shutdown of NDB Cluster node(s) failed.
*   110: Error
*        Time out talking to management server
ndb_mgm> show
Could not get status
*  1010: Management server not connected
*
ndb_mgm> show
Could not get status
*  1010: Management server not connected
*
ndb_mgm> exit
ndbdev@ndbXX:/space/run> ndb_mgm
-- NDB Cluster -- Management Client --
ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2 (not connected, accepting connect from ndbXX)
id=3 (not connected, accepting connect from ndbXX)

[ndb_mgmd(MGM)] 2 node(s)
id=1    @XXX15  (Version: 5.1.25)
id=9 (not connected, accepting connect from ndbXX)

[mysqld(API)]   2 node(s)
id=4 (not connected, accepting connect from ndbXX)
id=5 (not connected, accepting connect from ndbXX)

Now, if for some reason the first manager dies and one of the data nodes dies, the second manager acts as arbitrator and allows the second data node to survive, yet those actions do not get logged by the second manager, and the data node still does not show up in the list when a "show" command is issued.

 

How to repeat:
Create a configuration file for each of the 2 managers, with the node IDs assigned to opposite hosts in each:

mgm#1
[NDB_MGMD]
Id: 9
HostName: host13
ArbitrationRank: 1

[NDB_MGMD]
Id: 1
HostName: host15
ArbitrationRank: 1

mgm#2

[NDB_MGMD]
Id: 1
HostName: host13
ArbitrationRank: 1

[NDB_MGMD]
Id: 9
HostName: host15
ArbitrationRank: 1

Start mgm#1 and the 2 data nodes, then start mgm#2.
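
As a rough sketch of the start commands (assuming the hostnames from the configs above, the command port 14000 shown in the log, and hypothetical config file paths), something like:

# on host15 (manager #1)
ndb_mgmd -f mgm1/config.ini

# on each data node host
ndbd --ndb-connectstring=host15:14000

# on host13 (manager #2)
ndb_mgmd -f mgm2/config.ini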

Suggested fix:
mgm#2 should detect that a manager with that node ID is already running and refuse to start.

In addition, I think it would be a great idea if you could start additional ndb_mgmd processes by giving them the other ndb_mgmd's connection string, and have them fetch the configuration from that manager and write their own configuration file.

Example

mgm#1

ndb_mgmd -f config.ini

mgm#2

ndb_mgmd --manager=mgm#1:port

mgm#3

ndb_mgmd --manager=mgm#1:port

Both #2 and #3 connect to #1, pull the configuration, and then write it to a file.
[30 Apr 2009 13:25] Jonathan Miller
Workaround is to make sure each has its own ID.
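
For example, a single consistent config used by both managers (hostnames as in the report; shown only as a sketch of the workaround) gives each manager its own ID:

[NDB_MGMD]
Id: 9
HostName: host13
ArbitrationRank: 1

[NDB_MGMD]
Id: 1
HostName: host15
ArbitrationRank: 1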