Bug #22195	No way to bind ndbd or ndb_mgmd to a specific ip address
Submitted:	9 Sep 2006 14:57	Modified:	30 Jul 2008 9:16
Reporter:	Marc - A. Dahlhaus
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S4 (Feature request)
Version:	5.0.X and 5.1.X	OS:	Any (All)
Assigned to:	Jonas Oreland	CPU Architecture:	Any

Description:
There is currently no way to bind a starting ndbd process to a specific IP address or interface.

I think this is a serious design flaw because there is no way to use the available network bandwidth on multiple network interfaces on the same host.

How to repeat:
Try to bind two ndbd processes to different network interfaces on the same host.

The only way to get this to work currently is to start the first ndbd, then move the route to the network from the interface associated with the first ndbd's IP address to the other interface, and then start the second ndbd.
It's a configuration nightmare, and scripting this process for a host restart is really ugly.

Suggested fix:
Add a bind-address variable to ndbd and ndb_mgmd and let the daemons use it like the one used for mysqld.
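
For illustration only, here is what "binding to a specific address" means at the socket level, as a minimal Python sketch rather than anything taken from the ndbd sources. The helper name connect_from is hypothetical. It pins the source address of an outgoing TCP connection, which is what decides the interface the traffic leaves from.

```python
import socket


def connect_from(local_ip, remote_host, remote_port, timeout=5.0):
    """Open a TCP connection to (remote_host, remote_port) from local_ip.

    Hypothetical helper, not part of MySQL Cluster. Passing
    source_address=(local_ip, 0) makes the standard library bind() the
    socket to local_ip (port 0 = any free port) before it connects.
    """
    return socket.create_connection(
        (remote_host, remote_port),
        timeout=timeout,
        source_address=(local_ip, 0),
    )


if __name__ == "__main__":
    # Loopback-only demo: a throwaway listener stands in for the peer.
    server = socket.create_server(("127.0.0.1", 0))
    host, port = server.getsockname()
    conn = connect_from("127.0.0.1", host, port)
    print("local end of the connection:", conn.getsockname()[0])
    conn.close()
    server.close()
```

Without the explicit source address, the operating system picks the local address from its routing table, which is why the workaround described above involves moving routes around.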

On the other hand, the IP addresses are already defined in the cluster configuration file, and ndb_mgmd knows the address of every ndbd node. Wouldn't it be simpler (for configuration, not implementation) to just accept the connection, tell the ndbd node to use the associated address from the configuration file, and kill the ndbd if that address isn't usable on the system after the connection was attempted?
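
The "kill the ndbd if the address isn't usable" check could look roughly like the following Python sketch. Both function names are hypothetical, the check is IPv4-only, and it only approximates "usable" as "the operating system lets us bind a TCP socket to it"; it is not how ndbd or ndb_mgmd actually validate addresses.

```python
import socket


def is_local_address(ip):
    """Return True if a TCP socket can be bound to ip on this host (IPv4 only)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((ip, 0))  # port 0: let the OS pick any free port
        except OSError:
            # Typically "cannot assign requested address" when no local
            # interface carries this IP.
            return False
    return True


def require_configured_address(ip):
    """Return ip if it is usable here, otherwise abort the process."""
    if not is_local_address(ip):
        raise SystemExit(f"configured address {ip} is not usable on this host")
    return ip


if __name__ == "__main__":
    print(require_configured_address("127.0.0.1"))
```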

Thank you for the problem report. This looks like a reasonable feature request to me.

Hi,

I attached a patch against 5.0.
It adds the "--bind-address" option to ndbd.
Please try it and let me know how it goes.

Also, I didn't really understand why you want
  to use a specific address for ndb_mgmd,
  please explain.

/Jonas

cd ndb && patch -p0 < bind.patch

Attachment: bind.patch (text/x-patch), 10.84 KiB.

As with ndbd, I thought that binding the process to one interface would have the beneficial side effect of keeping ndb_mgmd off interfaces where it shouldn't be visible or usable.

In addition, the parameter would then be consistent across all daemons of the MySQL Cluster.

I'll report back, hopefully with a success story for the patched ndbd, after the cluster restart scheduled for 00:00 GMT+1 tonight.

Node startup "ndbd --bind-address=IP --ndb-nodeid=ID --initial" works like a charm. No problems so far.
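
As a rough sketch of how such a startup could be scripted for a host restart, the Python snippet below only builds the command lines and does not run anything. The helper name ndbd_argv is hypothetical; the node-to-address mapping is taken from the listings in this thread (nodes 2 and 4 share one server). "--initial" is left off by default here on the assumption that it is only wanted for an initial start.

```python
def ndbd_argv(node_id, bind_ip, initial=False):
    """Build the ndbd command line used in this thread (hypothetical helper)."""
    argv = ["ndbd", f"--bind-address={bind_ip}", f"--ndb-nodeid={node_id}"]
    if initial:
        argv.append("--initial")
    return argv


# Node id -> address for the two ndbd processes on the first data node host.
NODES_ON_HOST = {2: "10.0.0.121", 4: "10.0.0.131"}

for node_id, ip in sorted(NODES_ON_HOST.items()):
    print(" ".join(ndbd_argv(node_id, ip)))
```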

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/11822

ChangeSet@1.2245, 2006-09-13 10:09:23+02:00, jonas@perch.ndb.mysql.com +7 -0
  ndb - bug#22195
    allow bind address for ndbd

Ok,

I'll go ahead and push this into the next release.
Please reopen the bug if you experience any problems.

/Jonas

Is it also going into the 5.1.X branch?

Marc

yes

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/11930

ChangeSet@1.2246, 2006-09-14 11:57:15+02:00, jonas@perch.ndb.mysql.com +1 -0
  ndb - bug#22195
    also bind client to local host name if specified

I found a problem after a rolling restart of the ndb_mgmd process.
"ndb_mgm -e show" doesn't list all nodes and lists some nodes under the wrong addresses.

"ndb_mgm -e show" before the rolling update:
Connected to Management Server at: 10.0.0.152:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     4 node(s)
id=2    @10.0.0.121  (Version: 5.0.24, Nodegroup: 0, Master)
id=3    @10.0.0.122  (Version: 5.0.24, Nodegroup: 0)
id=4    @10.0.0.131  (Version: 5.0.24, Nodegroup: 1)
id=5    @10.0.0.132  (Version: 5.0.24, Nodegroup: 1)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @10.0.0.152  (Version: 5.0.24)

[mysqld(API)]   6 node(s)
id=31   @10.0.0.151  (Version: 5.0.24)
id=32   @10.0.0.152  (Version: 5.0.24)
id=33   @10.0.0.153  (Version: 5.0.24)
id=41 (not connected, accepting connect from 10.0.0.121)
id=42 (not connected, accepting connect from any host)
id=43 (not connected, accepting connect from any host)

After the update and rolling restart of ndb_mgmd, the nodes with IDs 3 and 5 weren't connected, and the node with ID 4 was listed under the IP of node 2.

After the forceful stop (kill -9 pid) of the ndbds with IDs 3 and 5, the entire cluster was shut down.
So I think the storage nodes knew that nodes 3 and 5 were still alive and shut down the cluster after I killed the ndbds on the second host...

If you need more data, please tell me what you need and I will attach it.

Hi,

1) Can you specify exactly what you did
   (which processes you started/stopped, which commands you ran, and in which order)?

2) Is it reproducible?

3) FYI: I made an extra addition to the patch; it is useful, but probably not related to this.

/Jonas

1: First we made a configuration change to add another shm section.
After that I restarted ndb_mgmd with an init script.
At this point "ndb_mgm -e show" listed the wrong information described in my last post.

2: I will try to reproduce this over the weekend and report my findings.

3: I saw the second commit related to this bug in my mail but haven't tested it.

Marc

Oh, I forgot to add that after we saw the wrong information in the output of "ndb_mgm -e show", we decided to stop the nodes that were no longer listed, but that wasn't possible anymore from the management console, so we killed the ndbds with "kill -9 pid"...
At that point the cluster was forcefully shut down. I'll append the corresponding cluster log output once I have found it.

Ok,

Thanks very much for your assistance.

Looking forward to the results after the weekend.

/Jonas

Here are the logs of the issue.

- ndb_mgmd was restarted here
2006-09-14 14:17:57 [MgmSrvr] INFO     -- NDB Cluster Management Server. Version 5.0.24
2006-09-14 14:17:57 [MgmSrvr] INFO     -- Id: 1, Command port: 1186
2006-09-14 14:17:58 [MgmSrvr] INFO     -- Node 1: Node 4 Connected
2006-09-14 14:17:58 [MgmSrvr] INFO     -- Node 1: Node 2 Connected
- why doesn't node 1 connect to node 3 and 5 here?
- to avoid a splitbrain issue we decided to kill the nodes 3 and 5 ...
- "kill -9 `pid of node id 3`"
2006-09-14 14:25:37 [MgmSrvr] ALERT    -- Node 2: Node 3 Disconnected
2006-09-14 14:25:37 [MgmSrvr] ALERT    -- Node 4: Node 3 Disconnected
2006-09-14 14:25:37 [MgmSrvr] INFO     -- Node 2: Communication to Node 3 closed
2006-09-14 14:25:37 [MgmSrvr] INFO     -- Node 4: Communication to Node 3 closed
2006-09-14 14:25:37 [MgmSrvr] ALERT    -- Node 3: Forced node shutdown completed. Initiated by signal 9.
- "kill -9 `pid of node id 5`"
2006-09-14 14:25:51 [MgmSrvr] ALERT    -- Node 4: Node 5 Disconnected
2006-09-14 14:25:51 [MgmSrvr] ALERT    -- Node 2: Node 5 Disconnected
2006-09-14 14:25:51 [MgmSrvr] INFO     -- Node 2: Communication to Node 5 closed
2006-09-14 14:25:51 [MgmSrvr] INFO     -- Node 2: Communication to Node 5 closed
2006-09-14 14:25:51 [MgmSrvr] INFO     -- Node 4: Communication to Node 5 closed
2006-09-14 14:25:51 [MgmSrvr] ALERT    -- Node 5: Forced node shutdown completed. Initiated by signal 9.
2006-09-14 14:25:52 [MgmSrvr] INFO     -- Node 1: Node 4 Connected
2006-09-14 14:25:52 [MgmSrvr] ALERT    -- Node 4: Forced node shutdown completed. Initiated by signal 0. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary err
2006-09-14 14:25:53 [MgmSrvr] INFO     -- Node 1: Node 2 Connected
2006-09-14 14:25:53 [MgmSrvr] ALERT    -- Node 2: Forced node shutdown completed. Initiated by signal 0. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary err
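
To pull the shutdown sequence out of a cluster log like the one above, a small filter along these lines can help. This is a Python sketch that assumes the line layout seen in this excerpt; the regular expression and the function name forced_shutdowns are not part of any MySQL tool. The error 2305 messages in the sample are shortened.

```python
import re

# Matches the "Forced node shutdown completed" ALERT lines shown above.
SHUTDOWN_RE = re.compile(
    r"^(?P<ts>\S+ \S+) \[MgmSrvr\] ALERT\s+-- Node (?P<node>\d+): "
    r"Forced node shutdown completed\. Initiated by signal (?P<signal>\d+)\."
    r"(?: Caused by error (?P<error>\d+))?"
)


def forced_shutdowns(lines):
    """Return (node_id, signal, error_code_or_None) per forced shutdown, in log order."""
    events = []
    for line in lines:
        m = SHUTDOWN_RE.match(line)
        if m:
            error = int(m["error"]) if m["error"] else None
            events.append((int(m["node"]), int(m["signal"]), error))
    return events


# Lines copied from the log above; long error messages cut after the code.
SAMPLE = [
    "2006-09-14 14:25:37 [MgmSrvr] ALERT    -- Node 2: Node 3 Disconnected",
    "2006-09-14 14:25:37 [MgmSrvr] ALERT    -- Node 3: Forced node shutdown completed. Initiated by signal 9.",
    "2006-09-14 14:25:51 [MgmSrvr] ALERT    -- Node 5: Forced node shutdown completed. Initiated by signal 9.",
    "2006-09-14 14:25:52 [MgmSrvr] INFO     -- Node 1: Node 4 Connected",
    "2006-09-14 14:25:52 [MgmSrvr] ALERT    -- Node 4: Forced node shutdown completed. Initiated by signal 0. Caused by error 2305: '",
    "2006-09-14 14:25:53 [MgmSrvr] ALERT    -- Node 2: Forced node shutdown completed. Initiated by signal 0. Caused by error 2305: '",
]

if __name__ == "__main__":
    for event in forced_shutdowns(SAMPLE):
        print(event)
```

On the sample, nodes 3 and 5 show up with signal 9 (the manual kills) and nodes 4 and 2 with signal 0 and error 2305, which matches the order of events described in the post.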

pushed into 5.1.12

pushed into 5.0.29

Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html

Documented new option for ndbd in 5.0.29/5.1.12.

*Fix for 5.0 documented in 5.0.30 Release Notes.*

Hi,

It is really good news that the bind-address parameter is now supported by ndbd.

> Also, I didn't really understand why you want
>  to use a specific address for ndb_mgmd,
>  please explain.

It is really desirable to be able to restrict the ndb_mgmd process to a specific IP address for security reasons. There is no built-in security in NDB. The only way to make a MySQL/NDB database secure is to use an isolated LAN for the NDB nodes with no access from "outside". On a server with two interfaces, one interface may be accessible from "outside" while the other is not. In this case, it is really desirable to be able to tell ndb_mgmd to use only the internal interface and not be reachable from the external one.

It is also not really necessary to support bind-address in ndb_mgmd, because the ndb_mgmd process can just read this information from the cluster's config.ini file. The config.ini file can specify which IP address is assigned to the management node. But ndb_mgmd does not use this information to decide which interface to bind to. It looks like it always binds to all interfaces.
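
The idea of taking the listen address from config.ini can be sketched in Python as follows. This is an illustration under several assumptions, not ndb_mgmd's behaviour: the section is assumed to be spelled exactly [ndb_mgmd], the sample uses the loopback address so it runs anywhere, the function names are hypothetical, and the fallback "0.0.0.0" stands for "all interfaces".

```python
import configparser
import socket

# Stand-in for a cluster config.ini; only the management node section matters here.
SAMPLE_CONFIG = """\
[ndb_mgmd]
NodeId=1
HostName=127.0.0.1
"""


def mgmd_bind_address(config_text):
    """Return HostName from [ndb_mgmd], or "0.0.0.0" (all interfaces) if absent."""
    # strict=False tolerates repeated sections such as several [ndbd] blocks.
    parser = configparser.ConfigParser(strict=False)
    parser.read_string(config_text)
    # Option names are matched case-insensitively by configparser.
    return parser.get("ndb_mgmd", "HostName", fallback="0.0.0.0")


def open_listener(host, port=0):
    """Listen on host only; port 0 picks a free port (this thread's logs show 1186 in use)."""
    return socket.create_server((host, port))


if __name__ == "__main__":
    listener = open_listener(mgmd_bind_address(SAMPLE_CONFIG))
    print("listening on", listener.getsockname()[0])
    listener.close()
```

A listener bound this way is not reachable through the other interfaces of the host, which is the isolation asked for above.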

Is it possible to re-open this bug report? The report actually asks for both ndbd and ndb_mgmd, but the feature was only implemented for ndbd. I fully agree with the reporter, Marc - A. Dahlhaus, that the feature should also be supported for ndb_mgmd. Not necessarily by adding the --bind-address parameter, but maybe even better by using the "hostname" parameter from config.ini to decide which IP address to bind to.

/Anatoly Pidruchny

After some more testing and multiple shutdowns/restarts
of every node, I spotted a possible problem.

After a restart of ndb_mgmd, the addresses listed (and used)
are bogus.

Cluster-Layout:
---------------------
#> ndb_mgm -e show
Cluster Configuration
---------------------
[ndbd(NDB)]     4 node(s)
id=2    @x.x.x.121  (Version: 5.0.24, Nodegroup: 0, Master)
id=3    @x.x.x.122  (Version: 5.0.24, Nodegroup: 0)
id=4    @x.x.x.121  (Version: 5.0.24, Nodegroup: 1)
id=5    @x.x.x.122  (Version: 5.0.24, Nodegroup: 1)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @x.x.x.152  (Version: 5.0.24)

[mysqld(API)]   6 node(s)
id=31   @x.x.x.151  (Version: 5.0.24)
id=32   @x.x.x.152  (Version: 5.0.24)
id=33   @x.x.x.153  (Version: 5.0.24)
id=41 (not connected, accepting connect from 10.0.0.121)
id=42   @x.x.x.122  (Version: 5.0.24)
id=43 (not connected, accepting connect from any host)
-------------------

Physical-Layout:
2 & 4 = ndbd server 1
3 & 5 = ndbd server 2

Problem:
ndbd node 4 was started with x.x.x.131
ndbd node 5 was started with x.x.x.132

The positive effect of one daemon on each interface is
gone after an ndb_mgmd restart. The API nodes now open
connections to both ndbd processes on each server over
the IP addresses shown above.

Marc
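
A check for this symptom could be scripted roughly as below. It is a Python sketch that assumes the "id=N @address" layout of the "ndb_mgm -e show" output quoted in this thread; the function names are hypothetical, and the masked x.x.x addresses are kept exactly as posted.

```python
import re

# "id=4    @x.x.x.121  (Version: ...)" -> node id and reported address.
NODE_RE = re.compile(r"^id=(\d+)\s+@(\S+)")


def connected_addresses(show_output):
    """Map node id -> address for every connected node in the show output."""
    result = {}
    for line in show_output.splitlines():
        m = NODE_RE.match(line.strip())
        if m:
            result[int(m.group(1))] = m.group(2)
    return result


def address_mismatches(show_output, expected):
    """Return {node_id: (expected, reported)} for nodes not at their expected address."""
    seen = connected_addresses(show_output)
    problems = {}
    for node_id, want in expected.items():
        got = seen.get(node_id)  # None means not connected or not listed
        if got != want:
            problems[node_id] = (want, got)
    return problems


# Data node lines from the listing above, and the addresses the nodes were started with.
SHOW_OUTPUT = """\
id=2    @x.x.x.121  (Version: 5.0.24, Nodegroup: 0, Master)
id=3    @x.x.x.122  (Version: 5.0.24, Nodegroup: 0)
id=4    @x.x.x.121  (Version: 5.0.24, Nodegroup: 1)
id=5    @x.x.x.122  (Version: 5.0.24, Nodegroup: 1)
"""
EXPECTED = {2: "x.x.x.121", 3: "x.x.x.122", 4: "x.x.x.131", 5: "x.x.x.132"}

if __name__ == "__main__":
    print(address_mismatches(SHOW_OUTPUT, EXPECTED))
```

Run against the listing above, only nodes 4 and 5 are flagged, each reported under the address of the other ndbd on the same server.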

Should this bug report be closed now, since the bind-address option is implemented for ndb_mgmd in MySQL Cluster 6.2.x and 6.3.x, and MySQL Cluster 5.1.x is not supported any more (and 5.0.x is old and not recommended for use anyway)?

Marc, I think the problem with the wrong IP addresses shown by ndb_mgm should be reported as a new bug. This report is marked as a feature request, and the problem with wrong IP addresses is a bug. I saw this problem in release 5.1.x, but cannot say whether it is present in 6.2.x and 6.3.x.

/Anatoly.

I closed this one (it was fixed long ago).

I will open a new bug report if I spot the problem with
the wrong addresses again on a restarted Cluster 6.3 ndb_mgmd.