Bug #61634 ndb_mgmd fails to start if net.ipv4.ip_nonlocal_bind is enabled
Submitted: 24 Jun 2011 18:53 Modified: 30 Jul 2011 15:36
Reporter: Evan Kinney
Status: No Feedback
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S2 (Serious)
Version:7.2.0 devmilestone OS:Linux (RHEL Server 6.1 (2.6.32-131.2.1.el6.x86_64))
Assigned to: Assigned Account CPU Architecture:Any

[24 Jun 2011 18:53] Evan Kinney
Description:
When net.ipv4.ip_nonlocal_bind is enabled in sysctl.conf, ndb_mgmd complains that it can't determine which nodeid to use and subsequently fails to start.

This appears to be caused by the alone_on_host function in src/mgmsrv/ConfigManager.cpp. The function loops through the ndb_mgmd entries in the global config. For each ndb_mgmd address that doesn't match what it thinks should be its own nodeid, it calls SocketServer::tryBind. With net.ipv4.ip_nonlocal_bind enabled, the kernel allows binding to addresses that are not assigned to any local interface, so every one of these binds succeeds. A successful bind makes alone_on_host return false, which causes a fatal error.
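The failure mode described above can be modeled in a few lines. This is a minimal sketch, not the actual ConfigManager.cpp code. MgmNode, BindProbe and alone_on_host_model are hypothetical names. The bind attempt that SocketServer::tryBind performs is replaced by an injected probe function, so that the two kernel behaviors can be compared without changing any sysctl settings.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for one [ndb_mgmd] entry in config.ini.
struct MgmNode {
    int node_id;
    std::string host;  // HostName of the management node
};

// Hypothetical probe: returns true if a socket could be bound to `host`.
// In ndb_mgmd this role is played by SocketServer::tryBind.
using BindProbe = std::function<bool(const std::string& host)>;

// Simplified model of the check (NOT the real ConfigManager.cpp code):
// a successful bind to another management node's address is taken as
// evidence that the other node lives on this host, so this host is not
// "alone" and the node ID cannot be determined automatically.
bool alone_on_host_model(const std::vector<MgmNode>& mgm_nodes,
                         int own_node_id,
                         const BindProbe& try_bind) {
    for (const MgmNode& node : mgm_nodes) {
        if (node.node_id == own_node_id)
            continue;          // skip the node we expect to be
        if (try_bind(node.host))
            return false;      // "another ndb_mgmd appears to be local"
    }
    return true;               // no other management node looked local
}
```

Consider a probe that always returns true, which is what a real bind effectively does when ip_nonlocal_bind = 1. With that probe, alone_on_host_model returns false as soon as the configuration lists a second management node, and true when only one is defined. That is consistent with the verification team's observation that the problem appears only with multiple MGM nodes.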

How to repeat:
1. Set net.ipv4.ip_nonlocal_bind to 1 by whatever method you choose (e.g. sysctl -w net.ipv4.ip_nonlocal_bind=1, or an entry in /etc/sysctl.conf)
2. Try to start/restart ndb_mgmd

Suggested fix:
We are currently working around the issue by specifying the nodeid on the ndb_mgmd command line in our init script, but this is obviously less than optimal.

Perhaps, instead of trying to bind to each address, ndb_mgmd could simply compare each configured IP with the addresses on the system's active interfaces.
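The interface-comparison approach suggested above might look roughly like the following sketch. It makes several assumptions:

- It relies on the Linux/glibc getifaddrs(3) call.
- It handles only literal IPv4 addresses. A HostName given as a name would first need resolving, e.g. with getaddrinfo.
- The function names are hypothetical and not part of MySQL Cluster.

Because the check inspects assigned addresses instead of attempting a bind, its answer does not change with ip_nonlocal_bind.

```cpp
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <net/if.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <string>
#include <vector>

// Hypothetical helper: collect the IPv4 addresses currently assigned
// to interfaces that are up.
std::vector<in_addr> local_ipv4_addresses() {
    std::vector<in_addr> result;
    ifaddrs* list = nullptr;
    if (getifaddrs(&list) != 0)
        return result;  // treat failure as "no local addresses"
    for (ifaddrs* ifa = list; ifa != nullptr; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr == nullptr || ifa->ifa_addr->sa_family != AF_INET)
            continue;   // skip non-IPv4 entries
        if (!(ifa->ifa_flags & IFF_UP))
            continue;   // only consider active interfaces
        const auto* sin = reinterpret_cast<const sockaddr_in*>(ifa->ifa_addr);
        result.push_back(sin->sin_addr);
    }
    freeifaddrs(list);
    return result;
}

// Hypothetical helper: true if the dotted-quad `ip` appears in `addrs`.
bool address_in_list(const std::string& ip, const std::vector<in_addr>& addrs) {
    in_addr target{};
    if (inet_pton(AF_INET, ip.c_str(), &target) != 1)
        return false;   // not a literal IPv4 address (hostnames not handled)
    for (const in_addr& a : addrs)
        if (a.s_addr == target.s_addr)
            return true;
    return false;
}

// Hypothetical replacement for a bind() probe: unlike binding, this
// result does not depend on net.ipv4.ip_nonlocal_bind.
bool is_local_address(const std::string& ip) {
    return address_in_list(ip, local_ipv4_addresses());
}
```

The verification team's reply raises one caveat. A floating address that has not yet been brought up on this host will not appear in the list either, so this check can only recognize addresses that are already assigned.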
[30 Jun 2011 15:36] MySQL Verification Team
I have verified the behavior as described. However, it only occurs when multiple MGM nodes are listed in config.ini. When only one ndb_mgmd is defined, it is able to identify the proper NodeId to use and binds successfully.

The point of net.ipv4.ip_nonlocal_bind = 1 is that the address does not have to exist on any local interface, so there is no local interface to compare against to determine which node should be started. What information do you suggest we compare against?
[30 Jul 2011 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".