Bug #20251 No data node will start, UNLESS it's on the same physical server as Mgmt Node
Submitted: 3 Jun 2006 21:32
Modified: 6 Jul 2006 9:41
Reporter: Darren Collins
Status: No Feedback
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: 5.1.9
OS: Linux (RedHat 4)
Assigned to:
CPU Architecture: Any

[3 Jun 2006 21:32] Darren Collins
Description:
I cannot get ndbd --initial to run. I am trying to set up a 3-node cluster (2 data nodes, 1 management node). When I run ndbd on either data node, I get the following:

[root@lghdevsr3 mysql-cluster]# ndbd --initial 
Unable to connect with connect string: nodeid=0,10.0.0.7:1186 
Retrying every 5 seconds. Attempts left: 12 

If you let ndbd retry until it times out, you get the error message and log file, shown at the very end. 

Any suggestions on where to go from here, or on what might be wrong?

Thanks 

Configuration information: 

******************************************** 
config.ini 
******************************************** 
[ndbd default] 
NoOfReplicas=2 
DataMemory=80M 
IndexMemory=18M 
DataDir=/var/lib/mysql-cluster 

[ndb_mgmd default] 
DataDir=/var/lib/mysql-cluster 

[ndb_mgmd] 
hostname=10.0.0.7 
datadir=/var/lib/mysql-cluster 
ArbitrationRank=1 

[ndbd] 
hostname=10.0.0.8 
datadir=/sqldata 

[ndbd] 
hostname=10.0.0.9 
datadir=/sqldata 

[mysqld] 
hostname=10.0.0.8 

[mysqld] 
hostname=10.0.0.8 
******************************************** 
END 
******************************************** 

******************************************** 
ndb_mgm show 
******************************************** 
Connected to Management Server at: 10.0.0.7:1186 
Cluster Configuration 
--------------------- 
[ndbd(NDB)] 2 node(s) 
id=2 (not connected, accepting connect from 10.0.0.8) 
id=3 (not connected, accepting connect from 10.0.0.9) 

[ndb_mgmd(MGM)] 1 node(s) 
id=1 @10.0.0.7 (Version: 5.1.9) 

[mysqld(API)] 2 node(s) 
id=4 (not connected, accepting connect from 10.0.0.8) 
id=5 (not connected, accepting connect from 10.0.0.8) 
******************************************** 
END 
******************************************** 

******************************************** 
my.cnf 
******************************************** 
[mysql.server] 
user=mysql 

[client] 
port=3306 
socket=/var/lib/mysql/mysql.sock 
default-character-set=latin1 

[mysqld] 
ndbcluster # run NDB engine 
ndb-connectstring=10.0.0.7 # location of MGM node 
port=3306 
socket=/var/lib/mysql/mysql.sock 
basedir=/usr 
datadir=/sqldata 
default-character-set=latin1 
default-storage-engine=NDBCLUSTER 
max_connections=341 
query_cache_size=16M 
thread_concurrency = 8 

table_cache=700 
tmp_table_size=16M 
thread_cache_size=8 
log_bin=mysql-bin 
server-id = 1 
log_warnings 
log_slow_queries 
long_query_time = 2 
log_long_format 
transaction_isolation = READ-COMMITTED 

key_buffer_size=100M 
read_buffer_size=1M 
read_rnd_buffer_size=4M 
sort_buffer_size=1M 

[mysqldump] 
quick 
max_allowed_packet = 16M 

[mysql] 
no-auto-rehash 

[mysqlhotcopy] 
interactive-timeout 

[mysqld_safe] 
open-files-limit = 8192 
err-log=/var/log/mysqld.log 
pid-file=/var/run/mysqld/mysqld.pid 

[mysql_cluster] 
ndb-connectstring=10.0.0.7 # location of MGM node 

[ndbd] 
connect-string=10.0.0.7 

[ndb_mgm] 
connect-string=10.0.0.7 

[ndb_mgmd] 
config-file=/var/lib/mysql-cluster/config.ini 
******************************************** 
END 
******************************************** 

******************************************** 
netstat -tulnp 
******************************************** 
Active Internet connections (only servers) 
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name 
tcp 0 0 0.0.0.0:32769 0.0.0.0:* LISTEN 2325/rpc.statd 
tcp 0 0 0.0.0.0:1186 0.0.0.0:* LISTEN 10245/ndb_mgmd 
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 2305/portmap 
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN 2457/cupsd 
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 4940/sendmail: acce 
tcp 0 0 :::80 :::* LISTEN 4845/httpd 
tcp 0 0 :::22 :::* LISTEN 2493/sshd 
tcp 0 0 :::443 :::* LISTEN 4845/httpd 
udp 0 0 0.0.0.0:32768 0.0.0.0:* 2325/rpc.statd 
udp 0 0 0.0.0.0:805 0.0.0.0:* 2325/rpc.statd 
udp 0 0 0.0.0.0:111 0.0.0.0:* 2305/portmap 
udp 0 0 0.0.0.0:631 0.0.0.0:* 2457/cupsd 
******************************************** 
END 
******************************************** 

******************************************** 
ndbd error message 
******************************************** 
[root@lghdevsr3 mysql-cluster]# ndbd --initial -c 10.0.0.7:1186 
Unable to connect with connect string: nodeid=0,10.0.0.7:1186 
Retrying every 5 seconds. Attempts left: 12 11 10 9 8 7 6 5 4 3 2 1, failed. 
error=2350 
2006-06-02 14:26:51 [ndbd] INFO -- Error handler restarting system 
2006-06-02 14:26:51 [ndbd] INFO -- Error handler shutdown completed - exiting 
sphase=0 
exit=-1 
******************************************** 
END 
******************************************** 

******************************************** 
log file content 
******************************************** 
Current byte-offset of file-pointer is: 568 
Time: Friday 2 June 2006 - 14:26:51 
Status: Permanent error, external action needed 
Message: Invalid configuration received from Management Server (Configuration error) 
Error: 2350 
Error data: Could not connect to ndb_mgmd 
Error object: 
Program: ndbd 
Pid: 6376 
Trace: <no tracefile> 
Version: Version 5.1.9 (beta) 
***EOM*** 
******************************************** 
END 
********************************************

How to repeat:
Just try and start one of my data nodes

Suggested fix:
Don't know, that's why I am giving you this information.
[5 Jun 2006 13:04] Valeriy Kravchuk
Changed category to a more appropriate one.
[6 Jun 2006 0:43] Hartmut Holzgraefe
We're sorry, but the bug system is not the appropriate forum for 
asking help on using MySQL products. Your problem is not the result 
of a bug.

Support on using our products is available both free in our forums
at http://forums.mysql.com and for a reasonable fee direct from our
skilled support engineers at http://www.mysql.com/support/

Thank you for your interest in MySQL.

Additional info:

You need to give the ndbd processes a valid connect string so that they know where the management server is. The default connect string is "localhost", which is why it works for you with all nodes on the same box. See e.g. http://dev.mysql.com/doc/refman/5.0/en/multi-config.html
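
To illustrate the point about the default: a connect string such as the one in the error message, nodeid=0,10.0.0.7:1186, carries an optional node ID plus one or more host[:port] entries. The following Python sketch is only a simplified model of that syntax, not ndbd's actual parser. The function name parse_connect_string is made up for illustration, and the fallback to localhost and port 1186 is an assumption based on the default mentioned above and the port shown in the report.

```python
# Simplified model of an NDB connect string: [nodeid=N,]host[:port][,host[:port]...]
# Assumption: missing host falls back to localhost, missing port to 1186.
DEFAULT_HOST = "localhost"
DEFAULT_PORT = 1186


def parse_connect_string(connect_string=""):
    """Return (node_id, [(host, port), ...]) for a connect string.

    Hypothetical helper for illustration; IPv6 literals and bind-address
    options are deliberately not handled.
    """
    node_id = None
    hosts = []
    for part in connect_string.split(","):
        part = part.strip()
        if not part:
            continue
        if part.lower().startswith("nodeid="):
            node_id = int(part.split("=", 1)[1])
        else:
            host, sep, port = part.partition(":")
            hosts.append((host, int(port) if sep else DEFAULT_PORT))
    if not hosts:
        # No host given at all: the process would look on the local machine.
        hosts.append((DEFAULT_HOST, DEFAULT_PORT))
    return node_id, hosts
```

With an empty string the sketch falls back to localhost, which models why a data node without any connect string only finds a management server running on the same box.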
[6 Jun 2006 2:52] Darren Collins
How about giving a decent error message???

I'm not asking for support. When something doesn't work and you get an error message so poor it would make MS look good, then you have a bug.
[6 Jun 2006 9:41] Hartmut Holzgraefe
Looks like I misread the IP in the error message as 127.0.0.1.

Your configuration is indeed correct and the ndbd processes should be able to connect to the ndb_mgmd.

Things that can go wrong at this stage:

- no network connectivity between the machines (unlikely)
- firewall rules blocking access to the management server
- ndbd connecting using the wrong IP on machines with multiple network interfaces
- ndb_mgmd not accepting connections for some other reason, although it is listening on its port
  (as visible from the netstat results)
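
The first three items can be narrowed down from a data node with a plain TCP connection attempt to the management port. A minimal Python sketch, assuming the management server address 10.0.0.7 and port 1186 from the report. The helper name check_mgmd_reachable is hypothetical, and a successful TCP connect only shows that the port is reachable, not that ndb_mgmd will hand out a configuration.

```python
import socket


def check_mgmd_reachable(host, port=1186, timeout=3.0):
    """Attempt a plain TCP connect to host:port and return (ok, detail).

    On success, detail names the local address the connection was made
    from, which helps spot a wrong source IP on multi-homed machines.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            local_ip, local_port = sock.getsockname()[:2]
            return True, f"connected from {local_ip}:{local_port}"
    except OSError as exc:
        # Covers refused connections, timeouts and unreachable networks.
        return False, f"{type(exc).__name__}: {exc}"
```

Running check_mgmd_reachable("10.0.0.7") on 10.0.0.8 or 10.0.0.9 would be the intended use. A refused connection or a timeout points towards a firewall or routing problem, while the local address in a successful result should match the hostname configured for that data node in config.ini.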

Are there any error messages in the ndb_1_cluster.log logfile on the management server?

If not, can you log the actual network traffic from ndbd to ndb_mgmd
on the ndbd machine, using e.g. tcpdump or Ethereal?
[6 Jul 2006 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".