Bug #43969 multi-thread data node (ndbmtd) fails to connect on FreeBSD7
Submitted: 30 Mar 2009 20:00
Modified: 26 Jan 2011 15:48
Reporter: Anthony Rossano
Status: No Feedback
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: mysql-5.1-telco-7.0
OS: FreeBSD (7.1)
Assigned to:
CPU Architecture: Any
Tags: cluster, ndb, ndbmtd ndbd

[30 Mar 2009 20:00] Anthony Rossano
Description:
Hi All - 
On FreeBSD 7.1, with a custom-compiled MySQL (based on the source and makefiles from 'sandbox'), the data nodes fail to connect when they are on separate computers and use the multi-threaded data node (ndbmtd).
They do connect OK without the multithreaded data node. 
There is no firewall on the hosts. 
Communication seems OK during startup until start phase 5.

The error logs on the data nodes report the following, then all nodes shut down:

From the primary node:
Time: Monday 30 March 2009 - 18:53:53
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: SimulatedBlock.cpp
Error object: DBTC (Line: 501) 0x0000000a
Program: /usr/mogreet_distro/mysql-5.1.32-ndb-6.4.3/libexec/ndbmtd
Pid: 72519
Trace: /data/cluster6.4.3/ndb/ndb_21_trace.log.2 /data/cluster6.4.3/ndb/ndb_21_trace.log.2_t1 /data/cluster6.4.3/ndb/ndb_21_t

From the second node:
2009-03-30 12:44:07 [ndbd] INFO     -- Node 21 disconnected
2009-03-30 12:44:07 [ndbd] INFO     -- QMGR (Line: 2908) 0x0000000e
2009-03-30 12:44:07 [ndbd] INFO     -- Error handler startup shutting down system
2009-03-30 12:44:07 [ndbd] INFO     -- Error handler shutdown completed - exiting
2009-03-30 12:44:07 [ndbd] INFO     -- Angel received ndbd startup failure count 1.
2009-03-30 12:44:07 [ndbd] ALERT    -- Node 22: Forced node shutdown completed. Occured during startphase 5. Caused by error 2308: 'Another node failed during system restart, please investigate error(s) on other node(s)(Restart error). Temporary error, restart node'.

Here is the (test) config file:
#Note: Production needs two mgmd servers
# but while we get it figured out, only one. 
[NDB_MGMD]
id=13
#chimay.mogreet.com
hostname=192.168.100.125
datadir=/data/cluster6.4.3/ndb_mgmd
ArbitrationRank=1

[NDBD DEFAULT]
NoOfReplicas=2
datadir=/data/cluster6.4.3/ndb
DataMemory = 10G  #Default for production
IndexMemory = 200M
NoOfFragmentLogFiles = 24
FragmentLogFileSize = 16M
BackupDataDir=/mnt/nas/databases/backup
StartFailureTimeout=1000000
HeartbeatIntervalDbDb=1500 #ATR tuning

# Note: Production needs four ndb nodes.
[NDBD]
id = 21
hostname=192.168.100.125

[NDBD]
id = 22
#duvel
hostname=192.168.100.126

[MYSQLD]
hostname=192.168.100.125

How to repeat:
1) start mgmd server on node Chimay
/usr/mogreet_distro/mysql-5.1.32-ndb-6.4.3/libexec/ndb_mgmd --defaults-file=/data/configs/my_cluster-6.4.3.cnf --initial --ndb-nodeid=13

2) start mgm on Chimay
bin/ndb_mgm --defaults-file=/data/configs/my_cluster-6.4.3.cnf

3) start the remote node... (on host duvel)
/usr/mogreet_distro/mysql-5.1.32-ndb-6.4.3/libexec/ndbd --defaults-file=/data/configs/my_cluster-6.4.3.cnf  --ndb-nodeid=22 --initial

4) start Chimay's data node
/usr/mogreet_distro/mysql-5.1.32-ndb-6.4.3/libexec/ndbd --defaults-file=/data/configs/my_cluster-6.4.3.cnf --ndb-nodeid=21 --initial

5) do a 'show' in mgm; it shows the nodes starting up.

6) wait 4-6 seconds. Errors appear in mgm and in the node logs, and both ndb data nodes quit.

Suggested fix:
Let me know if you need the trace logs.
It's not a huge deal as long as I can get the single-threaded version running. 
Let me know if you would like me to experiment further with the heartbeat timeouts and/or reduce the memory allocated (it's huge, at 10 GB). 
Thanks!
ATR
[30 Mar 2009 20:07] Jonas Oreland
Hi,

I'm pretty sure that this is fixed in a later version,
i.e. 7.0.4 (or bzr).

Can you test with a later version?

(Or else you can add SendBufferMemory: 2M to your config.ini.)

/Jonas
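
For reference, a minimal sketch of where that workaround might go in the config file shown in the report. The comment above only names the parameter and the value; placing it in a [TCP DEFAULT] section so that it applies to all TCP connections between nodes is an assumption, not something stated in this thread.

```
# Assumption: applied cluster-wide through the TCP defaults section.
# Only the parameter name and the 2M value come from the comment above.
[TCP DEFAULT]
SendBufferMemory=2M
```

The management server would need to reread the changed configuration before the data nodes are started again.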
[30 Apr 2009 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[11 May 2009 13:37] Jonathan Miller
mysql-5.1.32-ndb-6.4.3

Is FreeBSD 7.1 not yet supported?
http://www.mysql.com/support/supportedplatforms/cluster.html
[27 Jan 2011 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".