Bug #39738 Unable to start-up large cluster
Submitted: 29 Sep 2008 22:22
Modified: 27 Nov 2008 12:19
Reporter: Brian Morin
Status: No Feedback
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: ndb-6.3.15 cge gpl
OS: Linux (CentOS 5.2)
Assigned to: Assigned Account
CPU Architecture: Any

[29 Sep 2008 22:22] Brian Morin
Description:
I have a cluster of 12 physical machines and am trying to get maximum performance out of it.  Each machine has 8 processors and 16 GB of memory.  Running 2 instances of ndbd on each physical machine worked fine.  However, with 4 instances of ndbd on each machine, the cluster will not start.

Node 6 segfaults:
Time: Monday 29 September 2008 - 22:06:30
Status: Temporary error, restart node
Message: Error OS signal received (Internal error, programming error or missing error message, please report a bug)
Error: 6000
Error data: Signal 11 received; Segmentation fault
Error object: main.cpp
Program: /usr/sbin/ndbd
Pid: 6187
Trace: /ndb/mysql-cluster/ndb_6_trace.log.8
Version: mysql-5.1.24 ndb-6.3.15-RC
***EOM***

Nodes 20, 33 and 46 show this error:
Time: Monday 29 September 2008 - 22:09:47
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: SimulatedBlock.cpp
Error object: DBTC (Line: 296) 0x0000000e
Program: /usr/sbin/ndbd
Pid: 29025
Trace: /ndb/mysql-cluster/ndb_33_trace.log.6
Version: mysql-5.1.24 ndb-6.3.15-RC
***EOM***

Will attach trace files.

How to repeat:
Start a 48-node cluster with 4 instances of ndbd per physical machine.
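
For anyone trying to reproduce this layout, the following is a minimal sketch of how the 48 [ndbd] sections of a config.ini could be generated. It is not the reporter's actual configuration, which was never attached. The host names (ndb-host-01 to ndb-host-12) and the NoOfReplicas value are assumptions; DataDir is taken from the trace file paths in the error logs above. Node IDs are assigned so that consecutive IDs fall on different machines, on the assumption that node groups are formed from consecutive data nodes.

```python
def ndbd_sections(hosts, instances_per_host, data_dir="/ndb/mysql-cluster"):
    """Return config.ini text with one [ndbd] section per data node process."""
    lines = [
        "[ndbd default]",
        "# Assumption: the replica count is not stated in the report.",
        "NoOfReplicas=2",
        f"DataDir={data_dir}",
        "",
    ]
    node_id = 1
    # Outer loop over instances, inner loop over hosts, so that
    # consecutive node IDs are placed on different physical machines.
    for _ in range(instances_per_host):
        for host in hosts:
            lines += ["[ndbd]", f"NodeId={node_id}", f"HostName={host}", ""]
            node_id += 1
    return "\n".join(lines)


# Hypothetical host names; 12 machines x 4 instances = 48 data nodes.
hosts = [f"ndb-host-{i:02d}" for i in range(1, 13)]
config_text = ndbd_sections(hosts, 4)
```

The generated text covers only the data nodes; the management node and SQL/API node sections would still have to be added by hand.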
[27 Oct 2008 12:20] Frazer Clement
Thanks for your bug report.

The logs you sent consistently show an issue of some kind with transaction timeout checking during startup.

Could you please also send the config file used for this cluster, as well as any further files which may help (e.g. the cluster log file from the MGMD node and the .out files for the affected nodes 6, 20, 33 and 46)?

If the issue is still reproducible, a core file would also be helpful.  NDBDs can be configured to produce core files in error scenarios by passing the --core-file option on the command line.
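
As a rough sketch of the above, a small launcher could add --core-file to the ndbd command line. The management host name (mgm-host) and the helper names are hypothetical. Raising the process core-size limit is an extra assumption about the operating system setup and is not part of the request above; without it, many Linux systems will not write a core file at all.

```python
import resource

NDBD_PATH = "/usr/sbin/ndbd"  # program path as shown in the error logs above


def build_ndbd_command(connectstring, core_file=True):
    """Build the argument list for starting one ndbd process."""
    argv = [NDBD_PATH, f"--ndb-connectstring={connectstring}"]
    if core_file:
        # Ask ndbd to write a core file when it fails.
        argv.append("--core-file")
    return argv


def allow_core_dumps():
    """Raise the soft core-size limit to the hard limit.

    The limit is inherited by child processes started afterwards.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


# Hypothetical management server address; 1186 is the default port.
command = build_ndbd_command("mgm-host:1186")

# To actually start the node (not executed here):
#   import subprocess
#   allow_core_dumps()
#   subprocess.run(command, check=True)
```

Building the command separately from launching it makes it easy to check the exact arguments before starting 4 instances per machine.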

Thanks.
[28 Nov 2008 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".