Bug #45612 Data node crash
Submitted: 19 Jun 2009 11:55 Modified: 24 Aug 2009 8:22
Reporter: Maciej Nadolski
Status: Duplicate
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S1 (Critical)
Version: mysql-5.1-telco-7.0 OS: Linux (CentOS 5.2)
Assigned to: Assigned Account CPU Architecture:Any
Tags: 7.0.6, multi threaded, ndb-mt

[19 Jun 2009 11:55] Maciej Nadolski
Description:
Data node fails.

Pid: 11958
Trace: /data/mysqlcluster//ndb_3_trace.log.12 /data/mysqlcluster//ndb_3_trace.log.12_t1 /data/mysqlcluster//ndb_3_trace.log.12_t2 /data/mysqlcluster//ndb_3_trace.log.12_t3 /data/mysqlcluster//ndb_3_t
Time: Thursday 18 June 2009 - 13:30:41
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: suma/Suma.cpp
Error object: SUMA (Line: 4120) 0x0000000a
Program: ./ndbmtd
Pid: 1856
Trace: /data/mysqlcluster//ndb_3_trace.log.13 /data/mysqlcluster//ndb_3_trace.log.13_t1 /data/mysqlcluster//ndb_3_trace.log.13_t2 /data/mysqlcluster//ndb_3_trace.log.13_t3 /data/
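
For anyone comparing several entries like the one above, here is a minimal sketch that splits an error-log entry into its "Key: value" fields. The helper name parse_error_entry and the regular expression are assumptions made for illustration; they are not part of any NDB tool, and the sample text is copied from the entry in this report.

```python
import re

# Hypothetical helper, not part of MySQL Cluster: matches "Key: value" lines
# where the key consists only of letters and spaces (e.g. "Error object").
FIELD = re.compile(r"^([A-Za-z ]+):\s*(.*)$")

def parse_error_entry(text):
    """Return a dict of the 'Key: value' fields found in one error-log entry."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line.strip())
        if match:
            # Later duplicates of a key overwrite earlier ones.
            fields[match.group(1)] = match.group(2)
    return fields

# Sample lines taken from the entry quoted in this report.
entry = """\
Time: Thursday 18 June 2009 - 13:30:41
Status: Temporary error, restart node
Error: 2341
Error data: suma/Suma.cpp
Error object: SUMA (Line: 4120) 0x0000000a
Program: ./ndbmtd
"""

fields = parse_error_entry(entry)
print(fields["Error"], fields["Error data"], fields["Error object"])
```

Wrapped continuation lines of a long Trace field do not start with a plain "Key:" prefix, so this sketch skips them rather than trying to rejoin them.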

How to repeat:
Just replay certain queries.
I don't know which ones. I'll provide trace files.

Suggested fix:
Don't know.
[22 Jun 2009 13:36] Jørgen Austvik
Please also provide configuration files
[22 Jun 2009 20:54] Maciej Nadolski
I have provided the configuration files for the cluster and API nodes.
[29 Jun 2009 11:50] Maciej Nadolski
Do you have everything you need to analyse and debug this?
[29 Jun 2009 20:36] Jonas Oreland
Maciej

What kind of HW do you run your data nodes on?
It looks like an unhandled imbalance between the threads...

/Jonas
[30 Jun 2009 8:15] Maciej Nadolski
Thanks for your reply, Jonas.

I am using 2x quad-core Xeons and 32 GB RAM on both data node servers.
I think using eight threads in ndbmtd is reasonable.

What I found is that when I increased MaxNoOfConcurrentOperations to 300000, ndbmtd crashed even more easily: I hadn't even started running queries.

What do you think about it? Do you think this is a configuration error?
[30 Jun 2009 8:26] Jonas Oreland
Hi,

Do you mean that each data-node has 4-cores available,
or that each data-node has 8-cores to use?

If it's 4 cores, I would suggest trying MaxNoOfExecutionThreads=4.
If it's 8 cores, then we need to look further.

Also, can you upload the out-files from the data nodes?

/Jonas
[11 Aug 2009 8:20] Jonas Oreland
see bug#46123
[17 Aug 2009 7:57] Jonas Oreland
see bug#46723
[17 Aug 2009 8:05] Maciej Nadolski
> Do you mean that each data-node has 4-cores available,
> or that each data-node has 8-cores to use?

Hi,

Each box has 2 CPUs. Each CPU is quad-core.
There is single ndbmtd process on each box.
So, ndbmtd has access to 8 cores.
[17 Aug 2009 8:42] Robert Klikics
Same problem here with Intel(R) Xeon(R) CPU X5460 @ 3.16GHz (quad-core).
[19 Aug 2009 15:29] Jonas Oreland
Hi,

If you could retest with the patch I attached to
http://bugs.mysql.com/bug.php?id=46782, that would be great.

/Jonas