Bug #18841 Multiple MySQL Cluster NDB engines die in Disk based cluster
Submitted: 6 Apr 2006 9:55
Modified: 29 Jun 2006 18:48
Reporter: Kris Buytaert (Candidate Quality Contributor)
Status: No Feedback
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: 5.1.6 alpha
OS: Linux
Assigned to:
CPU Architecture: Any

[6 Apr 2006 9:55] Kris Buytaert
Description:
I've set up a disk-based ndb cluster as documented at http://mikaelronstrom.blogspot.com/2006/02/how-to-define-table-that-uses-disk.html

Every couple of days both ndb nodes in my cluster die with:

ACCESS1-DB-B:/var/lib/mysql/mysql-cluster # more ndb_3_error.log
Current byte-offset of file-pointer is: 568

Time: Wednesday 5 April 2006 - 20:13:46
Status: Temporary error, restart node
Message: Arbitrator shutdown, please investigate error(s) on other node(s) (Arbitration error)
Error: 2305
Error data: Arbitrator decided to shutdown this node
Error object: QMGR (Line: 3826) 0x0000000e
Program: ndbd
Pid: 2856
Trace: /var/lib/mysql/mysql-cluster//ndb_3_trace.log.1
Version: Version 5.1.6 (alpha)
***EOM***

In this case that's about 2 hours after I last touched the cluster.
Both nodes die within 2-3 minutes of each other.

I've already repeated this setup twice and the only thing that differs is the time it takes for the cluster to crash.

I'm attaching the trace file of one of the crashing ndbd nodes.

How to repeat:
ndbd --initial
create tablespaces and tables as described in Mikael's blog entry
create some entries
wait
wait
crash
[6 Apr 2006 9:56] Kris Buytaert
Trace of an ndbd crash

Attachment: ndb_3_trace.log.1.gz (application/x-gzip, text), 41.11 KiB.

[6 Apr 2006 10:03] Jonas Oreland
Hi,

1) Can you upload *trace*error* files, cluster log & config.ini?
2) Many bugs have been fixed since 5.1.6.
3) Reading the trace file that you uploaded, I can find nothing actually related to disk-based storage
    (but it's hard to tell with only one of the trace files).

Thx
/jonas
[6 Apr 2006 10:50] Kris Buytaert
2nd log file

Attachment: ndb_2_trace.log.1.gz (application/x-gzip, text), 42.75 KiB.

[6 Apr 2006 10:55] Kris Buytaert
config.ini

Attachment: config.ini (application/octet-stream, text), 1.07 KiB.

[6 Apr 2006 10:56] Kris Buytaert
Cluster log

Attachment: ndb_1_cluster.log.gz (application/x-gzip, text), 42.60 KiB.

[6 Apr 2006 10:58] Kris Buytaert
2nd trace file uploaded.
The full error log of one node has already been pasted in the original submission; the 2nd one is nearly identical:
ACCESS1-DB-A:/var/lib/mysql/mysql-cluster # more ndb_2_error.log
Current byte-offset of file-pointer is: 568

Time: Wednesday 5 April 2006 - 20:12:36
Status: Temporary error, restart node
Message: Arbitrator shutdown, please investigate error(s) on other node(s) (Arbitration error)
Error: 2305
Error data: Arbitrator decided to shutdown this node
Error object: QMGR (Line: 3826) 0x0000000e
Program: ndbd
Pid: 2436
Trace: /var/lib/mysql/mysql-cluster//ndb_2_trace.log.1
Version: Version 5.1.6 (alpha)
***EOM***

Other files have been uploaded.
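
To cross-check the two error logs quoted in this report, here is a minimal Python sketch. It is not part of the original report. The helper names `parse_ndb_error_log` and `shutdown_time` are hypothetical, and the sketch assumes that the log consists of `Key: value` lines terminated by `***EOM***`, as in the excerpts above, and that month names are in English. It parses the fields and computes how far apart the two nodes shut down.

```python
from datetime import datetime


def parse_ndb_error_log(text):
    """Parse the 'Key: value' lines of one error-log entry into a dict.

    Assumes the layout seen in the excerpts above; stops at '***EOM***'.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "***EOM***":
            break
        # Split on the first ': ' only, so values may contain colons.
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    return fields


def shutdown_time(fields):
    """Convert a 'Time' field such as 'Wednesday 5 April 2006 - 20:13:46'."""
    # Drop the weekday name; parse the rest (English month names assumed).
    without_weekday = fields["Time"].split(" ", 1)[1]
    return datetime.strptime(without_weekday, "%d %B %Y - %H:%M:%S")


# Abbreviated copies of the two entries quoted in this report.
node3 = parse_ndb_error_log(
    "Time: Wednesday 5 April 2006 - 20:13:46\n"
    "Error: 2305\n"
    "Error object: QMGR (Line: 3826) 0x0000000e\n"
    "***EOM***\n"
)
node2 = parse_ndb_error_log(
    "Time: Wednesday 5 April 2006 - 20:12:36\n"
    "Error: 2305\n"
    "Error object: QMGR (Line: 3826) 0x0000000e\n"
    "***EOM***\n"
)

gap = abs(shutdown_time(node3) - shutdown_time(node2))
print(gap.total_seconds())  # 70.0
```

Both entries carry the same error code (2305) and the timestamps are 70 seconds apart, which is the kind of comparison the sketch is meant to make easy when more nodes or more crashes are involved.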

I'll try upgrading to a more recent version. The reason I suspect it is disk-based related is that the only change I made before the crashes started happening was the tablespace creation etc.
[23 Apr 2006 6:52] Jonas Oreland
Hi,

I cannot find any explanation other than a network failure of some kind.
Both nodes conclude that the other has died, seemingly without any obvious reason.

Do you still get it?
Have you tried a newer version?
Can you try to capture an SQL sequence that triggers the problem?

/Jonas
[23 Apr 2006 10:37] Kris Buytaert
Upgrades are on my to-do list, but I haven't had time yet.

The cluster is completely idle, so there won't be any sql statements running that can be reproduced.

I'll get back as soon as I have my test platforms upgraded to a more recent distribution.
[12 May 2006 9:06] Valeriy Kravchuk
Please try to use a newer version, 5.1.9 (or 5.1.10, to be released soon), in your further tests.
[29 May 2006 8:47] Kris Buytaert
I've updated to 5.1.9 in the meantime. My cluster has been up for 5 days now, so it seems the problem has been fixed in the more recent versions.

I'll continue running tests to see if it stays stable :)
[29 May 2006 18:48] Valeriy Kravchuk
Please reopen this bug report in case of a similar problem with 5.1.9.
[29 Jun 2006 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".