Bug #22375 Two NDB Cluster nodes crashed after executing DELETE statement
Submitted: 15 Sep 2006 1:53 Modified: 15 Sep 2006 6:48
Reporter: Hamid Badiozamani
Status: Duplicate
Category:MySQL Server Severity:S2 (Serious)
Version:5.0.24 OS:Linux (Linux)
Assigned to: Assigned Account CPU Architecture:Any

[15 Sep 2006 1:53] Hamid Badiozamani
Description:
The cluster is 6 identical servers with 3 node groups. The following is the node data/index usage:

2006-09-14 18:31:27 [MgmSrvr] INFO     -- Node 11: Data usage is 92%(60738 32K pages of total 65536)
2006-09-14 18:31:27 [MgmSrvr] INFO     -- Node 11: Index usage is 68%(45051 8K pages of total 65568)
2006-09-14 18:31:27 [MgmSrvr] INFO     -- Node 12: Data usage is 92%(60738 32K pages of total 65536)
2006-09-14 18:31:27 [MgmSrvr] INFO     -- Node 12: Index usage is 68%(45051 8K pages of total 65568)
2006-09-14 18:31:28 [MgmSrvr] INFO     -- Node 13: Data usage is 62%(40928 32K pages of total 65536)
2006-09-14 18:31:28 [MgmSrvr] INFO     -- Node 13: Index usage is 46%(30196 8K pages of total 65568)
2006-09-14 18:31:28 [MgmSrvr] INFO     -- Node 14: Data usage is 62%(40928 32K pages of total 65536)
2006-09-14 18:31:28 [MgmSrvr] INFO     -- Node 14: Index usage is 46%(30196 8K pages of total 65568)
2006-09-14 18:31:28 [MgmSrvr] INFO     -- Node 15: Data usage is 92%(60710 32K pages of total 65536)
2006-09-14 18:31:28 [MgmSrvr] INFO     -- Node 15: Index usage is 68%(45080 8K pages of total 65568)
2006-09-14 18:31:28 [MgmSrvr] INFO     -- Node 16: Data usage is 92%(60710 32K pages of total 65536)
2006-09-14 18:31:28 [MgmSrvr] INFO     -- Node 16: Index usage is 68%(45080 8K pages of total 65568)
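To make the headroom on each node easier to read, the usage lines above can be turned into free-page counts. The following is a minimal Python sketch, not an official tool: the regular expression is inferred only from the log lines quoted in this report, and the names `USAGE` and `parse_usage` are hypothetical.

```python
import re

# Two lines copied verbatim from the report; the others have the same shape.
LOG = """\
2006-09-14 18:31:27 [MgmSrvr] INFO     -- Node 11: Data usage is 92%(60738 32K pages of total 65536)
2006-09-14 18:31:27 [MgmSrvr] INFO     -- Node 11: Index usage is 68%(45051 8K pages of total 65568)
"""

# Assumption: this pattern is derived from the lines above, not from NDB documentation.
USAGE = re.compile(
    r"Node (?P<node>\d+): (?P<kind>Data|Index) usage is (?P<pct>\d+)%"
    r"\((?P<used>\d+) (?P<page_kb>\d+)K pages of total (?P<total>\d+)\)"
)

def parse_usage(text):
    """Return one dict per usage line with the free pages and free MiB."""
    rows = []
    for m in USAGE.finditer(text):
        used = int(m["used"])
        total = int(m["total"])
        page_kb = int(m["page_kb"])
        free_pages = total - used
        rows.append({
            "node": int(m["node"]),
            "kind": m["kind"],
            "reported_pct": int(m["pct"]),
            "free_pages": free_pages,
            # Page size is given in KiB, so divide by 1024 to get MiB.
            "free_mib": free_pages * page_kb / 1024,
        })
    return rows

for row in parse_usage(LOG):
    print(row)
```

For node 11 this gives 4798 free data pages (about 150 MiB) and 20517 free index pages (about 160 MiB), so data memory is the tighter of the two on the 92% nodes.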

Upon executing the command "DELETE FROM emailblastdata WHERE timestamp <= 1143878399", two of the nodes crashed: Node 11 and Node 16.

Node 11's error log shows:
Time: Thursday 14 September 2006 - 18:31:51
Status: Temporary error, restart node
Message: Error OS signal received (Internal error, programming error or missing error message, please report a bug)
Error: 6000
Error data: Signal 11 received; Segmentation fault
Error object: main.cpp
Program: ndbd
Pid: 19821
Trace: /opt/mysql/ndb/ndb_11_trace.log.1
Version: Version 5.0.24
***EOM***

Node 16's error log shows:
Time: Thursday 14 September 2006 - 18:31:52
Status: Temporary error, restart node
Message: Error OS signal received (Internal error, programming error or missing error message, please report a bug)
Error: 6000
Error data: Signal 11 received; Segmentation fault
Error object: main.cpp
Program: ndbd
Pid: 11264
Trace: /opt/mysql/ndb/ndb_16_trace.log.1
Version: Version 5.0.24
***EOM***

How to repeat:
I'm not brave enough to try to reproduce this bug on our live system.

The problem seems to be that, under a high volume of transactions, the system becomes unstable and nodes begin to drop off, sometimes bringing the whole cluster down.
[15 Sep 2006 4:31] Jonas Oreland
This looks _very_ much like 
http://bugs.mysql.com/bug.php?id=21384
which was fixed in 5.0.25

But this can only be verified if you also upload
  /opt/mysql/ndb/ndb_11_trace.log.1 and
  /opt/mysql/ndb/ndb_16_trace.log.1

/Jonas
[15 Sep 2006 6:41] Hamid Badiozamani
Trace file for Node 16

Attachment: ndb_11_trace.log.bug-22375.zip (application/x-zip-compressed, text), 51.49 KiB.

[15 Sep 2006 6:42] Hamid Badiozamani
Node 11 Trace Log

Attachment: ndb_16_trace.log.bug-22375.zip (application/x-zip-compressed, text), 86.05 KiB.

[15 Sep 2006 6:43] Hamid Badiozamani
Thanks for your quick response Jonas. Please see the attached trace logs.
[15 Sep 2006 6:48] Jonas Oreland
Hi,

Yes this is a duplicate.
The fix in 5.0.25 will instead abort the transaction (with an error code).

So what you really need to do is increase
"TransactionBufferMemory"; the crash is a bug in the error handling
for running out of this resource.

/Jonas
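
For readers applying the suggestion above, a sketch of where the setting would go in the management server's config.ini. The value shown is purely illustrative and is not taken from this report; the right size depends on the workload, and the data nodes will most likely need a restart before the change takes effect.

```ini
; Hypothetical fragment of config.ini, applied to all data nodes.
[ndbd default]
; Illustrative value only; the report does not state what size is sufficient.
TransactionBufferMemory=4M
```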