Description:
Testing cluster replication. The slave cluster ran out of disk space, causing the cluster to crash.
Stack trace from MySQLD:
0x816cee8 handle_segfault + 392
0x4005e5cd _end + 934600377
0x837b96c _ZN14NdbTransaction15getNdbOperationEPK12NdbTableImplP12NdbOperation + 12
0x837ba4b _ZN14NdbTransaction15getNdbOperationEPKN13NdbDictionary5TableE + 27
0x82157bb _ZN13ha_ndbcluster9write_rowEPc + 211
0x82047c5 _ZN7handler12ha_write_rowEPc + 25
0x81dee4f _ZN14Rows_log_event10exec_eventEP17st_relay_log_info + 627
0x824ac8a _Z20exec_relay_log_eventP3THDP17st_relay_log_info + 578
0x8248e33 handle_slave_sql + 1015
0x400586de _end + 934576074
0x401d86c7 _end + 936148915
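The mangled frames above are easier to read once demangled. The following is a minimal sketch, assuming a GCC- or Clang-compatible toolchain that provides `abi::__cxa_demangle` in `<cxxabi.h>`; the helper name `demangle` is hypothetical and is not part of MySQL.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol,
// or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with
    // malloc, so the caller must release it with free.
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);
    return result;
}
```

For example, `demangle("_ZN13ha_ndbcluster9write_rowEPc")` yields `ha_ndbcluster::write_row(char*)`, which shows that the slave SQL thread crashed while applying a row event through the NDB handler's write path.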
From the cluster error log:
2005-05-18 08:48:36 [MgmSrvr] ALERT -- Node 3: Node 2 Disconnected
2005-05-18 08:48:36 [MgmSrvr] ALERT -- Node 5: Node 2 Disconnected
2005-05-18 08:48:36 [MgmSrvr] ALERT -- Node 4: Node 2 Disconnected
2005-05-18 08:48:36 [MgmSrvr] INFO -- Node 3: Communication to Node 2 closed
2005-05-18 08:48:36 [MgmSrvr] INFO -- Node 4: Communication to Node 2 closed
2005-05-18 08:48:36 [MgmSrvr] INFO -- Node 5: Communication to Node 2 closed
2005-05-18 08:48:36 [MgmSrvr] ALERT -- Node 1: Node 2 Disconnected
2005-05-18 08:48:37 [MgmSrvr] ALERT -- Node 5: Node 4 Disconnected
2005-05-18 08:48:37 [MgmSrvr] ALERT -- Node 3: Node 4 Disconnected
2005-05-18 08:48:37 [MgmSrvr] INFO -- Node 3: Possible bug in Dbdih::execBLOCK_COMMIT_ORD c_blockCommit = 1 c_blockCommitNo = 2 sig->failNo =
2005-05-18 08:48:37 [MgmSrvr] INFO -- Node 3: Communication to Node 2 closed
2005-05-18 08:48:37 [MgmSrvr] INFO -- Node 3: Communication to Node 4 closed
2005-05-18 08:48:37 [MgmSrvr] INFO -- Node 5: Possible bug in Dbdih::execBLOCK_COMMIT_ORD c_blockCommit = 1 c_blockCommitNo = 2 sig->failNo =
2005-05-18 08:48:37 [MgmSrvr] INFO -- Node 5: Communication to Node 2 closed
2005-05-18 08:48:37 [MgmSrvr] INFO -- Node 5: Communication to Node 4 closed
2005-05-18 08:48:37 [MgmSrvr] ALERT -- Node 1: Node 4 Disconnected
2005-05-18 08:48:38 [MgmSrvr] ALERT -- Node 1: Node 3 Disconnected
2005-05-18 08:48:39 [MgmSrvr] ALERT -- Node 1: Node 5 Disconnected
How to repeat:
Set up two clusters, with one replicating to the other. Using the bank test, run the slave cluster out of disk space.
Suggested fix:
The cluster should remain up. It should abort or roll back any uncommitted transactions and refuse to start new transactions until the disk space issue is corrected.
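The "refuse transactions until disk space is restored" part of this suggestion could be sketched roughly as follows. This is only an illustration of the policy, not NDB code: the class `DiskSpaceGate`, the function `statvfs_probe`, and the threshold parameter are all hypothetical names, and the free-space probe assumes a POSIX system that provides `statvfs`. Rolling back in-flight transactions is not covered here.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <sys/statvfs.h>

// Hypothetical guard (not part of NDB): decides whether a new transaction
// may start, based on how much free disk space a probe reports.
class DiskSpaceGate {
public:
    using Probe = std::function<std::uintmax_t()>;

    DiskSpaceGate(Probe probe, std::uintmax_t min_free_bytes)
        : probe_(std::move(probe)), min_free_bytes_(min_free_bytes) {}

    // True if free space is at or above the threshold. The probe is queried
    // on every call, so the gate reopens once space has been freed.
    bool may_begin() const { return probe_() >= min_free_bytes_; }

private:
    Probe probe_;
    std::uintmax_t min_free_bytes_;
};

// Probe backed by POSIX statvfs. Reports 0 bytes free if the call fails,
// so an unreadable filesystem is treated as full.
DiskSpaceGate::Probe statvfs_probe(const std::string& path) {
    return [path]() -> std::uintmax_t {
        struct statvfs st;
        if (statvfs(path.c_str(), &st) != 0) {
            return 0;
        }
        return static_cast<std::uintmax_t>(st.f_bavail) *
               static_cast<std::uintmax_t>(st.f_frsize);
    };
}
```

A caller would construct something like `DiskSpaceGate gate(statvfs_probe("/var/lib/data"), 64u * 1024 * 1024)` (path and threshold are placeholders) and reject new transactions with an error while `gate.may_begin()` returns false, instead of letting the node fail. Passing the probe in as a function keeps the policy testable without actually filling a disk.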