Bug #55473 | Cluster Crash with "Forced node shutdown completed. Caused by error 2303" | |
---|---|---|---
Submitted: | 22 Jul 2010 8:04 | Modified: | 3 May 2012 6:32
Reporter: | Christian Ehmig | Email Updates: |
Status: | Duplicate | Impact on me: |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S1 (Critical)
Version: | 5.1.44-ndb-7.1.4b | OS: | Linux (Ubuntu Kernel 2.6.32)
Assigned to: | | CPU Architecture: | Any
Tags: | crash, error 2302, ndbdmtd | |
[22 Jul 2010 8:04]
Christian Ehmig
set operating system
[22 Jul 2010 8:15]
Christian Ehmig
ndb_mgm and api node config files
Attachment: bug-data-55473-config.tgz (application/x-compressed, text), 3.94 KiB.
[22 Jul 2010 8:15]
Christian Ehmig
ndb log files (data node 1)
Attachment: bug-data-55473-ndb-logs.tgz (application/x-compressed, text), 281.38 KiB.
[22 Jul 2010 8:58]
Christian Ehmig
Excerpt from the ndb_out log (complete log attached):

sendbufferpool waiting for lock, contentions: 665000 spins: 26060765
sendbufferpool waiting for lock, contentions: 670000 spins: 26280750
sendbufferpool waiting for lock, contentions: 675000 spins: 26498671
sendbufferpool waiting for lock, contentions: 680000 spins: 26685353
sendbufferpool waiting for lock, contentions: 685000 spins: 26828572
send lock node 3 waiting for lock, contentions: 115000 spins: 320219680
sendbufferpool waiting for lock, contentions: 690000 spins: 26984735
sendbufferpool waiting for lock, contentions: 695000 spins: 27174093
sendbufferpool waiting for lock, contentions: 700000 spins: 27307804
sendbufferpool waiting for lock, contentions: 705000 spins: 27514391
c_nodeStartMaster.blockGcp: 0 4294967040
m_gcp_save.m_counter: 1210 m_gcp_save.m_max_lag: 1210
m_micro_gcp.m_counter: 1 m_micro_gcp.m_max_lag: 101
m_gcp_save.m_state: 1 m_gcp_save.m_master.m_state: 1
m_micro_gcp.m_state: 0 m_micro_gcp.m_master.m_state: 0
c_COPY_GCIREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
c_COPY_TABREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
c_CREATE_FRAGREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
c_DIH_SWITCH_REPLICA_REQ_Counter = [SignalCounter: m_count=0 0000000000000000]
c_EMPTY_LCP_REQ_Counter = [SignalCounter: m_count=0 0000000000000000]
c_GCP_COMMIT_Counter = [SignalCounter: m_count=0 0000000000000000]
c_GCP_PREPARE_Counter = [SignalCounter: m_count=0 0000000000000000]
c_GCP_SAVEREQ_Counter = [SignalCounter: m_count=2 000000000000000c]
c_INCL_NODEREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
c_MASTER_GCPREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
c_MASTER_LCPREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
c_START_INFOREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
c_START_RECREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
c_STOP_ME_REQ_Counter = [SignalCounter: m_count=0 0000000000000000]
c_TC_CLOPSIZEREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
c_TCGETOPSIZEREQ_Counter = [SignalCounter: m_count=0 0000000000000000]
m_copyReason: 0 m_waiting: 0 0
c_copyGCISlave: sender{Data, Ref} 2 f60002 reason: 0 nextWord: 0
Detected GCP stop(1)...sending kill to [SignalCounter: m_count=2 000000000000000c]
LP 0 blockInstance: 2 partNo: 1 state: 0 WW_Gci: 1 gcprec: -256 flq: -256 currfile: 31 tailFileNo: 21 logTailMbyte: 78
LP 0 blockInstance: 1 partNo: 0 state: 0 WW_Gci: 1 gcprec: -256 flq: -256 currfile: 32 tailFileNo: 21 logTailMbyte: 118
file 0(0) FileChangeState: 0 logFileStatus: 20 currentMbyte: 255 currentFilepage 8191
[10 Aug 2010 13:28]
Hartmut Holzgraefe
From looking at the ndb_out log it seems as if you have committed a massive transaction. This can cause problems because committing takes time, and for large transactions touching many rows the commit can take longer than the cluster is willing to wait before starting the next global checkpoint (so as not to lose too much durability). Can you confirm that your transactions are rather large (e.g. > 10,000 rows)?
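For context on the mechanism described above: a "GCP stop" like the one in the attached log means a data node fell too far behind on a global checkpoint and was forcibly shut down. The checkpoint and timeout behaviour is controlled by a few [ndbd default] parameters in config.ini. The fragment below is only an illustrative sketch; the values shown are examples for discussion, not a confirmed fix for this report.

```ini
[ndbd default]
# Interval between global checkpoints (redo log flushed to disk); default 2000 ms.
TimeBetweenGlobalCheckpoints = 2000
# Interval between micro-GCPs (epochs); default 100 ms.
TimeBetweenEpochs = 100
# How long a node may lag on an epoch before it is killed with a GCP-stop error.
# Raising this gives very large commits more time to complete. Example value only.
TimeBetweenEpochsTimeout = 32000
```

Note that this report was ultimately closed as a duplicate (see the last comment), so tuning these timeouts would at best have been a workaround for the underlying issue, not a resolution.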
[24 Aug 2010 7:58]
Christian Ehmig
I can confirm that we have rather large transactions, but only around 500 rows per transaction at most (bulk inserts). Besides that, we use many threads for writing (around 150 on average). The bug only occurs with the cluster binaries - I recently built mysql-cluster from source for our target platform and everything has worked fine so far, so the issue might be related to the binaries.
[3 May 2012 6:32]
Gustaf Thorslund
Duplicate of bug #37227 and bug #58383