| Bug #37227 | Node 3 killed this node because GCP stop was detected | ||
|---|---|---|---|
| Submitted: | 5 Jun 2008 17:08 | Modified: | 9 Feb 2009 13:34 |
| Reporter: | Mark Malakanov | ||
| Status: | Verified | ||
| Category: | Server: Cluster | Severity: | S2 (Serious) |
| Version: | mysql-5.1-telco-6.2 | OS: | Linux (F8, 2.6.24.7-92.fc8 SMP x86_64) |
| Assigned to: | Target Version: | ||
| Tags: | GCP, mysql-5.1.23 ndb-6.2.15 -> ndb-6.4.0 | ||
| Triage: | Triaged: D2 (Serious) / R6 (Needs Assessment) / E6 (Needs Assessment) | ||
[5 Jun 2008 17:08]
Mark Malakanov
[5 Jun 2008 17:15]
Mark Malakanov
ndb config
Attachment: my-ndb.cfg (text/plain), 4.34 KiB.
[5 Jun 2008 17:16]
Mark Malakanov
logs
Attachment: nb_3_logs.zip (application/x-zip-compressed, text), 357.08 KiB.
[25 Jun 2008 10:31]
Feelora Gee
I meet the same question. hey, how to fixed it?
[30 Jun 2008 12:07]
Jonas Oreland
Hi, We have found this at a customer site aswell, and suspect that it's ntp causing problem. Therefor we changed to using CLOCK_MONOTONIC, this patch is in 6.3.16 but we have made no new 6.2.x-release since. Could any/both of you investigate if this could be a cause? /Jonas
[31 Jul 2008 1:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[14 Nov 2008 21:21]
Bogdan Kecman
same problem on customer site: server1: datanode1 + datanode3 server2: datanode2 + datanode4 server3: mgm node + api node server4: mgm node + api node server5-x: api node restart server2 and datanode1 dies with: Error: 2303 Error data: Node 1 killed this node because GCP stop was detected Error object: NDBCNTR (Line: 235) 0x00000002 If you kill -9 all ndbd processes on server2 problem does not occur but shutdown -rf now get's it every time. ALL servers have time synchronised (ntpd)
[16 Nov 2008 13:57]
Geert Vanderkelen
Different error, but similar effect..
Attachment: ndb_error_report_20081116135307.tar.bz2 (application/octet-stream, text), 49.33 KiB.
[16 Nov 2008 14:02]
Geert Vanderkelen
Similar effect, but different error, using MySQL Cluster 6.3.17: Doing a `shutdown -rf now` on the machine running Data Node 3 and 5, they are losing heartbeats: 2008-11-16 13:27:11 [MgmSrvr] WARNING -- Node 3: Node 1 missed heartbeat 2 2008-11-16 13:27:11 [MgmSrvr] WARNING -- Node 5: Node 1 missed heartbeat 2 2008-11-16 13:27:12 [MgmSrvr] WARNING -- Node 4: Node 3 missed heartbeat 2 2008-11-16 13:27:13 [MgmSrvr] WARNING -- Node 4: Node 3 missed heartbeat 3 Shortly after: 2008-11-16 13:27:22 [MgmSrvr] ALERT -- Node 2: Forced node shutdown completed. Caused by error 2315: 'Node declared dead. See error log for details(Arbitration error). Temporary error, restart node'. 2008-11-16 13:27:48 [MgmSrvr] ALERT -- Node 4: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'. Now, I dod the same thing just minutes laters (parts of log): 2008-11-16 13:42:58 [MgmSrvr] INFO -- Node 3: Node shutdown completed. Initiated by signal 15. 2008-11-16 13:42:59 [MgmSrvr] INFO -- Node 2: Possible bug in Dbdih::execBLOCK_COMMIT_ORD c_blockCommit = 1 c_blockCommitNo = 2 sig->failNo = 2008-11-16 13:42:59 [MgmSrvr] INFO -- Node 4: Possible bug in Dbdih::execBLOCK_COMMIT_ORD c_blockCommit = 1 c_blockCommitNo = 2 sig->failNo = 2008-11-16 13:42:59 [MgmSrvr] INFO -- Node 5: Node shutdown completed. Initiated by signal 15. So, one time it has heartbeats failures, and it takes the other nodes with it. Later on signal 15 was recognized and others nodes didn't fail. Logs attached.
[18 Dec 2008 1:35]
Jonathan Miller
Hi, I am now hitting this pretty often in ACRT using NDBAtomics. Current byte-offset of file-pointer is: 568 Time: Thursday 18 December 2008 - 03:10:55 Status: Temporary error, restart node Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug) Error: 2303 Error data: Node 3 killed this node because GCP stop was detected Error object: NDBCNTR (Line: 255) 0x00000006 Program: /data0/cr_autotest/libexec/ndbd Pid: 25606 Trace: ./ndb_3_trace.log.1 Version: mysql-5.1.30 ndb-6.4.0-alpha ***EOM***
[24 Dec 2008 14:03]
Jonathan Miller
TPCC-DD result-2-dn-dd-mysql-5.1-telco-6.4/2008-12-24_01:41/result.1/ndbd.2/ndb_3_error.log Time: Wednesday 24 December 2008 - 04:05:04 Status: Temporary error, restart node Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug) Error: 2303 Error data: Node 3 killed this node because GCP stop was detected Error object: NDBCNTR (Line: 255) 0x00000006 Program: /data0/cr_autotest/libexec/ndbd Pid: 14075 Trace: ./ndb_3_trace.log.1 Version: mysql-5.1.30 ndb-6.4.1-beta ***EOM***
[24 Dec 2008 14:04]
Jonathan Miller
TPCC-DD result-2-dn-perf-mysql-5.1-telco-6.4/2008-12-24_02:12/result.6/ndbd.2/ndb_3_error.log Time: Wednesday 24 December 2008 - 06:48:34 Status: Temporary error, restart node Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug) Error: 2303 Error data: Node 3 killed this node because GCP stop was detected Error object: NDBCNTR (Line: 255) 0x00000006 Program: /data0/cr_autotest/libexec/ndbd Pid: 16250 Trace: ./ndb_3_trace.log.1 Version: mysql-5.1.30 ndb-6.4.1-beta
[24 Dec 2008 14:07]
Jonathan Miller
File: result-2-dn-perf-mysql-5.1-telco-6.4/2008-12-24_02:12/result.6/ndb_api.1/log.out + command_exec '/data0/cr_autotest/bin/mysql -h ndb18 -u root --port 15001 --protocol=tcp -e "SELECT LOGFILE_GROUP_NAME, LOGFILE_GROUP_NUMBER, EXTRA FROM INFORMATION_SCHEMA.FILES"' + '[' -n '' ']' + eval '/data0/cr_autotest/bin/mysql -h ndb18 -u root --port 15001 --protocol=tcp -e "SELECT LOGFILE_GROUP_NAME, LOGFILE_GROUP_NUMBER, EXTRA FROM INFORMATION_SCHEMA.FILES"' ++ /data0/cr_autotest/bin/mysql -h ndb18 -u root --port 15001 --protocol=tcp -e 'SELECT LOGFILE_GROUP_NAME, LOGFILE_GROUP_NUMBER, EXTRA FROM INFORMATION_SCHEMA.FILES' + rc=0 + '[' 0 -ne 0 ']' + command_exec '/data0/cr_autotest/bin/mysql -h ndb18 -u root --port 15001 --protocol=tcp -e "CREATE LOGFILE GROUP lg1 ADD UNDOFILE '\''undofile.dat'\'' initial_size 512M ENGINE=NDB"' + '[' -n '' ']' + eval '/data0/cr_autotest/bin/mysql -h ndb18 -u root --port 15001 --protocol=tcp -e "CREATE LOGFILE GROUP lg1 ADD UNDOFILE '\''undofile.dat'\'' initial_size 512M ENGINE=NDB"' ++ /data0/cr_autotest/bin/mysql -h ndb18 -u root --port 15001 --protocol=tcp -e 'CREATE LOGFILE GROUP lg1 ADD UNDOFILE '\''undofile.dat'\'' initial_size 512M ENGINE=NDB' + rc=0 + '[' 0 -ne 0 ']' + command_exec '/data0/cr_autotest/bin/mysql -h ndb18 -u root --port 15001 --protocol=tcp -e "CREATE TABLESPACE ts1 ADD DATAFILE '\''datafile.dat'\'' USE LOGFILE GROUP lg1 initial_size 3G ENGINE=NDB"' + '[' -n '' ']' + eval '/data0/cr_autotest/bin/mysql -h ndb18 -u root --port 15001 --protocol=tcp -e "CREATE TABLESPACE ts1 ADD DATAFILE '\''datafile.dat'\'' USE LOGFILE GROUP lg1 initial_size 3G ENGINE=NDB"' ++ /data0/cr_autotest/bin/mysql -h ndb18 -u root --port 15001 --protocol=tcp -e 'CREATE TABLESPACE ts1 ADD DATAFILE '\''datafile.dat'\'' USE LOGFILE GROUP lg1 initial_size 3G ENGINE=NDB' + rc=0 + '[' 0 -ne 0 ']' + '[' 1 -gt 1 ']' + '[' X '!=' X ']' + echo 'Creation of DISK DATA Complete' + exit 0 ERRORERROR 1297 (HY000) at line 1: Got temporary error 4025 'Node failure caused abort of transaction' from NDBCLUSTER 1297 (HY000) at line 1: Got temporary error 4025 'Node failure caused abort of transaction' from NDBCLUSTER
[24 Dec 2008 18:46]
Jonathan Miller
ndbatomics-dd result-2-dn-dd-mysql-5.1-telco-6.4/2008-12-24_16:49/result.1/ndbd.1/ndb_2_error.log 2008-12-24 19:44:27 [MgmSrvr] ALERT -- Node 3: Node 2 Disconnected 2008-12-24 19:44:27 [MgmSrvr] INFO -- Node 3: Communication to Node 2 closed 2008-12-24 19:44:27 [MgmSrvr] ALERT -- Node 3: Network partitioning - arbitration required 2008-12-24 19:44:27 [MgmSrvr] INFO -- Node 3: President restarts arbitration thread [state=7] 2008-12-24 19:44:27 [MgmSrvr] ALERT -- Node 1: Node 2 Disconnected 2008-12-24 19:44:27 [MgmSrvr] ALERT -- Node 3: Arbitration won - positive reply from node 1 2008-12-24 19:44:27 [MgmSrvr] INFO -- Node 3: GCP Take over started 2008-12-24 19:44:27 [MgmSrvr] INFO -- Node 3: Node 3 taking over as DICT master 2008-12-24 19:44:27 [MgmSrvr] INFO -- Node 3: GCP Take over completed 2008-12-24 19:44:27 [MgmSrvr] INFO -- Node 3: kk: 976/9 2 0 2008-12-24 19:44:27 [MgmSrvr] ALERT -- Node 2: Forced node shutdown completed. Caused by error 2303: 'System error, node killed during node restart by other node(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'. 2008-12-24 19:44:27 [MgmSrvr] ALERT -- Node 1: Node 2 Disconnected 2008-12-24 19:44:27 [MgmSrvr] INFO -- Node 3: Started arbitrator node 1 [ticket=52ea00024f0c15d4] 2008-12-24 19:44:28 [MgmSrvr] ALERT -- Node 3: Node 5 Disconnected 2008-12-24 19:44:28 [MgmSrvr] INFO -- Node 3: Communication to Node 5 closed 2008-12-24 19:44:28 [MgmSrvr] INFO -- Mgmt server state: nodeid 5 freed, m_reserved_nodes 0000000000000000000000000000000000000000000000000000000000000012. Time: Wednesday 24 December 2008 - 19:44:27 Status: Temporary error, restart node Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug) Error: 2303 Error data: Node 2 killed this node because GCP stop was detected Error object: NDBCNTR (Line: 255) 0x00000006 Program: /data0/cr_autotest/libexec/ndbd Pid: 24187 Trace: ./ndb_2_trace.log.1 Version: mysql-5.1.30 ndb-6.4.1-beta ***EOM***
[7 Jan 2009 22:41]
Jonathan Miller
Current byte-offset of file-pointer is: 568 Time: Wednesday 7 January 2009 - 11:42:34 Status: Temporary error, restart node Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug) Error: 2303 Error data: Node 3 killed this node because GCP stop was detected Error object: NDBCNTR (Line: 255) 0x00000006 Program: /data0/cr_autotest/libexec/ndbmtd Pid: 22106 Trace: ./ndb_3_trace.log.1 ./ndb_3_trace.log.1_t1 ./ndb_3_trace.log.1_t2 ./ndb_3_trace.log
[7 Jan 2009 22:43]
Jonathan Miller
ndb_3_trace.log.1
DBTC 004430
DBTC 000190 000222 004850 004850 004850 004850 004850 004850
004850 004850 004850 004850 004850 004850 004850 004850
004850 004853
DBTC 004430
DBTC 000190 000222 004850 004850 004850 004850 004850 004850
004850 004850 004873
DBTC 004430
DBTC 004430
QMGR 003059 003197
DBTC 004430
DBTC 004430 016016 013076
NDBCNTR 000210 000255
--------------- Signal ----------------
r.bn: 246 "DBDIH", r.proc: 3, r.sigId: 114006583 gsn: 465 "DUMP_STATE_ORD" prio: 0
s.bn: 246 "DBDIH", s.proc: 2, s.sigId: 101194834 length: 1 trace: 0 #sec: 0 fragInf: 0
H'00001b6e
--------------- Signal ----------------
r.bn: 252 "QMGR", r.proc: 3, r.sigId: 114006582 gsn: 3 "API_REGREQ" prio: 0
s.bn: 4002 "API", s.proc: 1, s.sigId: 0 length: 3 trace: 0 #sec: 0 fragInf: 0
H'0fa20001 H'00060401 H'0005011e
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 3, r.sigId: 114006581 gsn: 164 "CONTINUEB" prio: 1
s.bn: 245 "DBTC", s.proc: 3, s.sigId: 114006580 length: 3 trace: 1 #sec: 0 fragInf: 0
H'00000006 H'0000003b H'0003db5b
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 3, r.sigId: 114006580 gsn: 164 "CONTINUEB" prio: 1
s.bn: 245 "DBTC", s.proc: 3, s.sigId: 114006579 length: 3 trace: 1 #sec: 0 fragInf: 0
H'00000006 H'0000003b H'0003da7f
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 3, r.sigId: 114006579 gsn: 164 "CONTINUEB" prio: 1
s.bn: 245 "DBTC", s.proc: 3, s.sigId: 114006577 length: 3 trace: 1 #sec: 0 fragInf: 0
H'00000006 H'0000003b H'0003dcc5
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 3, r.sigId: 114006578 gsn: 160 "COMPLETECONF" prio: 1
s.bn: 247/1 "DBLQH", s.proc: 3, s.sigId: 336796444 length: 4 trace: 1 #sec: 0 fragInf: 0
H'0003e002 H'00000003 H'0001f751 H'00300500
--------------- Signal ----------------
[7 Jan 2009 22:44]
Jonathan Miller
ndb_3_trace.log.1_t1 LGMAN 001440 001457 001461 001467 TSMAN 001937 000454 PGMAN 001600 000153 000153 LGMAN 001797 --------------- Signal ---------------- r.bn: 247 "DBLQH", r.proc: 3, r.sigId: 883179 gsn: 761 "STOP_FOR_CRASH" prio: 0 s.bn: 0 "SYS", s.proc: 0, s.sigId: 0 length: 1 trace: 0 #sec: 0 fragInf: 0 H'00000000 --------------- Signal ---------------- r.bn: 260 "LGMAN", r.proc: 3, r.sigId: 883178 gsn: 270 "FSWRITECONF" prio: 1 s.bn: 253 "NDBFS", s.proc: 3, s.sigId: 114006521 length: 1 trace: 0 #sec: 0 fragInf: 0 UserPointer: -2145861632 --------------- Signal ---------------- r.bn: 247 "DBLQH", r.proc: 3, r.sigId: 883177 gsn: 409 "TIME_SIGNAL" prio: 1 s.bn: 252 "QMGR", s.proc: 3, s.sigId: 114006520 length: 1 trace: 0 #sec: 0 fragInf: 0 H'00000004 --------------- Signal ---------------- r.bn: 260 "LGMAN", r.proc: 3, r.sigId: 883176 gsn: 164 "CONTINUEB" prio: 1 s.bn: 260 "LGMAN", s.proc: 3, s.sigId: 883175 length: 3 trace: 0 #sec: 0 fragInf: 0 H'00000002 H'1890c000 H'00000000 --------------- Signal ---------------- r.bn: 260 "LGMAN", r.proc: 3, r.sigId: 883175 gsn: 164 "CONTINUEB" prio: 0 s.bn: 260 "LGMAN", s.proc: 3, s.sigId: 883173 length: 3 trace: 0 #sec: 0 fragInf: 0 H'00000002 H'1890c000 H'00000001
[7 Jan 2009 22:45]
Jonathan Miller
ndb_3_trace.log.1_t2 DBLQH 002801 DBTUP 010029 DBLQH 002801 DBTUP 010029 DBLQH 002801 DBTUP 010029 --------------- Signal ---------------- r.bn: 247/1 "DBLQH", r.proc: 3, r.sigId: 336796598 gsn: 761 "STOP_FOR_CRASH" prio: 0 s.bn: 0 "SYS", s.proc: 0, s.sigId: 0 length: 1 trace: 0 #sec: 0 fragInf: 0 H'00000000 --------------- Signal ---------------- r.bn: 248/1 "DBACC", r.proc: 3, r.sigId: 336796597 gsn: 253 "EXPANDCHECK2" prio: 1 s.bn: 248/1 "DBACC", s.proc: 3, s.sigId: 336796596 length: 3 trace: 1 #sec: 0 fragInf: 0 H'00000016 H'00001ae2 H'00003fff --------------- Signal ---------------- r.bn: 248/1 "DBACC", r.proc: 3, r.sigId: 336796596 gsn: 253 "EXPANDCHECK2" prio: 1 s.bn: 248/1 "DBACC", s.proc: 3, s.sigId: 336796595 length: 3 trace: 1 #sec: 0 fragInf: 0 H'00000016 H'00001ae1 H'00003fff --------------- Signal ---------------- r.bn: 248/1 "DBACC", r.proc: 3, r.sigId: 336796595 gsn: 253 "EXPANDCHECK2" prio: 1 s.bn: 248/1 "DBACC", s.proc: 3, s.sigId: 336796594 length: 3 trace: 1 #sec: 0 fragInf: 0 H'00000016 H'00001ae0 H'00003fff --------------- Signal ---------------- r.bn: 248/1 "DBACC", r.proc: 3, r.sigId: 336796594 gsn: 253 "EXPANDCHECK2" prio: 1 s.bn: 248/1 "DBACC", s.proc: 3, s.sigId: 336796593 length: 3 trace: 1 #sec: 0 fragInf: 0 H'00000016 H'00001adf H'00003fff --------------- Signal ---------------- r.bn: 248/1 "DBACC", r.proc: 3, r.sigId: 336796593 gsn: 253 "EXPANDCHECK2" prio: 1 s.bn: 248/1 "DBACC", s.proc: 3, s.sigId: 336796592 length: 3 trace: 1 #sec: 0 fragInf: 0 H'00000016 H'00001ade H'00003fff --------------- Signal ---------------- r.bn: 248/1 "DBACC", r.proc: 3, r.sigId: 336796592 gsn: 253 "EXPANDCHECK2" prio: 1 s.bn: 248/1 "DBACC", s.proc: 3, s.sigId: 336796527 length: 3 trace: 1 #sec: 0 fragInf: 0 H'00000016 H'00001add H'00003fff --------------- Signal ---------------- r.bn: 247/3 "DBLQH", r.proc: 3, r.sigId: 336796591 gsn: 152 "COMMIT" prio: 1 s.bn: 247/1 "DBLQH", s.proc: 2, s.sigId: 242794995 length: 5 trace: 1 #sec: 0 fragInf: 0 H'0000011d H'00000a01 H'0001f780 H'00300500 H'00000001 --------------- Signal ---------------- r.bn: 247/3 "DBLQH", r.proc: 3, r.sigId: 336796590 gsn: 152 "COMMIT" prio: 1 s.bn: 247/1 "DBLQH", s.proc: 2, s.sigId: 242794994 length: 5 trace: 1 #sec: 0 fragInf: 0 H'0000015e H'00000a01 H'0001f780 H'00300500 H'00000001
[7 Jan 2009 22:46]
Jonathan Miller
ndb_3_trace.log.1_t3 DBTUP 065233 065247 DBACC 000081 000119 DBLQH 002801 DBTUP 010029 DBLQH 002801 DBTUP 010029 --------------- Signal ---------------- r.bn: 247/2 "DBLQH", r.proc: 3, r.sigId: 286619747 gsn: 761 "STOP_FOR_CRASH" prio: 0 s.bn: 0 "SYS", s.proc: 0, s.sigId: 0 length: 1 trace: 0 #sec: 0 fragInf: 0 H'00000000 --------------- Signal ---------------- r.bn: 248/2 "DBACC", r.proc: 3, r.sigId: 286619746 gsn: 164 "CONTINUEB" prio: 0 s.bn: 248/2 "DBACC", s.proc: 3, s.sigId: 285196946 length: 2 trace: 0 #sec: 0 fragInf: 0 H'00000008 H'00000014 --------------- Signal ---------------- r.bn: 249/2 "DBTUP", r.proc: 3, r.sigId: 286619745 gsn: 164 "CONTINUEB" prio: 0 s.bn: 249/2 "DBTUP", s.proc: 3, s.sigId: 285196945 length: 2 trace: 0 #sec: 0 fragInf: 0 H'00000008 H'00000014 --------------- Signal ---------------- r.bn: 248/2 "DBACC", r.proc: 3, r.sigId: 286619744 gsn: 253 "EXPANDCHECK2" prio: 1 s.bn: 248/2 "DBACC", s.proc: 3, s.sigId: 286619743 length: 3 trace: 1 #sec: 0 fragInf: 0 H'00000016 H'00001ab7 H'00003fff --------------- Signal ---------------- r.bn: 248/2 "DBACC", r.proc: 3, r.sigId: 286619743 gsn: 253 "EXPANDCHECK2" prio: 1 s.bn: 248/2 "DBACC", s.proc: 3, s.sigId: 286619742 length: 3 trace: 1 #sec: 0 fragInf: 0 H'00000016 H'00001ab6 H'00003fff --------------- Signal ---------------- r.bn: 248/2 "DBACC", r.proc: 3, r.sigId: 286619742 gsn: 253 "EXPANDCHECK2" prio: 1 s.bn: 248/2 "DBACC", s.proc: 3, s.sigId: 286619741 length: 3 trace: 1 #sec: 0 fragInf: 0 H'00000016 H'00001ab5 H'00003fff --------------- Signal ---------------- r.bn: 248/2 "DBACC", r.proc: 3, r.sigId: 286619741 gsn: 253 "EXPANDCHECK2" prio: 1 s.bn: 248/2 "DBACC", s.proc: 3, s.sigId: 286619740 length: 3 trace: 1 #sec: 0 fragInf: 0 H'00000016 H'00001ab4 H'00003fff --------------- Signal ---------------- r.bn: 248/2 "DBACC", r.proc: 3, r.sigId: 286619740 gsn: 253 "EXPANDCHECK2" prio: 1 s.bn: 248/2 "DBACC", s.proc: 3, s.sigId: 286619739 length: 3 trace: 1 #sec: 0 fragInf: 0 H'00000016 H'00001ab3 H'00003fff --------------- Signal ----------------
[7 Jan 2009 22:47]
Jonathan Miller
ndb_3_trace.log.1_t4 CMVMI 000238 CMVMI 000238 CMVMI 000238 CMVMI 000238 CMVMI 000238 CMVMI 000238 CMVMI 000238 --------------- Signal ---------------- r.bn: 254 "CMVMI", r.proc: 3, r.sigId: 17526 gsn: 761 "STOP_FOR_CRASH" prio: 0 s.bn: 0 "SYS", s.proc: 0, s.sigId: 0 length: 1 trace: 0 #sec: 0 fragInf: 0 H'00000000 --------------- Signal ---------------- r.bn: 254 "CMVMI", r.proc: 3, r.sigId: 17525 gsn: 247 "EVENT_REP" prio: 0 s.bn: 0 "SYS", s.proc: 3, s.sigId: 0 length: 3 trace: 0 #sec: 0 fragInf: 0 H'00000029 H'00000002 H'000015b4 --------------- Signal ---------------- r.bn: 254 "CMVMI", r.proc: 3, r.sigId: 17524 gsn: 247 "EVENT_REP" prio: 1 s.bn: 0 "SYS", s.proc: 0, s.sigId: 0 length: 6 trace: 0 #sec: 0 fragInf: 0 H'00000046 H'00000003 H'0000005a H'0000048f H'001e8427 H'00f42345 --------------- Signal ---------------- r.bn: 254 "CMVMI", r.proc: 3, r.sigId: 17523 gsn: 247 "EVENT_REP" prio: 1 s.bn: 0 "SYS", s.proc: 0, s.sigId: 0 length: 6 trace: 0 #sec: 0 fragInf: 0 H'00000046 H'00000002 H'00000010 H'00000083 H'001e8471 H'00f42581 --------------- Signal ----------------
[16 Jan 2009 23:04]
Jonathan Miller
Current byte-offset of file-pointer is: 568 Time: Wednesday 14 January 2009 - 11:11:46 Status: Temporary error, restart node Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug) Error: 2303 Error data: Node 3 killed this node because GCP stop was detected Error object: NDBCNTR (Line: 255) 0x00000006 Program: /data0/cr_autotest/libexec/ndbd Pid: 29614 Trace: ./ndb_3_trace.log.1 Version: mysql-5.1.30 ndb-6.4.1-beta ***EOM***
[2 Feb 2009 15:13]
Jonathan Miller
Hitting GCP STOP is easier in 6.4 if you use less then 8 data file for Disk Data. This is due to the new readers thread parameter "ThreadPool". (default = 8). If you are hitting "GCP STOP" in 6.4, you should adjust the "ThreadPool" to be = to the # of disk data files you have.
[2 Feb 2009 15:29]
Jon Stephens
Tomas asked me to look into this for discussion with Jonas and documenting known issue(s) - I've put bug in Documenting status (at least for the time being) so it'll show up in my work queue and I don't forget.
[9 Feb 2009 13:34]
Jon Stephens
Added info about GCP stop issues to docs, will appear at http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-ndbd-definition.html#mysql-cluster-nd... and http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-ndbd-definition.html#mysql-cluster-nd... Also documented ThreadPool parameter (which wasn't previously documented) Commit to docs is here: http://lists.mysql.com/commits/65603 Docs work for now being finished, I've returned this bug to Verified status for further handling by NDB devs/Support/QA.
[25 Mar 2009 12:06]
Jonathan Miller
please also see http://bugs.mysql.com/bug.php?id=41292
[25 Mar 2009 12:08]
Jonathan Miller
see also http://bugs.mysql.com/bug.php?id=43069
[16 Apr 2009 15:26]
Jonathan Miller
http://bugs.mysql.com/bug.php?id=43882 http://bugs.mysql.com/bug.php?id=39498
