Bug #50534 | First datanode crashes when creating a new nodegroup | ||
---|---|---|---|
Submitted: | 22 Jan 2010 7:48 | Modified: | 27 Jan 2010 7:45 |
Reporter: | Oli Sennhauser | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S3 (Non-critical) |
Version: | mysql-5.1-telco-7.0 | OS: | Any (Linux) |
Assigned to: | Jonas Oreland | CPU Architecture: | Any |
Tags: | 7.0.9, cluster, crash, datanode, MySQL, nodegroup |
[22 Jan 2010 7:48]
Oli Sennhauser
[22 Jan 2010 7:51]
Oli Sennhauser
Error log of crash
Attachment: ndb_error_report_20100122085004.tar.bz2 (application/x-redhat-package-manager, text), 41.35 KiB.
[22 Jan 2010 7:53]
Oli Sennhauser
ndb_mgm> CREATE NODEGROUP 3,4
* 322: Error
* 322-Invalid node(s) specified for new nodegroup, node already in nodegroup: Permanent error: Application error

ndb_mgm> create nodegroup 5,6
Node 3: Forced node shutdown completed. Caused by error 2303: 'System error, node killed during node restart by other node(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
* -1: Error
* -1-Unknown error code: Unknown result: Unknown error code

----

Time: Friday 22 January 2010 - 08:34:40
Status: Temporary error, restart node
Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug)
Error: 2303
Error data: Node 3 killed this node because GCP stop was detected
Error object: NDBCNTR (Line: 270) 0x0000000a
Program: ndbd
Pid: 7173
Version: mysql-5.1.39 ndb-7.0.9b
Trace: /home/mysql/cluster/7.0.9/ndb_3_trace.log.2
***EOM***

----

DBDIH 000338 000532 013341 013357 013624
NDBCNTR 000214 016621
NDBCNTR 000224
LGMAN 000351
NDBCNTR 000231 000270
--------------- Signal ----------------
r.bn: 246 "DBDIH", r.proc: 3, r.sigId: 1856691 gsn: 164 "CONTINUEB" prio: 0
s.bn: 246 "DBDIH", s.proc: 3, s.sigId: 1856687 length: 1 trace: 8 #sec: 0 fragInf: 0
Check GCP Stop
--------------- Signal ----------------
r.bn: 246 "DBDIH", r.proc: 3, r.sigId: 1856690 gsn: 164 "CONTINUEB" prio: 0
s.bn: 246 "DBDIH", s.proc: 3, s.sigId: 1856686 length: 1 trace: 2 #sec: 0 fragInf: 0
Start GCP
--------------- Signal ----------------
r.bn: 253 "NDBFS", r.proc: 3, r.sigId: 1856689 gsn: 164 "CONTINUEB" prio: 0
s.bn: 253 "NDBFS", s.proc: 3, s.sigId: 1856685 length: 1 trace: 0 #sec: 0 fragInf: 0
Scanning the memory channel every 10ms
--------------- Signal ----------------
r.bn: 252 "QMGR", r.proc: 3, r.sigId: 1856688 gsn: 164 "CONTINUEB" prio: 0
s.bn: 252 "QMGR", s.proc: 3, s.sigId: 1856684 length: 3 trace: 0 #sec: 0 fragInf: 0
H'00000004 H'00000000 H'0061e1cc
--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 3, r.sigId: 1856683 gsn: 409 "TIME_SIGNAL" prio: 1
s.bn: 252 "QMGR", s.proc: 3, s.sigId: 1856679 length: 1 trace: 0 #sec: 0 fragInf: 0
H'00000004
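For anyone triaging a similar crash, the key/value part of the error report can be pulled apart mechanically. The following is only a minimal sketch: `parse_error_report` is a hypothetical helper written for this illustration, not an NDB tool, and it assumes the report keeps the `Key: value` layout quoted in this comment.

```python
import re

# Excerpt of the error report quoted in this comment.
REPORT = """\
Time: Friday 22 January 2010 - 08:34:40
Status: Temporary error, restart node
Error: 2303
Error data: Node 3 killed this node because GCP stop was detected
Error object: NDBCNTR (Line: 270) 0x0000000a
Program: ndbd
Pid: 7173
Version: mysql-5.1.39 ndb-7.0.9b
"""


def parse_error_report(text):
    """Split 'Key: value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        # Split on the first ': ' only, so values may contain colons.
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


report = parse_error_report(REPORT)

# Extract the id of the node that requested the shutdown, if present.
match = re.search(r"Node (\d+) killed this node", report["Error data"])
killing_node = int(match.group(1)) if match else None

print(report["Error"], report["Error object"], killing_node)
```

Here the error code, the reporting block and the killing node id are the three values that identify the failure as a GCP stop reported through NDBCNTR.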
[22 Jan 2010 8:04]
Oli Sennhauser
NDBCNTR:

204 /*******************************/
205 /* SYSTEM_ERROR */
206 /*******************************/
207 void Ndbcntr::execSYSTEM_ERROR(Signal* signal)
208 {
...
215   switch (sysErr->errorCode){
216   case SystemError::GCPStopDetected:
217   {
218     BaseString::snprintf(buf, sizeof(buf),
219                          "Node %d killed this node because "
220                          "GCP stop was detected",
221                          killingNode);
222     signal->theData[0] = 7025;
223     EXECUTE_DIRECT(DBDIH, GSN_DUMP_STATE_ORD, signal, 1);
224     jamEntry();
225
226     {
227       signal->theData[0] = 12002;
228       EXECUTE_DIRECT(LGMAN, GSN_DUMP_STATE_ORD, signal, 1, 0);
229     }
230
231     jamEntry();
232     break;
233   }

----

LGMAN:

349 void
350 Lgman::execDUMP_STATE_ORD(Signal* signal){
351   jamEntry();
352   if (signal->theData[0] == 12001 || signal->theData[0] == 12002)
353   {
...
[26 Jan 2010 13:15]
Oli Sennhauser
With 7.0.7 not even step 1 works:

shell> ndb_mgmd -f config.ini --configdir=/home/mysql/cluster/7.0.7
2010-01-26 14:12:17 [MgmSrvr] INFO -- NDB Cluster Management Server. mysql-5.1.35 ndb-7.0.7
2010-01-26 14:12:17 [MgmSrvr] INFO -- Reading cluster configuration from 'config.ini'

shell> ndb_mgm
-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 4 node(s)
id=10 (not connected, accepting connect from localhost)
id=20 (not connected, accepting connect from localhost)
id=30 (not connected, accepting connect from localhost)
id=40 (not connected, accepting connect from localhost)

[ndb_mgmd(MGM)] 1 node(s)
id=2 @localhost (mysql-5.1.35 ndb-7.0.7)

Trying 7.0.8 now...
[26 Jan 2010 13:23]
Jonas Oreland
This is a regression; I'm not entirely sure when it was introduced, and our test cases for it were disabled :-( The patch will fix the problem and re-enable the testing.
[26 Jan 2010 14:04]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

http://lists.mysql.com/commits/98195

3369 Jonas Oreland 2010-01-26
ndb - bug#50534 - fix regression in create/drop nodegroup, and make sure that it's properly tested
[26 Jan 2010 14:08]
Bugs System
Pushed into 5.1.41-ndb-7.0.11 (revid:jonas@mysql.com-20100126140723-ec25q36v55cw5awp) (version source revid:jonas@mysql.com-20100126140352-0ld0q4gk0yc8wh7v) (merge vers: 5.1.41-ndb-7.0.11) (pib:16)
[26 Jan 2010 14:12]
Jonas Oreland
pushed into 7.0.11
[27 Jan 2010 7:45]
Jon Stephens
Documented in the NDB-7.0.11 changelog as follows:

CREATE NODEGROUP could sometimes cause a data node forced shutdown.

Closed.