| Bug #50534 | First datanode crashes when creating a new nodegroup | | |
|---|---|---|---|
| Submitted: | 22 Jan 2010 7:48 | Modified: | 27 Jan 2010 7:45 |
| Reporter: | Oli Sennhauser | | |
| Status: | Closed | | |
| Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S3 (Non-critical) |
| Version: | mysql-5.1-telco-7.0 | OS: | Any (Linux) |
| Assigned to: | Jonas Oreland | CPU Architecture: | Any |
| Tags: | 7.0.9, cluster, crash, datanode, MySQL, nodegroup | | |
[22 Jan 2010 7:48]
Oli Sennhauser
[22 Jan 2010 7:51]
Oli Sennhauser
Error log of crash
Attachment: ndb_error_report_20100122085004.tar.bz2 (application/x-redhat-package-manager, text), 41.35 KiB.
[22 Jan 2010 7:53]
Oli Sennhauser
ndb_mgm> CREATE NODEGROUP 3,4
* 322: Error
* 322-Invalid node(s) specified for new nodegroup, node already in nodegroup: Permanent error: Application error
ndb_mgm> create nodegroup 5,6
Node 3: Forced node shutdown completed. Caused by error 2303: 'System error, node killed during node restart by other node(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
* -1: Error
* -1-Unknown error code: Unknown result: Unknown error code

----
Time: Friday 22 January 2010 - 08:34:40
Status: Temporary error, restart node
Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug)
Error: 2303
Error data: Node 3 killed this node because GCP stop was detected
Error object: NDBCNTR (Line: 270) 0x0000000a
Program: ndbd
Pid: 7173
Version: mysql-5.1.39 ndb-7.0.9b
Trace: /home/mysql/cluster/7.0.9/ndb_3_trace.log.2
***EOM***

----
DBDIH 000338 000532 013341 013357 013624
NDBCNTR 000214 016621
NDBCNTR 000224
LGMAN 000351
NDBCNTR 000231 000270
--------------- Signal ----------------
r.bn: 246 "DBDIH", r.proc: 3, r.sigId: 1856691 gsn: 164 "CONTINUEB" prio: 0
s.bn: 246 "DBDIH", s.proc: 3, s.sigId: 1856687 length: 1 trace: 8 #sec: 0 fragInf: 0
Check GCP Stop
--------------- Signal ----------------
r.bn: 246 "DBDIH", r.proc: 3, r.sigId: 1856690 gsn: 164 "CONTINUEB" prio: 0
s.bn: 246 "DBDIH", s.proc: 3, s.sigId: 1856686 length: 1 trace: 2 #sec: 0 fragInf: 0
Start GCP
--------------- Signal ----------------
r.bn: 253 "NDBFS", r.proc: 3, r.sigId: 1856689 gsn: 164 "CONTINUEB" prio: 0
s.bn: 253 "NDBFS", s.proc: 3, s.sigId: 1856685 length: 1 trace: 0 #sec: 0 fragInf: 0
Scanning the memory channel every 10ms
--------------- Signal ----------------
r.bn: 252 "QMGR", r.proc: 3, r.sigId: 1856688 gsn: 164 "CONTINUEB" prio: 0
s.bn: 252 "QMGR", s.proc: 3, s.sigId: 1856684 length: 3 trace: 0 #sec: 0 fragInf: 0
H'00000004 H'00000000 H'0061e1cc
--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 3, r.sigId: 1856683 gsn: 409 "TIME_SIGNAL" prio: 1
s.bn: 252 "QMGR", s.proc: 3, s.sigId: 1856679 length: 1 trace: 0 #sec: 0 fragInf: 0
H'00000004
[22 Jan 2010 8:04]
Oli Sennhauser
NDBCNTR:
204 /*******************************/
205 /* SYSTEM_ERROR */
206 /*******************************/
207 void Ndbcntr::execSYSTEM_ERROR(Signal* signal)
208 {
...
215   switch (sysErr->errorCode){
216   case SystemError::GCPStopDetected:
217   {
218     BaseString::snprintf(buf, sizeof(buf),
219                          "Node %d killed this node because "
220                          "GCP stop was detected",
221                          killingNode);
222     signal->theData[0] = 7025;
223     EXECUTE_DIRECT(DBDIH, GSN_DUMP_STATE_ORD, signal, 1);
224     jamEntry();
225
226     {
227       signal->theData[0] = 12002;
228       EXECUTE_DIRECT(LGMAN, GSN_DUMP_STATE_ORD, signal, 1, 0);
229     }
230
231     jamEntry();
232     break;
233   }
----
LGMAN:
349 void
350 Lgman::execDUMP_STATE_ORD(Signal* signal){
351   jamEntry();
352   if (signal->theData[0] == 12001 || signal->theData[0] == 12002)
353   {
...
[26 Jan 2010 13:15]
Oli Sennhauser
With 7.0.7 not even step 1 works:
shell> ndb_mgmd -f config.ini --configdir=/home/mysql/cluster/7.0.7
2010-01-26 14:12:17 [MgmSrvr] INFO -- NDB Cluster Management Server. mysql-5.1.35 ndb-7.0.7
2010-01-26 14:12:17 [MgmSrvr] INFO -- Reading cluster configuration from 'config.ini'
shell> ndb_mgm
-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 4 node(s)
id=10 (not connected, accepting connect from localhost)
id=20 (not connected, accepting connect from localhost)
id=30 (not connected, accepting connect from localhost)
id=40 (not connected, accepting connect from localhost)
[ndb_mgmd(MGM)] 1 node(s)
id=2 @localhost (mysql-5.1.35 ndb-7.0.7)
trying 7.0.8 now...
[26 Jan 2010 13:23]
Jonas Oreland
This is a regression; I'm not entirely sure when it was introduced, and our test cases for it were disabled :-( The patch will fix the problem and re-enable the testing.
[26 Jan 2010 14:04]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/98195
3369 Jonas Oreland 2010-01-26
ndb - bug#50534 - fix regression in create/drop nodegroup, and make sure that it's properly tested
[26 Jan 2010 14:08]
Bugs System
Pushed into 5.1.41-ndb-7.0.11 (revid:jonas@mysql.com-20100126140723-ec25q36v55cw5awp) (version source revid:jonas@mysql.com-20100126140352-0ld0q4gk0yc8wh7v) (merge vers: 5.1.41-ndb-7.0.11) (pib:16)
[26 Jan 2010 14:12]
Jonas Oreland
pushed into 7.0.11
[27 Jan 2010 7:45]
Jon Stephens
Documented in the NDB-7.0.11 changelog as follows:
CREATE NODEGROUP could sometimes cause a data node forced
shutdown.
Closed.
