Bug #43601 | on-line add node test causes file system inconsistency error DBLQH (Line: 16380) | ||
---|---|---|---|
Submitted: | 12 Mar 2009 15:38 | Modified: | 11 Aug 2009 10:55 |
Reporter: | Jonathan Miller | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S1 (Critical) |
Version: | mysql-5.1-telco-7.0 | OS: | Linux |
Assigned to: | Jonas Oreland | CPU Architecture: | Any |
[12 Mar 2009 15:38]
Jonathan Miller
[12 Mar 2009 15:48]
Jonathan Miller
DBLQH -> 017402 016025 016092 016025 016092 016025 016092 016144 018603 016162 017737 017402 016025 016092 016144 018603 016162 017737 017737 017402 016025 016092 016263 016380 --------------- Signal ---------------- r.bn: 247 "DBLQH", r.proc: 2, r.sigId: 9164455 gsn: 164 "CONTINUEB" prio: 1 s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 9164453 length: 2 trace: 2 #sec: 0 fragInf: 0 H'00000006 H'00000000 --------------- Signal ---------------- r.bn: 253 "NDBFS", r.proc: 2, r.sigId: 9164454 gsn: 164 "CONTINUEB" prio: 1 s.bn: 253 "NDBFS", s.proc: 2, s.sigId: 9164452 length: 1 trace: 1 #sec: 0 fragInf: 0 Scanning the memory channel again with no delay --------------- Signal ---------------- r.bn: 247 "DBLQH", r.proc: 2, r.sigId: 9164453 gsn: 262 "FSREADCONF" prio: 1 s.bn: 253 "NDBFS", s.proc: 2, s.sigId: 9164452 length: 1 trace: 2 #sec: 0 fragInf: 0 UserPointer: 3 --------------- Signal ---------------- r.bn: 253 "NDBFS", r.proc: 2, r.sigId: 9164452 gsn: 164 "CONTINUEB" prio: 1 s.bn: 253 "NDBFS", s.proc: 2, s.sigId: 9164450 length: 1 trace: 1 #sec: 0 fragInf: 0 Scanning the memory channel again with no delay --------------- Signal ---------------- r.bn: 247 "DBLQH", r.proc: 2, r.sigId: 9164451 gsn: 255 "FSCLOSECONF" prio: 1 s.bn: 253 "NDBFS", s.proc: 2, s.sigId: 9164450 length: 1 trace: 2 #sec: 0 fragInf: 0 UserPointer: 20 --------------- Signal ---------------- r.bn: 253 "NDBFS", r.proc: 2, r.sigId: 9164450 gsn: 164 "CONTINUEB" prio: 0 s.bn: 253 "NDBFS", s.proc: 2, s.sigId: 9164448 length: 1 trace: 1 #sec: 0 fragInf: 0 Scanning the memory channel every 10ms --------------- Signal ---------------- r.bn: 252 "QMGR", r.proc: 2, r.sigId: 9164449 gsn: 164 "CONTINUEB" prio: 0 s.bn: 252 "QMGR", s.proc: 2, s.sigId: 9164447 length: 1 trace: 0 #sec: 0 fragInf: 0 H'00000004 --------------- Signal ---------------- r.bn: 253 "NDBFS", r.proc: 2, r.sigId: 9164446 gsn: 264 "FSREADREQ" prio: 0 s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 9164445 length: 15 trace: 2 #sec: 0 fragInf: 0 UserPointer: 3 FilePointer: 77 UserReference: H'00f70002 Operation flag: H'00000002 (No sync, Format=List of mem pages) varIndex: 1 numberOfPages: 8 pageData: H'0000002f, H'0000002e, H'0000002d, H'0000002c, H'0000002b, H'0000002a, H'00000029, H'00000028, H'000009d8, --------------- Signal ---------------- r.bn: 247 "DBLQH", r.proc: 2, r.sigId: 9164445 gsn: 164 "CONTINUEB" prio: 1 s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 9164443 length: 2 trace: 2 #sec: 0 fragInf: 0 H'00000006 H'00000000 --------------- Signal ---------------- r.bn: 253 "NDBFS", r.proc: 2, r.sigId: 9164444 gsn: 164 "CONTINUEB" prio: 1 s.bn: 253 "NDBFS", s.proc: 2, s.sigId: 9164440 length: 1 trace: 1 #sec: 0 fragInf: 0 Scanning the memory channel again with no delay --------------- Signal ---------------- r.bn: 247 "DBLQH", r.proc: 2, r.sigId: 9164443 gsn: 262 "FSREADCONF" prio: 1 s.bn: 253 "NDBFS", s.proc: 2, s.sigId: 9164440 length: 1 trace: 2 #sec: 0 fragInf: 0 UserPointer: 3 --------------- Signal ---------------- r.bn: 253 "NDBFS", r.proc: 2, r.sigId: 9164442 gsn: 257 "FSCLOSEREQ" prio: 0 s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 9164441 length: 5 trace: 2 #sec: 0 fragInf: 0 UserPointer: 20 FilePointer: 79 UserReference: H'00f70002 Flags: H'00000000, Don't remove file H'0000004f H'00f70002 H'00000014 H'00000000 H'00003ec8 --------------- Signal ---------------- r.bn: 247 "DBLQH", r.proc: 2, r.sigId: 9164441 gsn: 164 "CONTINUEB" prio: 1 s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 9164439 length: 2 trace: 2 #sec: 0 fragInf: 0 H'00000006 H'00000002 --------------- Signal ---------------- r.bn: 253 "NDBFS", r.proc: 2, r.sigId: 9164440 gsn: 164 "CONTINUEB" prio: 1 s.bn: 253 "NDBFS", s.proc: 2, s.sigId: 9164438 length: 1 trace: 1 #sec: 0 fragInf: 0 Scanning the memory channel again with no delay --------------- Signal ---------------- r.bn: 247 "DBLQH", r.proc: 2, r.sigId: 9164439 gsn: 262 "FSREADCONF" prio: 1 s.bn: 253 "NDBFS", s.proc: 2, s.sigId: 9164438 length: 1 trace: 2 #sec: 0 fragInf: 0 UserPointer: 0 --------------- Signal ---------------- r.bn: 253 "NDBFS", r.proc: 2, r.sigId: 9164438 gsn: 164 "CONTINUEB" prio: 0 s.bn: 253 "NDBFS", s.proc: 2, s.sigId: 9164436 length: 1 trace: 1 #sec: 0 fragInf: 0 Scanning the memory channel every 10ms
[12 Mar 2009 15:53]
Jonathan Miller
Note: This happened during the restart of original data nodes after configuration had been updated to include data nodes that where to be added.
[12 Mar 2009 19:03]
Jonathan Miller
It's reproducable: Time: Thursday 12 March 2009 - 19:30:31 Status: Ndbd file system error, restart node initial Message: Error while reading the REDO log (Ndbd file system inconsistency error, please report a bug) Error: 2310 Error data: Error while reading REDO log. from 16303 D=8, F=0 Mb=78 FP=2527 W1=8188 W2=7 : end of log wo/ having found last GCI gci: 2698 Error object: DBLQH (Line: 16410) 0x00000002 ------ Database DB1 is Created. Creating t1, t2, t3, please stand by.... Table t1 Created!!!!! Table t2 Created!!!!! Table t3 Created!!!!! row num is 1000000,1000000,1000000 Load data. Insert data to table. Disconnected from MySQLD Script -> All the data has been load to table t1! Script -> All the data has been load to table t2! Script -> All the data has been load to table t3! Script -> Get unused node id for new ndbd node id for the first added "new" ndbd is 9 node id for the second added "new" ndbd is 10 Script -> Stopping ndb_mgmd Connected to Management Server at: ndb13:15000 Node 1 has shutdown. Disconnecting to allow Management Server to shutdown Script -> Maximum time for stopping ndb_mgmd is 60 seconds, otherwise the test will fail Script -> ndb_mgmd has stoped! Script -> Editing the config file Config file my.cnf exists! Script -> New config file has generated Script -> Starting ndb_mgmd 2009-03-12 19:28:27 [MgmSrvr] INFO -- NDB Cluster Management Server. mysql-5.1.32 ndb-7.0.4-beta 2009-03-12 19:28:27 [MgmSrvr] INFO -- Loaded config from '/data0/cr_autotest/run-2-dn-func-mysql-5.1-telco-6.4/run/ndb_mgmd.1/ndb_1_config.bin.1' Script -> Check if ndb_mgmd has started and the old ndbd has connected Node 2 has started! Node 3 has started! Connected to Management Server at: ndb13:15000 Cluster Configuration --------------------- [ndbd(NDB)] 4 node(s) id=2 @10.128.22.113 (mysql-5.1.32 ndb-7.0.4, Nodegroup: 0, Master) id=3 @10.128.22.114 (mysql-5.1.32 ndb-7.0.4, Nodegroup: 0) id=9 (not connected, accepting connect from ndb13) id=10 (not connected, accepting connect from ndb14) [ndb_mgmd(MGM)] 1 node(s) id=1 @10.128.22.113 (mysql-5.1.32 ndb-7.0.4) [mysqld(API)] 5 node(s) id=4 @10.128.22.113 (mysql-5.1.32 ndb-7.0.4) id=5 @10.128.22.113 (mysql-5.1.32 ndb-7.0.4) id=6 @10.128.22.113 (mysql-5.1.32 ndb-7.0.4) id=7 (not connected, accepting connect from ndb13) id=8 (not connected, accepting connect from ndb13) Script -> Restarting original data nodes Script -> Maximum time for restarting each node is 800 seconds, otherwise the test will fail Script -> Restarting the first original data node with node id 2 Connected to Management Server at: ndb13:15000 Node 2 is being restarted Script -> Node 2 (re)start failed since timeout! =========================================== Thu Mar 12 19:43:34 CET 2009 =========================================== =========================================== Thu Mar 12 19:43:34 CET 2009 =========================================== NDBT_ProgramExit: Failed NDBT_ProgramExit: Failed Script -> FAILED!!: /space/cluster_rep_auto/tests/online-add-node/online-add-node.sh --run=/data0/cr_autotest/run-2-dn-func-mysql-5.1-telco-6.4/run --conf=/space/cluster_rep_auto/cr-autotest.conf --log=/space/cluster_rep_auto/logs/online-add-node.log --csv=/space/cluster_rep_auto/logs/online-add-node.csv --pk=/space/cluster_rep_auto/logs/online-add-node.pk --row-num=1000000,1000000,1000000 Program: /data0/cr_autotest/libexec/ndbd Pid: 28961 Trace: ./ndb_2_trace.log.1 Version: mysql-5.1.32 ndb-7.0
[11 Aug 2009 10:55]
Jonas Oreland
this is a duplicate of bug#44952