Bug #74627 | Data Node crash and can not startup, can start use ndbd --initial | ||
---|---|---|---|
Submitted: | 30 Oct 2014 3:56 | Modified: | 3 Nov 2015 17:51 |
Reporter: | ray liu | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S1 (Critical) |
Version: | NDB 7.3.5 | OS: | Linux (CentOS release 6.5 (Final)) |
Assigned to: | MySQL Verification Team | CPU Architecture: | Any |
Tags: | crash |
[30 Oct 2014 3:56]
ray liu
[30 Oct 2014 3:57]
ray liu
Crashed data node start Log: [root@AY14062610010149880bZ ~]# ndbd 2014-10-28 16:30:51 [ndbd] INFO -- Angel connected to '10.163.239.170:1186' 2014-10-28 16:30:51 [ndbd] INFO -- Angel allocated nodeid: 3 Manage Node report: ndb_mgm> Node 3: Forced node shutdown completed. Occured during startphase 5. Initiated by signal 11. Caused by error 6000: 'Error OS signal received(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'. [root@AY14062610010149880bZ ~]# tail -f /var/lib/mysql-cluster/ndb_3_out.log 2014-10-28 16:30:51 [ndbd] INFO -- Angel pid: 7601 started child: 7602 2014-10-28 16:30:51 [ndbd] INFO -- Configuration fetched from '10.163.239.170:1186', generation: 1 NDBMT: non-mt 2014-10-28 16:30:51 [ndbd] INFO -- NDB Cluster -- DB node 3 2014-10-28 16:30:51 [ndbd] INFO -- mysql-5.6.17 ndb-7.3.5 -- 2014-10-28 16:30:51 [ndbd] INFO -- numa_set_interleave_mask(numa_all_nodes) : OK 2014-10-28 16:30:51 [ndbd] INFO -- Ndbd_mem_manager::init(1) min: 9136Mb initial: 9264Mb Adding 40Mb to ZONE_LO (1,1279) Instantiating DBSPJ instanceNo=0 Successfully set real-time prio on tid = 7604 Started thread, index = 0, id = 7604, type = SocketClientThread Successfully set real-time prio on tid = 7603 Started thread, index = 1, id = 7603, type = WatchDogThread Successfully set real-time prio on tid = 7605 Started thread, index = 2, id = 7605, type = SocketServerThread 2014-10-28 16:30:51 [ndbd] INFO -- Start initiated (mysql-5.6.17 ndb-7.3.5) NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 4, id = 7606, type = NdbfsThread Adding 8152Mb to ZONE_LO (1280,260863) Adding 1073Mb to ZONE_LO (262145,34311) NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 5, id = 7607, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 6, id = 7608, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 7, id = 7609, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 8, id = 7610, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 9, id = 7611, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 10, id = 7612, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 11, id = 7613, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 12, id = 7614, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 13, id = 7615, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 14, id = 7616, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 15, id = 7617, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 16, id = 7618, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 17, id = 7619, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 18, id = 7620, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 19, id = 7621, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 20, id = 7622, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 21, id = 7623, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 22, id = 7624, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 23, id = 7625, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 24, id = 7626, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 25, id = 7627, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 26, id = 7628, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 27, id = 7629, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 28, id = 7630, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 29, id = 7631, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 30, id = 7632, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 31, id = 7633, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 32, id = 7634, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 33, id = 7635, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 34, id = 7636, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 35, id = 7637, type = NdbfsThread NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer Started thread, index = 36, id = 7638, type = NdbfsThread Started thread, index = 37, id = 7639, type = NdbfsThread Started thread, index = 38, id = 7640, type = NdbfsThread WOPool::init(61, 9) RWPool::init(22, 14) RWPool::init(42, 75) LCPFragWatchdog : WarnElapsed : 20000(ms) MaxElapsed 60000(ms) : period millis : 10000 blockSz: 800, wpp: 8188 -> 624 (76) RWPool::init(29, 624) RWPool::init(42, 16) RWPool::init(62, 12) RWPool::init(42, 83) 2014-10-28 16:30:55 [ndbd] WARNING -- Ndb kernel thread 0 is stuck in: Job Handling elapsed=100 2014-10-28 16:30:55 [ndbd] INFO -- Watchdog: User time: 32 System time: 315 RWPool::init(c2, 18) RWPool::init(e2, 16) WOPool::init(41, 8) RWPool::init(82, 12) RWPool::init(a2, 54) WOPool::init(21, 10) blockSz: 1024, wpp: 8188 -> 816 (28) RWPool::init(68, 816) 2014-10-28 16:30:55 [ndbd] INFO -- Start phase 0 completed 2014-10-28 16:30:55 [ndbd] INFO -- Using locked memory 2014-10-28 16:30:55 [ndbd] INFO -- findNeighbours from: 2191 old (left: 65535 right: 65535) new (4 4) Dbspj::execSTTOR() inst:0 phase=1 2014-10-28 16:30:55 [ndbd] INFO -- Start phase 1 completed 2014-10-28 16:30:55 [ndbd] INFO -- Start phase 2 completed 2014-10-28 16:30:56 [ndbd] INFO -- Start phase 3 completed Dbspj::execSTTOR() inst:0 phase=4 2014-10-28 16:30:56 [ndbd] INFO -- Start phase 4 completed 2014-10-28 16:30:56 [ndbd] INFO -- Receive arbitrator node 1 [ticket=0bdc00245b30cd91] Found pending trans (0) - aborting abort create 501 abort create 502 .... abort create 1032 restartCreateObj(1) file: 1 restartCreateObj(20) file: 1 ... restartCreateObj(1065) file: 1 restartDropObj(219) 2014-10-28 16:31:13 [ndbd] INFO -- Received signal 11. Running error handler. 2014-10-28 16:31:13 [ndbd] INFO -- Signal 11 received; Segmentation fault 2014-10-28 16:31:13 [ndbd] INFO -- /export/home/pb2/build/sb_0-11918329-1396368634.9/mysql-cluster-gpl-7.3.5/storage/ndb/src/kernel/ndbd.cpp 2014-10-28 16:31:13 [ndbd] INFO -- Error handler signal shutting down system 2014-10-28 16:31:13 [ndbd] INFO -- Error handler shutdown completed - exiting 2014-10-28 16:31:14 [ndbd] ALERT -- Node 3: Forced node shutdown completed. Occured during startphase 5. Initiated by signal 11. Caused by error 6000: 'Error OS signal received(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
[3 Nov 2014 6:57]
ray liu
ndb_error_report
Attachment: ndb_error_report_20141103145231.tar.bz2 (application/octet-stream, text), 886.02 KiB.
[7 Jul 2015 12:48]
Mikael Ronström
Crash is in Dbdict.cpp in a restart situation.
[3 Nov 2015 17:51]
MySQL Verification Team
Hi Ray, Crash is in Dbdict.cpp in a restart situation, but I cannot reproduce this unless I corrupt (manually). Are you sure you don't have any issues with storage subsystem? Have you reproduced this problem after it first happened (the solution to the problem is to start failing datanode with --initial)? Best regards Bogdan Kecman