Bug #18295 | DD: DROP LOGFILE GROUP causes ndbd core when compiled with GCC4 | ||
---|---|---|---|
Submitted: | 17 Mar 2006 8:25 | Modified: | 27 Mar 2006 12:07 |
Reporter: | Jonathan Miller | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S1 (Critical) |
Version: | 5.1.8 | OS: | Linux (Linux 32 Bit OS) |
Assigned to: | Jonas Oreland | CPU Architecture: | Any |
[17 Mar 2006 8:25]
Jonathan Miller
[17 Mar 2006 12:34]
Jonas Oreland
Hi jeb, I can't repeat it...tried on 1,2 & 4 node cluster (using attached file) With/without mysql-test-run. (though only on 64-bit...) Do you get it on empty database or do you need to run something else first? (you're not present on IRC currently, so I'm reassigning to you, so we don't forget it...)
[23 Mar 2006 16:09]
Jonathan Miller
In trying to fix timing issue in rpl_ndb_dd_advance.test today, I start hitting the same bug, but with a twist. I would get the can not drop LG on the master, but the slave is the one to produce the core. This is being done on NDB08 which is a 32 bit system: Linux ndb08.mysql.com 2.6.12-1.1447_FC4smp #1 SMP Fri Aug 26 20:57:13 EDT 2005 i686 athlon i386 GNU/Linux GCC: Using built-in specs. Target: i386-redhat-linux Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,java,f95,ada --enable-java-awt=gtk --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --host=i386-redhat-linux Thread model: posix gcc version 4.0.1 20050727 (Red Hat 4.0.1-5) Master Error: rpl_ndb_dd_advance [ fail ] Errors are (from /space/alcatel-drop/mysql-test/var/log/mysqltest-time) : mysqltest: At line 414: query 'DROP LOGFILE GROUP lg1 ENGINE=NDB' failed: 1508: Failed to drop LOGFILE GROUP Master valgrind: ==31472== 64 bytes in 2 blocks are definitely lost in loss record 37 of 71 ==31472== at 0x1B909222: malloc (vg_replace_malloc.c:130) ==31472== by 0x856A2AD: EventBufData_list::add_gci_op(EventBufData_list::Gci_op, bool) (NdbEventOperationImpl.cpp:2214) ==31472== by 0x856DD0C: NdbEventBuffer::insertDataL(NdbEventOperationImpl*, SubTableData const*, LinearSectionPtr*) (NdbEventOperationImpl.hpp:206) ==31472== by 0x856E001: NdbEventBuffer::report_node_failure(unsigned) (NdbEventOperationImpl.cpp:1447) ==31472== by 0x8571FA2: Ndb::report_node_failure_completed(unsigned) (Ndbif.cpp:260) ==31472== by 0x8572091: Ndb::statusMessage(void*, unsigned, bool, bool) (Ndbif.cpp:225) ==31472== by 0x8578C76: TransporterFacade::ReportNodeFailureComplete(unsigned short) (TransporterFacade.cpp:821) ==31472== by 0x857A8F2: ClusterMgr::execNF_COMPLETEREP(unsigned const*) (ClusterMgr.cpp:379) ==31472== by 0x857A9AD: ClusterMgr::reportNodeFailed(unsigned short) (ClusterMgr.cpp:452) ==31472== by 0x857B930: ClusterMgr::reportDisconnected(unsigned short) (ClusterMgr.cpp:422) ==31472== by 0x8577ADA: TransporterFacade::reportDisconnected(int) (TransporterFacade.cpp:1207) ==31472== by 0x8577B14: reportDisconnect(void*, unsigned short, unsigned) (TransporterFacade.cpp:127) ==31472== by 0x857C848: TransporterRegistry::report_disconnect(unsigned short, int) (TransporterRegistry.cpp:1229) ==31472== by 0x85A4F1A: TCP_Transporter::doReceive() (Transporter.hpp:163) ==31472== by 0x857DC93: TransporterRegistry::performReceive() (TransporterRegistry.cpp:949) ==31472== by 0x8577219: TransporterFacade::threadMainReceive() (TransporterFacade.cpp:514) No error log, core or trace files on master. Slave valgrind: ==31587== 16 bytes in 2 blocks are definitely lost in loss record 12 of 74 ==31587== at 0x1B909222: malloc (vg_replace_malloc.c:130) ==31587== by 0x856A2AD: EventBufData_list::add_gci_op(EventBufData_list::Gci_op, bool) (NdbEventOperationImpl.cpp:2214) ==31587== by 0x856DD0C: NdbEventBuffer::insertDataL(NdbEventOperationImpl*, SubTableData const*, LinearSectionPtr*) (NdbEventOperationImpl.hpp:206) ==31587== by 0x856E001: NdbEventBuffer::report_node_failure(unsigned) (NdbEventOperationImpl.cpp:1447) ==31587== by 0x8571FA2: Ndb::report_node_failure_completed(unsigned) (Ndbif.cpp:260) ==31587== by 0x8572091: Ndb::statusMessage(void*, unsigned, bool, bool) (Ndbif.cpp:225) ==31587== by 0x8578C76: TransporterFacade::ReportNodeFailureComplete(unsigned short) (TransporterFacade.cpp:821) ==31587== by 0x857A8F2: ClusterMgr::execNF_COMPLETEREP(unsigned const*) (ClusterMgr.cpp:379) ==31587== by 0x857A9AD: ClusterMgr::reportNodeFailed(unsigned short) (ClusterMgr.cpp:452) ==31587== by 0x857B930: ClusterMgr::reportDisconnected(unsigned short) (ClusterMgr.cpp:422) ==31587== by 0x8577ADA: TransporterFacade::reportDisconnected(int) (TransporterFacade.cpp:1207) ==31587== by 0x8577B14: reportDisconnect(void*, unsigned short, unsigned) (TransporterFacade.cpp:127) ==31587== by 0x857C848: TransporterRegistry::report_disconnect(unsigned short, int) (TransporterRegistry.cpp:1229) ==31587== by 0x85A4F1A: TCP_Transporter::doReceive() (Transporter.hpp:163) ==31587== by 0x857DC93: TransporterRegistry::performReceive() (TransporterRegistry.cpp:949) ==31587== by 0x85772D1: TransporterFacade::external_poll(unsigned) (TransporterFacade.cpp:536) Slave Back Trace: #0 0x00e9e402 in __kernel_vsyscall () #1 0x008db118 in raise () from /lib/libc.so.6 #2 0x008dc888 in abort () from /lib/libc.so.6 #3 0x0809fbe0 in childAbort (code=0, currentStartPhase=0) at main.cpp:105 #4 0x0827b358 in NdbShutdown (type=NST_ErrorHandlerSignal, restartType=NRT_Default) at Emulator.cpp:254 #5 0x0809f176 in handler_error (signum=11) at main.cpp:636 #6 <signal handler called> #7 0x082837b5 in RWPool::release (this=0x9c5a7b8, ptr={p = 0x20, i = 286737}) at RWPool.cpp:121 #8 0x080f928e in Dbdict::drop_fg_commit_start (this=0x9c43dc8, signal=0x1, op=0x3f6d870) at ../../../../../storage/ndb/src/kernel/vm/Pool.hpp:318 #9 0x080f15bf in Dbdict::execDICT_COMMIT_REQ (this=0x9c43dc8, signal=0x840ada4) at dbdict/Dbdict.cpp:14113 #10 0x082787a2 in FastScheduler::doJob (this=0x8407f20) at ./SimulatedBlock.hpp:575 #11 0x08279749 in ThreadConfig::ipControlLoop (this=0x9b6e820) at ThreadConfig.cpp:175 #12 0x0809f484 in main (argc=1, argv=0xbf8ac694) at main.cpp:470 Slave Error: Time: Thursday 23 Mars 2006 - 18:47:01 Status: Temporary error, restart node Message: Error OS signal received (Internal error, programming error or missing error message, please report a bug) Error: 6000 Error data: Signal 11 received; Segmentation fault Error object: main.cpp Program: /space/alcatel-drop/storage/ndb/src/kernel/ndbd Pid: 31550 Trace: ./ndb_1_trace.log.1 Version: Version 5.1.8 (beta) ***EOM*** Slave Trace: --------------- Signal ---------------- r.bn: 250 "DBDICT", r.proc: 1, r.sigId: 2448536 gsn: 664 "Unknown" prio: 1 s.bn: 250 "DBDICT", s.proc: 1, s.sigId: 2448535 length: 3 trace: 0 #sec: 0 fragInf: 0 H'0000019a H'00fa0001 H'0000019b --------------- Signal ---------------- r.bn: 250 "DBDICT", r.proc: 1, r.sigId: 2448535 gsn: 730 "DROP_OBJ_CONF" prio: 1 s.bn: 250 "DBDICT", s.proc: 1, s.sigId: 2448532 length: 3 trace: 0 #sec: 0 fragInf: 0 H'00fa0001 H'0000019a H'00000000 --------------- Signal ---------------- r.bn: 245 "DBTC", r.proc: 1, r.sigId: 2448534 gsn: 409 "TIME_SIGNAL" prio: 1 s.bn: 252 "QMGR", s.proc: 1, s.sigId: 2448531 length: 1 trace: 0 #sec: 0 fragInf: 0 H'00000004 I will copy files off with other files on ndbmaster.
[23 Mar 2006 17:00]
Jonathan Miller
ndb_dd_ddl.test valgrind ==32590== ==32590== ERROR SUMMARY: 650 errors from 35 contexts (suppressed: 21 from 1) ==32590== malloc/free: in use at exit: 280595 bytes in 465 blocks. ==32590== malloc/free: 47423 allocs, 46958 frees, 49772305 bytes allocated. ==32590== For counts of detected errors, rerun with: -v ==32590== searching for pointers to 465 not-freed blocks. ==32590== checked 76295640 bytes. ==32590== ==32590== ==32590== 16 bytes in 2 blocks are definitely lost in loss record 10 of 68 ==32590== at 0x1B909222: malloc (vg_replace_malloc.c:130) ==32590== by 0x856A2AD: EventBufData_list::add_gci_op(EventBufData_list::Gci_op, bool) (NdbEventOperationImpl.cpp:2214) ==32590== by 0x856DD0C: NdbEventBuffer::insertDataL(NdbEventOperationImpl*, SubTableData const*, LinearSectionPtr*) (NdbEventOperationImpl.hpp:206) ==32590== by 0x856E001: NdbEventBuffer::report_node_failure(unsigned) (NdbEventOperationImpl.cpp:1447) ==32590== by 0x8571FA2: Ndb::report_node_failure_completed(unsigned) (Ndbif.cpp:260) ==32590== by 0x8572091: Ndb::statusMessage(void*, unsigned, bool, bool) (Ndbif.cpp:225) ==32590== by 0x8578C76: TransporterFacade::ReportNodeFailureComplete(unsigned short) (TransporterFacade.cpp:821)
[24 Mar 2006 7:17]
Jonas Oreland
FYI: reproduced on ndb04
[24 Mar 2006 8:44]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/4102
[24 Mar 2006 10:38]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/4110
[24 Mar 2006 12:30]
Jonas Oreland
pushed into 5.1.8 (this was a compiler bug...don't know if it should be documented)
[27 Mar 2006 12:07]
Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release. If necessary, you can access the source repository and build the latest available version, including the bugfix, yourself. More information about accessing the source trees is available at http://www.mysql.com/doc/en/Installing_source_tree.html Additional info: Documented bugfix in 5.1.8 changelog. Closed.
[27 Mar 2006 12:19]
Jon Stephens
Updated synopsis.