Bug #18295 DD: DROP LOGFILE GROUP causes ndbd core when compiled with GCC4
Submitted: 17 Mar 2006 8:25 Modified: 27 Mar 2006 12:07
Reporter: Jonathan Miller Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S1 (Critical)
Version:5.1.8 OS:Linux (Linux 32 Bit OS)
Assigned to: Jonas Oreland CPU Architecture:Any

[17 Mar 2006 8:25] Jonathan Miller
Description:
ndb_dd_ddl.test produced this for nikolay on ndb15 and for me on ndb08.

#0  0x00a7c402 in __kernel_vsyscall ()
#1  0x008db118 in raise () from /lib/libc.so.6
#2  0x008dc888 in abort () from /lib/libc.so.6
#3  0x0809fbc0 in childAbort (code=0, currentStartPhase=0) at main.cpp:105
#4  0x0827b098 in NdbShutdown (type=NST_ErrorHandlerSignal,
    restartType=NRT_Default) at Emulator.cpp:254
#5  0x0809f156 in handler_error (signum=11) at main.cpp:636
#6  <signal handler called>
#7  0x082834f5 in RWPool::release (this=0x8b83450, ptr={p = 0x20, i = 286737})
    at RWPool.cpp:121
#8  0x080f926e in Dbdict::drop_fg_commit_start (this=0x8b6ca60, signal=0x1,
    op=0x5554870) at ../../../../../storage/ndb/src/kernel/vm/Pool.hpp:318
#9  0x080f159f in Dbdict::execDICT_COMMIT_REQ (this=0x8b6ca60,
    signal=0x840aa64) at dbdict/Dbdict.cpp:14113
#10 0x082784e2 in FastScheduler::doJob (this=0x8407be0)
    at ./SimulatedBlock.hpp:575
#11 0x082793cf in ThreadConfig::ipControlLoop (this=0x8a82820)
    at ThreadConfig.cpp:175
#12 0x0809f464 in main (argc=1, argv=0xbf998504) at main.cpp:470

How to repeat:
mysql> use test
Database changed
mysql> DROP TABLE IF EXISTS t1;
Query OK, 0 rows affected, 1 warning (0.02 sec)

mysql> DROP DATABASE IF EXISTS mysqltest;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> CREATE DATABASE mysqltest;
Query OK, 1 row affected (0.13 sec)

mysql> CREATE LOGFILE GROUP lg1
    -> ADD UNDOFILE 'undofile.dat'
    -> INITIAL_SIZE 16M
    -> UNDO_BUFFER_SIZE = 1M
    -> ENGINE=NDB;
Query OK, 0 rows affected (1.36 sec)

mysql> CREATE LOGFILE GROUP lg2
    -> ADD UNDOFILE 'undofile2.dat'
    -> INITIAL_SIZE 16M
    -> UNDO_BUFFER_SIZE 1M
    -> ENGINE NDB;
ERROR 1507 (HY000): Failed to create LOGFILE GROUP
mysql> SHOW WARNINGS;
+-------+------+---------------------------------------------------------------------------+
| Level | Code | Message            |
+-------+------+---------------------------------------------------------------------------+
| Error | 1296 | Got error 1514 'Currently there is a limit of one logfile group' from NDB |
| Error | 1507 | Failed to create LOGFILE GROUP            |
+-------+------+---------------------------------------------------------------------------+
2 rows in set (0.00 sec)

mysql> CREATE LOGFILE GROUP lg1
    -> ADD UNDOFILE 'undofile.dat'
    -> INITIAL_SIZE 16M
    -> UNDO_BUFFER_SIZE = 1M
    -> ENGINE=NDB;
ERROR 1507 (HY000): Failed to create LOGFILE GROUP
mysql>
mysql> ALTER LOGFILE GROUP lg1
    -> ADD UNDOFILE 'undofile02.dat'
    -> INITIAL_SIZE 4M ENGINE NDB;
Query OK, 0 rows affected (0.59 sec)

mysql> CREATE TABLESPACE ts1
    -> ADD DATAFILE 'datafile.dat'
    -> USE LOGFILE GROUP lg1
    -> INITIAL_SIZE 12M
    -> ENGINE NDB;
Query OK, 0 rows affected (1.46 sec)

mysql> ALTER TABLESPACE ts1
    -> ADD DATAFILE 'datafile2.dat'
    -> INITIAL_SIZE 12M
    -> ENGINE=NDB;
Query OK, 0 rows affected (1.15 sec)

mysql> ALTER TABLESPACE ts1
    -> ADD DATAFILE 'datafile2.dat'
    -> INITIAL_SIZE 12M
    -> ENGINE=NDB;
ERROR 1512 (HY000): Failed to alter:  CREATE DATAFILE
mysql> CREATE TABLE mysqltest.t1
    -> (pk1 INT NOT NULL PRIMARY KEY, b INT NOT NULL, c INT NOT NULL)
    -> TABLESPACE ts1 STORAGE DISK
    -> ENGINE=NDB;
Query OK, 0 rows affected (0.87 sec)

mysql> CREATE TABLE mysqltest.t1
    -> (pk1 INT NOT NULL PRIMARY KEY, b INT NOT NULL, c INT NOT NULL)
    -> TABLESPACE ts1 STORAGE DISK
    -> ENGINE=NDB;
ERROR 1050 (42S01): Table 't1' already exists
mysql> CREATE TABLE mysqltest.t1
    -> (pk1 INT NOT NULL PRIMARY KEY, b INT NOT NULL, c INT NOT NULL)
    -> ENGINE=NDB;
ERROR 1050 (42S01): Table 't1' already exists
mysql> DROP TABLE mysqltest.t1;
Query OK, 0 rows affected (0.65 sec)

mysql>
mysql> ALTER TABLESPACE ts1
    -> DROP DATAFILE 'datafile2.dat'
    -> ENGINE=NDB;
Query OK, 0 rows affected (0.22 sec)

mysql> ALTER TABLESPACE ts1
    -> DROP DATAFILE 'datafile2.dat'
    -> ENGINE=NDB;
ERROR 1512 (HY000): Failed to alter:  NO SUCH FILE
mysql>
mysql> ALTER TABLESPACE ts1
    -> DROP DATAFILE 'datafile.dat'
    -> ENGINE=NDB;
Query OK, 0 rows affected (0.20 sec)

mysql> ALTER TABLESPACE ts1
    -> DROP DATAFILE 'datafile.dat'
    -> ENGINE=NDB;
ERROR 1512 (HY000): Failed to alter:  NO SUCH FILE
mysql>
mysql> DROP TABLESPACE ts1
    -> ENGINE=NDB;
Query OK, 0 rows affected (0.18 sec)

mysql> DROP TABLESPACE ts1
    -> ENGINE=NDB;
ERROR 1508 (HY000): Failed to drop TABLESPACE
mysql>
mysql> DROP LOGFILE GROUP lg1
    -> ENGINE=NDB;
ERROR 1508 (HY000): Failed to drop LOGFILE GROUP
mysql> SHOW WARNINGS;
+-------+------+-------------------------------------------+
| Level | Code | Message                                   |
+-------+------+-------------------------------------------+
| Error | 1296 | Got error 4009 'Cluster Failure' from NDB |
| Error | 1508 | Failed to drop LOGFILE GROUP              |
+-------+------+-------------------------------------------+
2 rows in set (0.00 sec)
[17 Mar 2006 12:34] Jonas Oreland
Hi jeb,

I can't repeat it...tried on 1,2 & 4 node cluster (using attached file)
With/without mysql-test-run.
(though only on 64-bit...)

Do you get it on empty database or do you need to run something else first?

(you're not present on IRC currently, so I'm reassigning to you, so we don't forget it...)
[23 Mar 2006 16:09] Jonathan Miller
In trying to fix timing issue in rpl_ndb_dd_advance.test today, I start hitting the same bug, but with a twist. I would get the can not drop LG on the master, but the slave is the one to produce the core. This is being done on NDB08 which is a 32 bit system:
Linux ndb08.mysql.com 2.6.12-1.1447_FC4smp #1 SMP Fri Aug 26 20:57:13 EDT 2005 i686 athlon i386 GNU/Linux

GCC:
Using built-in specs.
Target: i386-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,java,f95,ada --enable-java-awt=gtk --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --host=i386-redhat-linux
Thread model: posix
gcc version 4.0.1 20050727 (Red Hat 4.0.1-5)

Master Error:
rpl_ndb_dd_advance             [ fail ]

Errors are (from /space/alcatel-drop/mysql-test/var/log/mysqltest-time) :
mysqltest: At line 414: query 'DROP LOGFILE GROUP lg1 ENGINE=NDB' failed: 1508: Failed to drop LOGFILE GROUP

Master valgrind:
==31472== 64 bytes in 2 blocks are definitely lost in loss record 37 of 71
==31472==    at 0x1B909222: malloc (vg_replace_malloc.c:130)
==31472==    by 0x856A2AD: EventBufData_list::add_gci_op(EventBufData_list::Gci_op, bool) (NdbEventOperationImpl.cpp:2214)
==31472==    by 0x856DD0C: NdbEventBuffer::insertDataL(NdbEventOperationImpl*, SubTableData const*, LinearSectionPtr*) (NdbEventOperationImpl.hpp:206)
==31472==    by 0x856E001: NdbEventBuffer::report_node_failure(unsigned) (NdbEventOperationImpl.cpp:1447)
==31472==    by 0x8571FA2: Ndb::report_node_failure_completed(unsigned) (Ndbif.cpp:260)
==31472==    by 0x8572091: Ndb::statusMessage(void*, unsigned, bool, bool) (Ndbif.cpp:225)
==31472==    by 0x8578C76: TransporterFacade::ReportNodeFailureComplete(unsigned short) (TransporterFacade.cpp:821)
==31472==    by 0x857A8F2: ClusterMgr::execNF_COMPLETEREP(unsigned const*) (ClusterMgr.cpp:379)
==31472==    by 0x857A9AD: ClusterMgr::reportNodeFailed(unsigned short) (ClusterMgr.cpp:452)
==31472==    by 0x857B930: ClusterMgr::reportDisconnected(unsigned short) (ClusterMgr.cpp:422)
==31472==    by 0x8577ADA: TransporterFacade::reportDisconnected(int) (TransporterFacade.cpp:1207)
==31472==    by 0x8577B14: reportDisconnect(void*, unsigned short, unsigned) (TransporterFacade.cpp:127)
==31472==    by 0x857C848: TransporterRegistry::report_disconnect(unsigned short, int) (TransporterRegistry.cpp:1229)
==31472==    by 0x85A4F1A: TCP_Transporter::doReceive() (Transporter.hpp:163)
==31472==    by 0x857DC93: TransporterRegistry::performReceive() (TransporterRegistry.cpp:949)
==31472==    by 0x8577219: TransporterFacade::threadMainReceive() (TransporterFacade.cpp:514)

No error log, core or trace files on master.

Slave valgrind:
==31587== 16 bytes in 2 blocks are definitely lost in loss record 12 of 74
==31587==    at 0x1B909222: malloc (vg_replace_malloc.c:130)
==31587==    by 0x856A2AD: EventBufData_list::add_gci_op(EventBufData_list::Gci_op, bool) (NdbEventOperationImpl.cpp:2214)
==31587==    by 0x856DD0C: NdbEventBuffer::insertDataL(NdbEventOperationImpl*, SubTableData const*, LinearSectionPtr*) (NdbEventOperationImpl.hpp:206)
==31587==    by 0x856E001: NdbEventBuffer::report_node_failure(unsigned) (NdbEventOperationImpl.cpp:1447)
==31587==    by 0x8571FA2: Ndb::report_node_failure_completed(unsigned) (Ndbif.cpp:260)
==31587==    by 0x8572091: Ndb::statusMessage(void*, unsigned, bool, bool) (Ndbif.cpp:225)
==31587==    by 0x8578C76: TransporterFacade::ReportNodeFailureComplete(unsigned short) (TransporterFacade.cpp:821)
==31587==    by 0x857A8F2: ClusterMgr::execNF_COMPLETEREP(unsigned const*) (ClusterMgr.cpp:379)
==31587==    by 0x857A9AD: ClusterMgr::reportNodeFailed(unsigned short) (ClusterMgr.cpp:452)
==31587==    by 0x857B930: ClusterMgr::reportDisconnected(unsigned short) (ClusterMgr.cpp:422)
==31587==    by 0x8577ADA: TransporterFacade::reportDisconnected(int) (TransporterFacade.cpp:1207)
==31587==    by 0x8577B14: reportDisconnect(void*, unsigned short, unsigned) (TransporterFacade.cpp:127)
==31587==    by 0x857C848: TransporterRegistry::report_disconnect(unsigned short, int) (TransporterRegistry.cpp:1229)
==31587==    by 0x85A4F1A: TCP_Transporter::doReceive() (Transporter.hpp:163)
==31587==    by 0x857DC93: TransporterRegistry::performReceive() (TransporterRegistry.cpp:949)
==31587==    by 0x85772D1: TransporterFacade::external_poll(unsigned) (TransporterFacade.cpp:536)

Slave Back Trace:
#0  0x00e9e402 in __kernel_vsyscall ()
#1  0x008db118 in raise () from /lib/libc.so.6
#2  0x008dc888 in abort () from /lib/libc.so.6
#3  0x0809fbe0 in childAbort (code=0, currentStartPhase=0) at main.cpp:105
#4  0x0827b358 in NdbShutdown (type=NST_ErrorHandlerSignal, restartType=NRT_Default)
    at Emulator.cpp:254
#5  0x0809f176 in handler_error (signum=11) at main.cpp:636
#6  <signal handler called>
#7  0x082837b5 in RWPool::release (this=0x9c5a7b8, ptr={p = 0x20, i = 286737}) at RWPool.cpp:121
#8  0x080f928e in Dbdict::drop_fg_commit_start (this=0x9c43dc8, signal=0x1, op=0x3f6d870)
    at ../../../../../storage/ndb/src/kernel/vm/Pool.hpp:318
#9  0x080f15bf in Dbdict::execDICT_COMMIT_REQ (this=0x9c43dc8, signal=0x840ada4)
    at dbdict/Dbdict.cpp:14113
#10 0x082787a2 in FastScheduler::doJob (this=0x8407f20) at ./SimulatedBlock.hpp:575
#11 0x08279749 in ThreadConfig::ipControlLoop (this=0x9b6e820) at ThreadConfig.cpp:175
#12 0x0809f484 in main (argc=1, argv=0xbf8ac694) at main.cpp:470

Slave Error:
Time: Thursday 23 Mars 2006 - 18:47:01
Status: Temporary error, restart node
Message: Error OS signal received (Internal error, programming error or missing error message, please report a bug)
Error: 6000
Error data: Signal 11 received; Segmentation fault
Error object: main.cpp
Program: /space/alcatel-drop/storage/ndb/src/kernel/ndbd
Pid: 31550
Trace: ./ndb_1_trace.log.1
Version: Version 5.1.8 (beta)
***EOM***

Slave Trace:
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 1, r.sigId: 2448536 gsn: 664 "Unknown" prio: 1
s.bn: 250 "DBDICT", s.proc: 1, s.sigId: 2448535 length: 3 trace: 0 #sec: 0 fragInf: 0
 H'0000019a H'00fa0001 H'0000019b
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 1, r.sigId: 2448535 gsn: 730 "DROP_OBJ_CONF" prio: 1
s.bn: 250 "DBDICT", s.proc: 1, s.sigId: 2448532 length: 3 trace: 0 #sec: 0 fragInf: 0
 H'00fa0001 H'0000019a H'00000000
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 1, r.sigId: 2448534 gsn: 409 "TIME_SIGNAL" prio: 1
s.bn: 252 "QMGR", s.proc: 1, s.sigId: 2448531 length: 1 trace: 0 #sec: 0 fragInf: 0
 H'00000004

I will copy files off with other files on ndbmaster.
[23 Mar 2006 17:00] Jonathan Miller
ndb_dd_ddl.test valgrind
==32590==
==32590== ERROR SUMMARY: 650 errors from 35 contexts (suppressed: 21 from 1)
==32590== malloc/free: in use at exit: 280595 bytes in 465 blocks.
==32590== malloc/free: 47423 allocs, 46958 frees, 49772305 bytes allocated.
==32590== For counts of detected errors, rerun with: -v
==32590== searching for pointers to 465 not-freed blocks.
==32590== checked 76295640 bytes.
==32590==
==32590==
==32590== 16 bytes in 2 blocks are definitely lost in loss record 10 of 68
==32590==    at 0x1B909222: malloc (vg_replace_malloc.c:130)
==32590==    by 0x856A2AD: EventBufData_list::add_gci_op(EventBufData_list::Gci_op, bool) (NdbEventOperationImpl.cpp:2214)
==32590==    by 0x856DD0C: NdbEventBuffer::insertDataL(NdbEventOperationImpl*, SubTableData const*, LinearSectionPtr*) (NdbEventOperationImpl.hpp:206)
==32590==    by 0x856E001: NdbEventBuffer::report_node_failure(unsigned) (NdbEventOperationImpl.cpp:1447)
==32590==    by 0x8571FA2: Ndb::report_node_failure_completed(unsigned) (Ndbif.cpp:260)
==32590==    by 0x8572091: Ndb::statusMessage(void*, unsigned, bool, bool) (Ndbif.cpp:225)
==32590==    by 0x8578C76: TransporterFacade::ReportNodeFailureComplete(unsigned short) (TransporterFacade.cpp:821)
[24 Mar 2006 7:17] Jonas Oreland
FYI: reproduced on ndb04
[24 Mar 2006 8:44] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/4102
[24 Mar 2006 10:38] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/4110
[24 Mar 2006 12:30] Jonas Oreland
pushed into 5.1.8
(this was a compiler bug...don't know if it should be documented)
[27 Mar 2006 12:07] Jon Stephens
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

Documented bugfix in 5.1.8 changelog. Closed.
[27 Mar 2006 12:19] Jon Stephens
Updated synopsis.