Bug #20787 ndb_restore produces core dump during restore operation
Submitted: 29 Jun 2006 22:41
Modified: 18 Jul 2006 7:47
Reporter: Nikolay Grishakin
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S1 (Critical)
Version: 5.1
OS: Linux (Linux)
Assigned to: Pekka Nousiainen
CPU Architecture: Any

[29 Jun 2006 22:41] Nikolay Grishakin
Description:
ndb_restore produces a core dump during a restore operation. If the line "--exec $NDB_MGM localhost:$NDBCLUSTER_PORT --execute="ALL STATUS"" is removed from the test case, no core is generated.

Core was generated by `/home/ndbdev/ngrishakin/mysql-5.1/storage/ndb/tools/ndb_restore --no-default
 -'.
Program terminated with signal 6, Aborted.
Reading symbols from /lib64/libpthread.so.0...done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /usr/lib64/libz.so.1...done.
Loaded symbols for /usr/lib64/libz.so.1
Reading symbols from /lib64/libcrypt.so.1...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libnsl.so.1...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/libm.so.6...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...done.
Loaded symbols for /lib64/libnss_files.so.2
#0  0x00000033b832f280 in raise () from /lib64/libc.so.6
(gdb) dt
Undefined command: "dt".  Try "help".
(gdb) bt
#0  0x00000033b832f280 in raise () from /lib64/libc.so.6
#1  0x00000033b8330750 in abort () from /lib64/libc.so.6
#2  0x0000000000454135 in exitHandler (code=1) at restore/restore_main.cpp:439
#3  0x00000000004546dd in main (argc=1, argv=0x79d6c0) at restore/restore_main.cpp:551
(gdb)

How to repeat:
-- source include/have_ndb.inc
--disable_warnings
DROP DATABASE IF EXISTS util;
DROP TABLE IF EXISTS util.t1;
--enable_warnings
CREATE DATABASE util;

CREATE LOGFILE GROUP lg
ADD UNDOFILE 'undofile.dat'
INITIAL_SIZE 16M
UNDO_BUFFER_SIZE = 1M
ENGINE=NDB;

CREATE TABLESPACE ts
ADD DATAFILE 'datafile.dat'
USE LOGFILE GROUP lg
INITIAL_SIZE 12M
ENGINE NDB;

CREATE TABLE util.t1
(a1 INT NOT NULL PRIMARY KEY, a2 BLOB) TABLESPACE ts STORAGE DISK ENGINE=NDB;

set @blb = repeat('a', 300);

let $j= 500;
--disable_query_log
while ($j)
{
  eval INSERT INTO util.t1 VALUES ($j, @blb);
  dec $j;
}
--enable_query_log
SELECT COUNT(*) FROM util.t1;
SELECT a2 from util.t1 where a1=1;

--echo ndb_mgm ALL STATUS
--exec $NDB_MGM localhost:$NDBCLUSTER_PORT --execute="ALL STATUS"

-- source include/ndb_backup.inc
DROP TABLE util.t1;

ALTER TABLESPACE ts
DROP DATAFILE 'datafile.dat'
ENGINE = NDB;

DROP TABLESPACE ts
ENGINE = NDB;
-- source include/ndb_restore_master.inc

SELECT COUNT(*) FROM util.t1;
[13 Jul 2006 12:48] Pekka Nousiainen
seems to be some very bad memory corruption in ndb_restore
[15 Jul 2006 15:44] Pekka Nousiainen
Several confusions here

- in 5.1 if ndb_restore fails, it produces a core
- gdb must be applied to lt-ndb_restore, not ndb_restore (my error)
- ndb_restore fails because LOGFILE GROUP lg exists
- ndb_restore prints this error as "info" (not seen, will fix)

The fix to the test case is to add a DROP before the restore command:

+   DROP LOGFILE GROUP lg ENGINE=NDB;
    -- source include/ndb_restore_master.inc

Setting this to "Not a Bug".
[17 Jul 2006 21:18] Jonathan Miller
We need to address the issues causing the core in this test case. We can then alter the test case, but altering the test case just to avoid a core is not the correct solution here.
/jeb
[18 Jul 2006 7:47] Pekka Nousiainen
clarify 2 remarks and close this.

> in 5.1 if ndb_restore fails, it produces a core

in 5.1, ndb programs have an option --core-file [= true/false ]
which means to dump core, i.e. abort(), on any error.
the default value of the option is true iff the source is debug
compiled (VM_TRACE defined).
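
a rough C++ sketch of the behaviour described above, for illustration only. it is not the actual code in restore_main.cpp: Options, kCoreFileDefault, should_dump_core and exit_handler are hypothetical names, and treating only a non-zero exit code as an error is an assumption.

```cpp
#include <cstdlib>

// Hypothetical default for a --core-file style option: on when the
// source is debug compiled (VM_TRACE defined), off otherwise.
#ifdef VM_TRACE
constexpr bool kCoreFileDefault = true;
#else
constexpr bool kCoreFileDefault = false;
#endif

// Hypothetical option holder; a command line parser could overwrite core_file.
struct Options {
    bool core_file = kCoreFileDefault;
};

// Decide whether an exit with this code should abort() and dump core.
// Assumption: only a non-zero code counts as an error.
bool should_dump_core(const Options& opts, int code) {
    return opts.core_file && code != 0;
}

// Exit path: abort() raises SIGABRT (signal 6), which is what leaves a
// core file when core dumps are enabled; otherwise exit normally.
[[noreturn]] void exit_handler(const Options& opts, int code) {
    if (should_dump_core(opts, code)) {
        std::abort();
    }
    std::exit(code);
}
```

under this sketch a debug build that fails with code 1 ends in abort(), which matches the shape of the backtrace in the report (exitHandler (code=1) -> abort -> raise), while a build with the option off would just exit with the error code.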

> ndb_restore prints this error as "info" 

"info" is printed to stdout, or may not be printed at all
unless there is some "verbose" option (haven't checked ndb_restore).
changed 4 such printouts to use "err" which is always printed
and goes to stderr, so it will be seen in the test log.