Bug #11384 drop database causes mysqld to core
Submitted: 16 Jun 2005 14:11 Modified: 22 Jul 2005 12:32
Reporter: Tomas Ulin
Status: Closed
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:5.1-wl2325 OS:Linux (linux)
Assigned to: Stewart Smith CPU Architecture:Any

[16 Jun 2005 14:11] Tomas Ulin
Description:
/home/tomas/mysql-5.1-wl2325/client/.libs/lt-mysqltest: At line 27: query 'DROP DATABASE BANK' failed: 2013: Lost connection to MySQL server during query

The core dump shows:

(gdb) where
#0  0x401d8cb1 in kill () from /lib/libc.so.6
#1  0x4004c639 in pthread_kill () from /lib/libpthread.so.0
#2  0x0831674e in write_core (sig=11) at stacktrace.c:220
#3  0x081b9b4f in handle_segfault (sig=11) at mysqld.cc:2005
#4  0x4004ed69 in __pthread_clock_settime () from /lib/libpthread.so.0
#5  <signal handler called>
#6  0x08204833 in remove_table_from_cache(THD*, char const*, char const*, bool) (thd=0x8771048, db=0x87803a4 "BANK", 
    table_name=0x87803a9 "SYSTEM_VALUES", return_if_owned_by_thd=false) at sql_base.cc:4078
#7  0x081b43e3 in lock_table_name(THD*, st_table_list*) (thd=0x8771048, table_list=0x8780250) at lock.cc:583
#8  0x081b4610 in lock_table_names(THD*, st_table_list*) (thd=0x8771048, table_list=0x8780250) at lock.cc:658
#9  0x082cc1c9 in mysql_rm_table_part2(THD*, st_table_list*, bool, bool, bool, bool) (thd=0x8771048, tables=0x8780250, if_exists=true, 
    drop_temporary=false, drop_view=true, dont_log_query=true) at sql_table.cc:219
#10 0x082cc0a2 in mysql_rm_table_part2_with_lock(THD*, st_table_list*, bool, bool, bool) (thd=0x8771048, tables=0x8780250, if_exists=true, 
    drop_temporary=false, dont_log_query=true) at sql_table.cc:165
#11 0x082cafab in mysql_rm_known_files (thd=0x8771048, dirp=0x88045d8, db=0x8780248 "BANK", org_path=0x404f5104 "./BANK/", level=0) at sql_db.cc:826
#12 0x082ca624 in mysql_rm_db(THD*, char*, bool, bool) (thd=0x8771048, db=0x8780248 "BANK", if_exists=false, silent=false) at sql_db.cc:633
#13 0x081d5bdf in mysql_execute_command(THD*) (thd=0x8771048) at sql_parse.cc:3600
#14 0x081db039 in mysql_parse(THD*, char*, unsigned) (thd=0x8771048, inBuf=0x8780208 "DROP DATABASE BANK", length=18) at sql_parse.cc:5377
#15 0x081d093e in dispatch_command(enum_server_command, THD*, char*, unsigned) (command=COM_QUERY, thd=0x8771048, 
    packet=0x877c1d1 "DROP DATABASE BANK", packet_length=19) at sql_parse.cc:1683
#16 0x081d0156 in do_command(THD*) (thd=0x8771048) at sql_parse.cc:1486
#17 0x081cf272 in handle_one_connection (arg=0x8771048) at sql_parse.cc:1135
#18 0x40049dc7 in pthread_detach () from /lib/libpthread.so.0
#19 0x40280aaa in clone () from /lib/libc.so.6

The problem is uninitialized data in the in_use member of the TABLE struct:

#6  0x08204833 in remove_table_from_cache(THD*, char const*, char const*, bool) (thd=0x8771048, db=0x87803a4 "BANK", 
    table_name=0x87803a9 "SYSTEM_VALUES", return_if_owned_by_thd=false) at sql_base.cc:4078
4078            if (thd_table->db_stat)                 // If table is open
(gdb) l
4073          */
4074          for (TABLE *thd_table= in_use->open_tables;
4075               thd_table ;
4076               thd_table= thd_table->next)
4077          {
4078            if (thd_table->db_stat)                 // If table is open
4079              mysql_lock_abort_for_thread(thd, thd_table);
4080          }
4081        }
4082        else
(gdb) p in_use->open_tables
$1 = (TABLE *) 0x8f8f8f8f
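To make the failure mode concrete, here is a minimal C++ model of the loop at sql_base.cc:4074-4080. The types `Thd` and `Table` and the function `count_tables_to_abort` are hypothetical simplifications, not MySQL's real THD/TABLE classes, and the counter stands in for the call to mysql_lock_abort_for_thread(). The loop is only safe while `in_use` points at a live thread whose `open_tables` list is intact. The repeated 0x8f byte in `$1` does not look like a real heap address, and the helper `is_fill_pattern` checks for that kind of value; reading 0x8f as a debug-allocator fill byte is an assumption, not something this report states.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical, heavily simplified stand-ins for MySQL's TABLE and THD.
struct Table {
  Table* next = nullptr;  // next table in the owning thread's open_tables list
  unsigned db_stat = 0;   // non-zero means "table is open"
};

struct Thd {
  Table* open_tables = nullptr;  // head of this thread's open-table list
};

// Mirrors the shape of the loop at sql_base.cc:4074-4080: walk the owner's
// list and count the open tables whose locks would be aborted.
// Precondition: in_use points at a live Thd with a well-formed list.
int count_tables_to_abort(const Thd* in_use) {
  assert(in_use != nullptr);
  int aborted = 0;
  for (const Table* t = in_use->open_tables; t != nullptr; t = t->next) {
    if (t->db_stat)  // If table is open
      ++aborted;     // stand-in for mysql_lock_abort_for_thread(thd, t)
  }
  return aborted;
}

// Returns true if every byte of the pointer-sized value equals `fill`,
// i.e. the value looks like poisoned memory rather than a real address.
bool is_fill_pattern(std::uintptr_t value, unsigned char fill) {
  unsigned char bytes[sizeof value];
  std::memcpy(bytes, &value, sizeof value);
  for (unsigned char b : bytes)
    if (b != fill) return false;
  return true;
}
```

With a three-table list in which two tables are open, `count_tables_to_abort` returns 2. In the core above, by contrast, the very first `in_use->open_tables` read already yields a bogus pointer, so the `thd_table->db_stat` access at line 4078 faults.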

How to repeat:
You must work in a source tree with ndb/test compiled.

killall -9 mysqld ndbd ndb_mgmd; ./mysql-test-run --fast --ndb-extra-test --do-test=rpl_ndb_bank --start-and-exit
runtest < t/rpl_ndb_bank.test
runtest < t/rpl_ndb_bank.test

The crash happens during initialization on the second run.

$ alias runtest
alias runtest='MYSQL_DUMP='\''../client/mysqldump --no-defaults -uroot --socket=var/tmp/master.sock'\'' MYSQL_DUMP_SLAVE='\''../client/mysqldump --no-defaults -uroot --socket=var/tmp/slave.sock'\'' NDB_TOOLS_DIR=../storage/ndb/tools NDB_TOOLS_OUTPUT=`pwd`/var/log/ndb_tools.log NDB_BACKUP_DIR=`pwd`/var/ndbcluster-9350 NDBCLUSTER_PORT=9350 NDBCLUSTER_PORT_SLAVE=9358 MASTER_MYPORT=9306 MASTER_MYPORT1=9307 SLAVE_MYPORT=9308 NDB_EXTRA_TEST=1 NDB_STATUS_OK=1 NDB_MGM=../storage/ndb/src/mgmclient/ndb_mgm ../client/mysqltest -D test -u root --socket=var/tmp/master.sock'

NOTE

There are some changes you can make to the test case so that it runs faster and still reproduces the crash...
[16 Jun 2005 19:44] Tomas Ulin
I also get the crash if I run mysql-test-run with:

--skip-slave-binlog

which simplifies the debug printout on the slave.
[21 Jun 2005 2:33] Stewart Smith
Verified with the 5.1-wl2325 BK tree. It is the slave mysqld that crashes.
[22 Jun 2005 5:34] Stewart Smith
There is a bug in SUMA that crashes ndbd (and, seemingly, mysqld) when we try to unsubscribe from an event that has already been removed.

I have a patch that prevents the ndbd (and hence cluster) crash as well as the mysqld crash.

The bank test still does not run cleanly, so there are probably other bugs (there are also still valgrind warnings).

Since I am no expert on SUMA, I would like to discuss the patch before checking anything in.
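The failure described here is a stale-handle problem: an unsubscribe request arrives for an event whose subscription has already been removed. The sketch below is a hypothetical C++ registry, not SUMA's actual code or signal interface; `SubscriptionRegistry`, `UnsubscribeResult`, and the integer ids are illustrative assumptions. It shows the defensive approach of reporting an unknown id to the caller instead of acting on state that no longer exists.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical result codes for an unsubscribe request.
enum class UnsubscribeResult { kOk, kNoSuchSubscription };

// Hypothetical registry mapping a subscription id to the event it follows.
// This is an illustration only, not NDB's SUMA block.
class SubscriptionRegistry {
 public:
  // Registers a subscription; returns false if the id is already in use.
  bool subscribe(int id, const std::string& event) {
    return subs_.emplace(id, event).second;
  }

  // Drops every subscription that follows `event`, as happens when the
  // event itself is removed. Returns how many subscriptions were dropped.
  int remove_event(const std::string& event) {
    int dropped = 0;
    for (auto it = subs_.begin(); it != subs_.end();) {
      if (it->second == event) {
        it = subs_.erase(it);
        ++dropped;
      } else {
        ++it;
      }
    }
    return dropped;
  }

  // Looks the id up before touching anything, so a late request for an
  // already removed subscription is reported rather than acted on.
  UnsubscribeResult unsubscribe(int id) {
    auto it = subs_.find(id);
    if (it == subs_.end()) return UnsubscribeResult::kNoSuchSubscription;
    subs_.erase(it);
    return UnsubscribeResult::kOk;
  }

  std::size_t size() const { return subs_.size(); }

 private:
  std::map<int, std::string> subs_;
};
```

For example, after `subscribe(1, "ev")` and `remove_event("ev")`, a later `unsubscribe(1)` returns `kNoSuchSubscription` instead of touching freed state. Whether the caller should then carry on is a separate policy question.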
[27 Jun 2005 7:29] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/26431
[27 Jun 2005 12:44] Stewart Smith
Tomas believes that ignoring the error only masks the symptom of a larger problem: if we receive this signal at all, something is seriously wrong somewhere else (quite possibly corrupted in bad ways), and we really should not continue.

I am currently looking for the source of the problem on the API side.
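The disagreement is about what to do once an unknown subscription id is detected. A minimal, hypothetical sketch of the two policies follows; `StaleRequestPolicy` and `handle_unsubscribe_miss` are illustrative names, and throwing `std::logic_error` merely stands in for whatever the real system would do to stop. Ignoring the error corresponds roughly to `kIgnore`; the position above argues for `kFailFast`, because a miss suggests internal state is already inconsistent.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical policies for an unsubscribe request whose id is unknown.
enum class StaleRequestPolicy {
  kIgnore,    // continue as if the request had succeeded
  kFailFast,  // treat the miss as evidence of corrupted state and stop
};

// Returns true if processing may continue. Under kFailFast an unknown id
// throws, so the inconsistency surfaces where it is detected instead of
// showing up later as an unrelated crash.
bool handle_unsubscribe_miss(int id, StaleRequestPolicy policy) {
  if (policy == StaleRequestPolicy::kIgnore) return true;
  throw std::logic_error("unsubscribe for unknown subscription id " +
                         std::to_string(id));
}
```

The fail-fast choice trades availability for diagnosability: the process stops at the first sign of inconsistency, which keeps the evidence close to its cause.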
[6 Jul 2005 2:26] Stewart Smith
I can no longer reproduce this with the latest BK tree plus
bk commit - 5.1 tree (stewart:1.1984) WL#2325

which seemed to fix things on ppc.
[14 Jul 2005 7:35] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/27048
[22 Jul 2005 12:07] Stewart Smith
Pushed to 4.0.26 and 5.0.11, although the crash can only be reproduced with cluster replication (5.1).
[22 Jul 2005 12:32] Stewart Smith
Clarification: only the second patch was pushed; the first one was deemed inadequate.