Bug #50396 Deadlock between DROP DATABASE (mysql_rm_table_part2) and SELECT
Submitted: 17 Jan 2010 13:30 Modified: 21 Jan 2010 15:42
Reporter: Philip Stoev Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Locking Severity:S2 (Serious)
Version:mysql-next-4284, 6.0-backup, backup-backport OS:Any
Assigned to: Jon Olav Hauglid CPU Architecture:Any
Triage: Triaged: D1 (Critical)

[17 Jan 2010 13:30] Philip Stoev
Description:
When executing a workload that runs non-concurrent BACKUP/RESTORE in 1 thread, and concurrent DML in the other 1-2 threads, mysqld deadlocked with the following 2 threads:

#1  0x0000000000a2da52 in safe_cond_timedwait (cond=0x1093260, mp=0x10931a0, abstime=0x7f8fc2b21a50, file=0xc13110 "../include/mysql/psi/mysql_thread.h",
    line=846) at thr_mutex.c:480
#2  0x00000000008aeb25 in inline_mysql_cond_timedwait (that=0x1093260, mutex=0x10931a0, abstime=0x7f8fc2b21a50) at ../include/mysql/psi/mysql_thread.h:846
#3  0x00000000008af893 in MDL_context::acquire_exclusive_locks (this=0x2eeb930, mdl_requests=0x7f8fc2b21b10) at mdl.cc:976
#4  0x0000000000669317 in lock_table_names (thd=0x2eeb858, table_list=0x2fa20c8) at lock.cc:976
#5  0x00000000007ff039 in mysql_rm_table_part2 (thd=0x2eeb858, tables=0x2fa20c8, if_exists=true, drop_temporary=false, drop_view=true, dont_log_query=true)
    at sql_table.cc:1907
#6  0x00000000007ec6fc in mysql_rm_known_files (thd=0x2eeb858, dirp=0x3053b98, db=0x2fa20c0 "test", org_path=0x7f8fc2b22410 "./test/", level=0,
    dropped_tables=0x7f8fc2b22740) at sql_db.cc:1219
#7  0x00000000007ecbe2 in mysql_rm_db (thd=0x2eeb858, db=0x2fa20c0 "test", if_exists=true, silent=false) at sql_db.cc:969
#8  0x000000000068a9ab in mysql_execute_command (thd=0x2eeb858) at sql_parse.cc:3759
#9  0x000000000073ebb7 in Execute_sql_statement::execute_server_code (this=0x7f8fc2b24130, thd=0x2eeb858) at sql_prepare.cc:2937
#10 0x000000000073e2ad in Prepared_statement::execute_server_runnable (this=0x7f8fc2b23cc0, server_runnable=0x7f8fc2b24130) at sql_prepare.cc:3441
#11 0x000000000073f18f in Ed_connection::execute_direct (this=0x7f8fc2b24190, server_runnable=0x7f8fc2b24130) at sql_prepare.cc:3970
#12 0x000000000073f29f in Ed_connection::execute_direct (this=0x7f8fc2b24190, sql_text={str = 0x3005858 "DROP DATABASE IF EXISTS `test`", length = 30})
    at sql_prepare.cc:3936
#13 0x00000000008a5812 in obs::Abstract_obj::drop (this=0x3006620, thd=0x2eeb858) at si_objects.cc:821
#14 0x00000000008a58cc in obs::Database_obj::drop (this=0x3006620, thd=0x2eeb858) at si_objects.cc:1877
#15 0x00000000008a5b97 in obs::Abstract_obj::create (this=0x3006620, thd=0x2eeb858) at si_objects.cc:761
#16 0x0000000000acf3f5 in bcat_create_item (catalogue=0x2f9d488, item=0x109c9e0, create_stmt=
      {begin = 0x30056e8 "63 CREATE DATABASE `test` /*!40100 DEFAULT CHARACTER SET latin1 */\n", end = 0x300572b ""}, other_meta_data=
      {begin = 0x0, end = 0x0}) at kernel.cc:2595
#17 0x0000000000ae29dc in read_and_create_items (s=0x3001c58, cat=0x2f9d488, kind=GLOBAL_ITEM) at stream_v1.c:1772
#18 0x0000000000ae245d in bstream_rd_meta_data (s=0x3001c58, cat=0x2f9d488) at stream_v1.c:1372
warning: (Internal error: pc 0xad7d07 in read in psymtab, but not in symtab.)

warning: (Internal error: pc 0xad7d06 in read in psymtab, but not in symtab.)

warning: (Internal error: pc 0xad7d06 in read in psymtab, but not in symtab.)

warning: (Internal error: pc 0xad7d06 in read in psymtab, but not in symtab.)

#19 0x0000000000ad7d07 in backup::read_meta_data (thd=warning: (Internal error: pc 0xad7d06 in read in psymtab, but not in symtab.)

warning: (Internal error: pc 0xad7cde in read in psymtab, but not in symtab.)

warning: (Internal error: pc 0xad7d06 in read in psymtab, but not in symtab.)

0x2eeb858, info=warning: (Internal error: pc 0xad7d06 in read in psymtab, but not in symtab.)

@0x2f9d480, s=@0x3001c50) at stream.h:275
#20 0x0000000000ad1f4a in Backup_restore_ctx::do_restore (this=0x7f8fc2b25210, overwrite=true) at kernel.cc:1639
#21 0x0000000000ad4253 in execute_backup_command (thd=0x2eeb858, lex=0x2eed140, backupdir=0x7f8fc2b26190, overwrite=true, skip_gap_event=false)
    at kernel.cc:296
#22 0x0000000000686ee2 in mysql_execute_command (thd=0x2eeb858) at sql_parse.cc:2474
#23 0x000000000068eb2c in mysql_parse (thd=0x2eeb858, inBuf=0x2f114d0 "RESTORE FROM '/tmp/gentest13075.tmp' OVERWRITE", length=46,
    found_semicolon=0x7f8fc2b26f00) at sql_parse.cc:5974
---Type <return> to continue, or q <return> to quit---
#24 0x000000000068f670 in dispatch_command (command=COM_QUERY, thd=0x2eeb858, packet=0x2efd6a9 " RESTORE FROM '/tmp/gentest13075.tmp' OVERWRITE",
    packet_length=47) at sql_parse.cc:1075
#25 0x0000000000690b51 in do_command (thd=0x2eeb858) at sql_parse.cc:759
#26 0x000000000067df62 in handle_one_connection (arg=0x2eeb858) at sql_connect.cc:1164
#27 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0
#28 0x000000315a4e627d in clone () from /lib64/libc.so.6

#3  0x00000000008ae6b2 in MDL_context::wait_for_locks (this=0x2f21300, mdl_requests=0x7f8fc2a61d10) at mdl.cc:1295
#4  0x00000000006e0a19 in Open_table_context::recover_from_failed_open_table_attempt (this=0x7f8fc2a61d10, thd=0x2f21228, table=0x2f323a8)
    at sql_base.cc:3726
#5  0x00000000006e2ef5 in open_tables (thd=0x2f21228, start=0x7f8fc2a61d90, counter=0x7f8fc2a61dd0, flags=0, prelocking_strategy=0x7f8fc2a61e10)
    at sql_base.cc:4232
#6  0x00000000006e3352 in open_and_lock_tables_derived (thd=0x2f21228, tables=0x2f323a8, derived=true, flags=0, prelocking_strategy=0x7f8fc2a61e10)
    at sql_base.cc:4789
#7  0x0000000000691b3a in open_and_lock_tables_derived (thd=0x2f21228, tables=0x2f323a8, derived=true, flags=0) at ../mysql_priv.h:1571
#8  0x0000000000691b75 in open_and_lock_tables (thd=0x2f21228, tables=0x2f323a8) at ../../sql/mysql_priv.h:1581
#9  0x0000000000684ac0 in execute_sqlcom_select (thd=0x2f21228, all_tables=0x2f323a8) at sql_parse.cc:4925
#10 0x00000000006865cf in mysql_execute_command (thd=0x2f21228) at sql_parse.cc:2152
#11 0x000000000068eb2c in mysql_parse (thd=0x2f21228,
    inBuf=0x2f31040 "SELECT\n\t\t\tAVG(`col_int_key`) + AVG(`col_int`) AS average1,\n\t\t\t(SUM(`col_int_key`) + SUM(`col_int`)) / COUNT(*) AS average2,\n\t\t\tCOUNT(*) AS count\n\t\t\tFROM `table100_innodb_int_autoinc`", length=182, found_semicolon=0x7f8fc2a63f00) at sql_parse.cc:5974
---Type <return> to continue, or q <return> to quit---
#12 0x000000000068f670 in dispatch_command (command=COM_QUERY, thd=0x2f21228,
    packet=0x30187f9 "\n\t\t\tSELECT\n\t\t\tAVG(`col_int_key`) + AVG(`col_int`) AS average1,\n\t\t\t(SUM(`col_int_key`) + SUM(`col_int`)) / COUNT(*) AS average2,\n\t\t\tCOUNT(*) AS count\n\t\t\tFROM `table100_innodb_int_autoinc`\n\t\t", packet_length=189) at sql_parse.cc:1075
#13 0x0000000000690b51 in do_command (thd=0x2f21228) at sql_parse.cc:759
#14 0x000000000067df62 in handle_one_connection (arg=0x2f21228) at sql_connect.cc:1164
#15 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0
#16 0x000000315a4e627d in clone () from /lib64/libc.so.6

How to repeat:
This is easily reproducible with the RQG, a test case will be uploaded shortly.
[17 Jan 2010 13:38] Philip Stoev
Core and binary:

http://mysql-systemqa.s3.amazonaws.com/var-bug50396.zip

Source:

revision-id: charles.bell@sun.com-20100114172634-ntlxz2py57m2lkat
date: 2010-01-14 12:26:34 -0500
build-date: 2010-01-17 14:52:14 +0200
revno: 3065
branch-nick: mysql-backup-backport
[17 Jan 2010 15:42] Philip Stoev
bug50396.yy grammar

Attachment: bug50396.yy (application/octet-stream, text), 3.72 KiB.

[17 Jan 2010 15:44] Philip Stoev
To reproduce with the RQG:

 perl runall.pl \
 --mysqld=--loose-innodb-lock-wait-timeout=5 \
 --mysqld=--table-lock-wait-timeout=5  \
 --mysqld=--loose-skip-safemalloc \
 --grammar=conf/bug50396.yy \
 --gendata=conf/transactions.zz \
 --reporters=Deadlock,ErrorLog,Backtrace,Shutdown \
 --duration=600 \
 --basedir=/build/bzr/mysql-6.0-backup/ \
 --vardir=/tmp/vardir \
 --mysqld=--log-output=file \
 --queries=100000 \
 --engine=innodb \
 --mysqld=--innodb \
 --mysqld=--loose-mysql-backup \
 --validator=DatabaseConsistency

The YY grammar contains non-concurrent BACKUP/RESTORE in a single thread and DML statements in the other 9 threads. In addition to that, the DatabaseConsistency constantly issues full-table-scan SELECT queries in order to validate that the tables remain consistent. It is those SELECTs that seem to cause the deadlock.
[21 Jan 2010 11:37] Philip Stoev
Grammar for bug 50396 without BACKUP/RESTORE

Attachment: bug50396-2.yy (application/octet-stream, text), 3.73 KiB.

[21 Jan 2010 11:41] Philip Stoev
It turns out that this bug does not require RESTORE . Instead, a new grammar has been uploaded that uses DROP TABLE.

The bug is very timing sensitive -- DROP TABLE causes the table to be gone very soon, so it may not be reproducible with e.g. --threads=2 or a different grammar.
[21 Jan 2010 15:25] Philip Stoev
DROP DATABASE is involved, not DROP TABLE.
[21 Jan 2010 15:27] Philip Stoev
Thread stacks without RESTORE, just DROP DATABASE

Attachment: bug50396.threads.txt (text/plain), 39.75 KiB.

[21 Jan 2010 15:42] Philip Stoev
It turns out that this is a legitimate hang caused by transactions remaining open forever. The situation can be resolved by using KILL.

As discussed with Jon Olav Hauglid, I am shelving this bug until the MDL starts implementing a timeout, which should cover situations like this. I will re-test after the timeout functionality has been pushed.
[21 Jan 2010 15:43] Jon Olav Hauglid
For reference, the timeout bug is Bug#45225.