Bug #49960 Deadlock on a concurrent workload involving transactional lock table
Submitted: 28 Dec 2009 14:41 Modified: 27 May 2010 9:41
Reporter: Philip Stoev
Status: Can't repeat
Category: MySQL Server: Locking Severity: S2 (Serious)
Version: next-mr-wl3561 OS: Any
Assigned to: Ingo Strüwing CPU Architecture: Any

[28 Dec 2009 14:41] Philip Stoev
Description:
When executing a workload involving transactional LOCK TABLE against the next-mr-wl3561 tree, mysqld deadlocked. One thread had a stack trace unlike that of any other thread:

#0  0x000000315b00b309 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000009c3a2e in safe_cond_wait (cond=0xfb1e60, mp=0xfb1960, file=0xb1b088 "../include/mysql/psi/mysql_thread.h", line=784) at thr_mutex.c:240
#2  0x0000000000688003 in inline_mysql_cond_wait (that=0xfb1e60, mutex=0xfb1960) at ../include/mysql/psi/mysql_thread.h:784
#3  0x00000000006880b0 in wait_for_condition (thd=0x2c23508, mutex=0xfb1960, cond=0xfb1e60) at sql_base.cc:2203
#4  0x0000000000624fd2 in wait_for_locked_table_names (thd=0x2c23508, table_list=0x2bd1dc0) at lock.cc:1130
#5  0x000000000062544c in lock_table_names (thd=0x2c23508, table_list=0x2bd1dc0) at lock.cc:1171
#6  0x000000000081ca0b in mysql_create_or_drop_trigger (thd=0x2c23508, tables=0x2bd1dc0, create=true) at sql_trigger.cc:458
#7  0x0000000000648e9a in mysql_execute_command (thd=0x2c23508) at sql_parse.cc:4643
#8  0x000000000064a299 in mysql_parse (thd=0x2c23508,
    inBuf=0x2bd19f8 "CREATE TRIGGER testdb_N . tr1_2_N  BEFORE DELETE ON t1_base1_N FOR EACH ROW BEGIN REPLACE  INTO testdb_N . t1_merge2_N  ( `col_int` , `pk` , `col_int_key`  ) SELECT   `col_int_key` , `col_int` , `pk` "..., length=329, found_semicolon=0x7ff928673f10) at sql_parse.cc:6045
#9  0x000000000064aefb in dispatch_command (command=COM_QUERY, thd=0x2c23508,
    packet=0x2c01c29 "CREATE TRIGGER testdb_N . tr1_2_N  BEFORE DELETE ON t1_base1_N FOR EACH ROW BEGIN REPLACE  INTO testdb_N . t1_merge2_N  ( `col_int` , `pk` , `col_int_key`  ) SELECT   `col_int_key` , `col_int` , `pk` "..., packet_length=329) at sql_parse.cc:1128
#10 0x000000000064c2fd in do_command (thd=0x2c23508) at sql_parse.cc:798
#11 0x00000000006391c0 in handle_one_connection (arg=0x2c23508) at sql_connect.cc:1163
#12 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0
#13 0x000000315a4e627d in clone () from /lib64/libc.so.6

Various other DDL and DML threads are stuck in open_tables() and close_cached_tables().

How to repeat:
If this is repeatable, a test case will be provided. In the meantime, the core and the binary will be uploaded.
[28 Dec 2009 14:42] Philip Stoev
Thread stacks for 49960

Attachment: bug49960.stacks.txt (text/plain), 29.12 KiB.

[28 Dec 2009 14:54] Philip Stoev
Core and binary:

http://mysql-systemqa.s3.amazonaws.com/var-bug49960.zip

Source:

revision-id: ingo.struewing@sun.com-20091221154123-51irmivi5ey9mt57
date: 2009-12-21 16:41:23 +0100
build-date: 2009-12-28 16:54:47 +0200
revno: 2943
branch-nick: mysql-next-mr-wl3561
[28 Dec 2009 16:00] Philip Stoev
zz file for 49960

Attachment: bug49960.zz (application/octet-stream, text), 691 bytes.

[28 Dec 2009 16:00] Philip Stoev
yy file for 49960

Attachment: bug49960.yy (application/octet-stream, text), 69.67 KiB.

[28 Dec 2009 16:02] Philip Stoev
To reproduce with the RQG:

$ perl runall.pl \
  --grammar=conf/WL5004_sql.yy \
  --gendata=conf/WL5004_data.zz \
  --basedir=/build/bzr/mysql-next-mr-wl3561/ \
  --queries=100K \
  --mysqld=--secure-file-priv=/tmp \
  --mysqld=--innodb-lock-wait-timeout=1 \
  --mysqld=--log-output=file \
  --mysqld=--skip-safemalloc

This test will usually terminate due to other crashes and assertions; however, once every few runs the deadlock will happen. When it does, all output from the RQG stops and SHOW PROCESSLIST shows that every query has hung far beyond any reasonable timeout.
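
The SHOW PROCESSLIST check can be automated roughly as follows. This is a sketch under stated assumptions: the rows are taken to be dictionaries keyed by the standard SHOW PROCESSLIST column names (Command, Time, Info), already fetched by whatever client is in use, and both the helper name looks_hung and the 300-second threshold are hypothetical choices, not RQG settings.

```python
def looks_hung(rows, threshold_seconds=300):
    """Return True if every running statement has exceeded the threshold.

    rows: list of dicts with the SHOW PROCESSLIST columns
          "Command", "Time" (seconds) and "Info" (statement text or None).
    threshold_seconds: assumed cut-off; with --innodb-lock-wait-timeout=1
          a healthy server should finish or abort lock waits far sooner.
    """
    running = []
    for row in rows:
        info = row.get("Info")
        if row.get("Command") == "Sleep" or not info:
            continue  # idle connection, nothing is executing
        text = info.strip().upper()
        if text.startswith("SHOW PROCESSLIST") or text.startswith("SHOW FULL PROCESSLIST"):
            continue  # the monitoring statement itself
        running.append(row)
    # No running statements means the server is idle, not hung.
    if not running:
        return False
    return all(int(row["Time"]) > threshold_seconds for row in running)
```

Requiring every running statement to exceed the threshold, rather than just one, avoids flagging a single legitimately slow query as a server-wide deadlock.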
[28 Dec 2009 16:03] Philip Stoev
In the RQG command line, please use the YY and ZZ files as attached to the bug report.
[27 May 2010 9:41] Ingo Strüwing
This was reported against an earlier attempt to backport WL#3561.
That code no longer belongs to any existing software.
In similar tests against the new backport, this has not been seen again.