Bug #37346 innodb does not detect deadlock between update and alter table
Submitted: 11 Jun 2008 16:35 Modified: 22 Oct 12:21
Reporter: Shane Bester
Status: In progress
Category:Server: Locking Severity:S3 (Non-critical)
Version:5.0,5.1 OS:Any
Assigned to: Dmitri Lenev Target Version:5.4+
Triage: Triaged: D2 (Serious) / R2 (Low) / E4 (High)

[11 Jun 2008 16:35] Shane Bester
Description:
under some circumstances innodb appears to not detect deadlocks, and two transactions
hang without progressing until innodb_lock_wait_timeout is reached.

problem is that both transactions never progress at all after becoming locked. so no
matter how long you put innodb lock wait timeout, it's guaranteed that at least one
transaction will be aborted when the lock wait expires.

summary info below, full info and testcase will be attached in a file.

 113 | copy to tmp table | alter table t1 engine=innodb                    
 113 | Locked            | update t1 set b='technicals' where a='therapist'

alter trx: 

TABLE LOCK table `test`.`t1` trx id 0 381459 lock mode IX   
is waiting for `GEN_CLUST_INDEX` lock_mode X waiting
and has 9 records with lock_mode X already.

update trx:

TABLE LOCK table `test`.`t1` trx id 0 381456 lock mode IX
has `GEN_CLUST_INDEX` lock_mode X locks rec but not gap
and index `a` lock_mode X locks gap before rec
also undo log entries 1

note: when the update transaction contains only a single update,
this hangup never happens.  it only happens when update trx contains
two or more updates.

How to repeat:
start mysqld with a innodb_lock_wait_timeout=999999
create an innodb table with 10 records

thread 1: trx with two updates and a commit
thread 2: alter table engine=innodb

the attached C testcase causes a hangup within minutes.

Suggested fix:
better to immediately detect a deadlock, or allow one of the transactions to continue
until it's committed, so that the other can continue.
[11 Jun 2008 16:37] Shane Bester
output of lock monitor, processlist, original testcase output showing 0 tps

Attachment: bug37346_innodb_lock_monitor.txt (text/plain), 12.71 KiB.

[11 Jun 2008 16:57] Shane Bester
testcase. run against mysqld with long innodb_lock_wait_timeout until queries hang

Attachment: bug37346.c (text/plain), 6.70 KiB.

[11 Jun 2008 17:00] Shane Bester
thread stacks of 2x update and 1x alter table

Attachment: bug37346_thread_stacks_during_hang.txt (text/plain), 6.83 KiB.

[11 Jun 2008 17:37] Heikki Tuuri
Shane,

this problem is a general one in the MySQL server. MySQL is not aware of lock waits
inside storage engines, and storage engines are not aware of MySQL table lock waits.
Currently, a lock wait timeout is the only way to resolve these deadlocks that involve a
MySQL table lock.

Setting this as a general MySQL Server bug.

In the general form, this is hard to fix, because that requires MySQL to keep a global
waits-for graph, also for storage engines.

Regards,

Heikki
[11 Jun 2008 17:49] Sinisa Milivojevic
Heikki,

I suggest that this bug is fixed in 6.0. In 6.0 we could use transactional table locks
and fix issues like these efficiently.

What do you think ???
[15 Jul 2008 4:53] Sean Pringle
The initial test case can be reduced by manually interleaving the 2x updates in txn1 with
the 1x alter in txn2. Open two MySQL client sessions...

In session #1:

create table t1 (c1 int primary key, c2 int, c3 int) engine=innodb;
insert into t1 values (1,1,0),(2,2,0),(3,3,0),(4,4,0),(5,5,0);
set autocommit=0;
update t1 set c3=c3+1 where c2=3;

In session #2:

alter table t1 engine=innodb;
(waits)

In session #1:

update t1 set c3=c3+1 where c2=4;
(waits)

Observe both sessions deadlocked for innodb_lock_wait_timeout seconds, then in session
#2:

ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

In session #1:

Query OK..etc
commit;
[2 Oct 2008 23:19] Konstantin Osipov
Duplicate of Bug#989
[14 Jul 19:13] Miguel Solorzano
Sean`s test, repeatable in Windows Vista 64-bit:

mysql> alter table t1 engine=innodb;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
mysql>
[17 Jul 11:31] Konstantin Osipov
This should have been fixed by Bug#989 but wasn't due to an omission.
Please triage for sr54, it's part of WL#4284 specification that this sort of deadlock
goes away.
[17 Jul 12:01] Konstantin Osipov
Hm... not sure I have a plan for a quick fix in 5.4...
[17 Jul 12:05] Konstantin Osipov
SHOW PROCESSLIST output:

1 | localhost:60065 | test | Table lock | update t1 set c3=c3+1 where c2=4
2 | localhost:60068 | test | copy to tmp table | alter table t1 engine=innodb

ALTER is waiting to upgrade the metadata lock, update is waiting since
the table is locked by ALTER with TL_WRITE_ALLOW_READ data lock (which is incompatible
with TL_WRITE_ALLOW_WRITE requested by update).
[17 Jul 12:09] Konstantin Osipov
Update thread:
(gdb) bt
#0  0xb7f3e430 in __kernel_vsyscall ()
#1  0xb7f190e5 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/tls/i686/cmov/libpthread.so.0
#2  0x08740704 in safe_cond_wait (cond=0xb116bf4, mp=0xb1ff054, 
    file=0x899fc6b "thr_lock.c", line=470) at thr_mutex.c:423
#3  0x0874c965 in wait_for_lock (wait=0xb1ff0c4, data=0xb2463f4, 
    in_wait_list=0 '\0') at thr_lock.c:470
#4  0x0874d49b in thr_lock (data=0xb2463f4, owner=0xb157038, 
    lock_type=TL_WRITE_ALLOW_WRITE) at thr_lock.c:762
#5  0x0874dec5 in thr_multi_lock (data=0xb25c3bc, count=1, owner=0xb157038)
    at thr_lock.c:1020
#6  0x082e8ba1 in mysql_lock_tables (thd=0xb156808, tables=0xb25c3a0, count=1, 
    flags=0, need_reopen=0xb47d4ac0) at lock.cc:321
#7  0x08359282 in lock_tables (thd=0xb156808, tables=0xb2452a0, count=1, 
    flags=0, need_reopen=0xb47d4ac0) at sql_base.cc:4541
#8  0x083c5996 in mysql_update (thd=0xb156808, table_list=0xb2452a0, 
    fields=@0xb157aec, values=@0xb157d58, conds=0xb245ae0, order_num=0, 
    order=0x0, limit=18446744073709551615, handle_duplicates=DUP_ERROR, 
    ignore=false) at sql_update.cc:227
#9  0x083070a9 in mysql_execute_command (thd=0xb156808) at sql_parse.cc:3099
#10 0x0830da03 in mysql_parse (thd=0xb156808, 
    inBuf=0xb2451d8 "update t1 set c3=c3+1 where c2=4", length=32, 
    found_semicolon=0xb47d5bd4) at sql_parse.cc:5942
#11 0x0830e55d in dispatch_command (command=COM_QUERY, thd=0xb156808, 
    packet=0xb1e4b99 "update t1 set c3=c3+1 where c2=4", packet_length=32)
    at sql_parse.cc:1061
#12 0x0830fa73 in do_command (thd=0xb156808) at sql_parse.cc:743
#13 0x082fc018 in handle_one_connection (arg=0xb156808) at sql_connect.cc:1158
#14 0xb7f154ff in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#15 0xb7d1b49e in clone () from /lib/tls/i686/cmov/libc.so.6
[17 Jul 12:10] Konstantin Osipov
Alter thread:
#0  0xb7f3e430 in __kernel_vsyscall ()
#1  0xb7f190e5 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/tls/i686/cmov/libpthread.so.0
#2  0x08740704 in safe_cond_wait (cond=0xacff8bc, mp=0xacff850, 
    file=0x896f9ee "os/os0sync.c", line=422) at thr_mutex.c:423
#3  0x085e4c44 in os_event_wait_low (event=0xacff850, reset_sig_count=0)
    at os/os0sync.c:422
#4  0x08609a96 in srv_suspend_mysql_thread (thr=0xb69a88e8)
    at srv/srv0srv.c:1489
#5  0x085f632e in row_mysql_handle_errors (new_err=0xb47a2644, trx=0xb69a9c68, 
    thr=0xb69a88e8, savept=0x0) at row/row0mysql.c:496
#6  0x08603cdd in row_search_for_mysql (buf=0xb1fee30 "ÿ", mode=1, 
    prebuilt=0xb69a6c68, match_mode=0, direction=0) at row/row0sel.c:4419
#7  0x08598ed4 in ha_innobase::index_read (this=0xb1fec68, buf=0xb1fee30 "ÿ", 
    key_ptr=0x0, key_len=0, find_flag=HA_READ_AFTER_KEY)
    at handler/ha_innodb.cc:4493
#8  0x0859127c in ha_innobase::index_first (this=0xb1fec68, buf=0xb1fee30 "ÿ")
    at handler/ha_innodb.cc:4757
#9  0x08598bc7 in ha_innobase::rnd_next (this=0xb1fec68, buf=0xb1fee30 "ÿ")
    at handler/ha_innodb.cc:4854
#10 0x0843e996 in rr_sequential (info=0xb47a2c68) at records.cc:390
#11 0x08464f50 in copy_data_between_tables (from=0xb1fe5b0, to=0xb23e808, 
    create=@0xb47a416c, ignore=false, order_num=0, order=0x0, 
    copied=0xb47a32d8, deleted=0xb47a32d0, keys_onoff=LEAVE_AS_IS, 
    error_if_not_empty=false) at sql_table.cc:7892
#12 0x08472d11 in mysql_alter_table (thd=0xb224700, new_db=0xb1b2d98 "test", 
    new_name=0xb117278 "t1", create_info=0xb47a3e44, table_list=0xb1f9be0, 
    alter_info=0xb47a413c, order_num=0, order=0x0, ignore=false)
    at sql_table.cc:7515
#13 0x083063f2 in mysql_execute_command (thd=0xb224700) at sql_parse.cc:2871
#14 0x0830da03 in mysql_parse (thd=0xb224700, 
    inBuf=0xb2211b0 "alter table t1 engine=innodb", length=28, 
    found_semicolon=0xb47a4bd4) at sql_parse.cc:5942
#15 0x0830e55d in dispatch_command (command=COM_QUERY, thd=0xb224700, 
    packet=0xb22a139 "alter table t1 engine=innodb", packet_length=28)
    at sql_parse.cc:1061
#16 0x0830fa73 in do_command (thd=0xb224700) at sql_parse.cc:743
#17 0x082fc018 in handle_one_connection (arg=0xb224700) at sql_connect.cc:1158
#18 0xb7f154ff in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#19 0xb7d1b49e in clone () from /lib/tls/i686/cmov/libc.so.6

BTW, it times out after a while.
[17 Jul 17:06] Konstantin Osipov
Sorry for initial confusion.
This report has 3 different manifestations, and it got me confused.
Let me try to explain with 3 examples:

1) Unnecessary deadlock error between SELECT and ALTER, all engines.

connection1:

mysql> create table t1 (c1 int primary key, c2 int, c3 int) engine=innodb;
Query OK, 0 rows affected (0.09 sec)

mysql> insert into t1 values (1,1,0),(2,2,0),(3,3,0),(4,4,0),(5,5,0);
Query OK, 5 rows affected (0.04 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> set autocommit=0;
Query OK, 0 rows affected (0.00 sec)

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t1 where c2=3;
+----+------+------+
| c1 | c2   | c3   |
+----+------+------+
|  3 |    3 |    0 | 
+----+------+------+
1 row in set (0.00 sec)

connection 2:

mysql> alter table t1 engine=innodb;

(waits)

connection 1:

mysql> select * from t1 where c2=4;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

The problem here is that connection 1' attempts to proceed with t1 fails.
This contradicts WL#4284 specification.

What should happen: connection 1 should be able to proceed. connection 2
should wait till connection 1 commits.

Note, that this behaviour is engine-agnostic, identical error occurs with MyISAM engine.
[17 Jul 17:07] Konstantin Osipov
The effort to fix this problem (case 1) is medium. Risk is low.
[17 Jul 19:55] Konstantin Osipov
The described case has now been reported separately as Bug#46272 MySQL 5.4.4, new MDL:
unnecessary deadlock
[20 Jul 11:52] Konstantin Osipov
Case 2), or, problem 2, only becomes visible when case 1) issue is eliminated/fixed.
It is a deadlock between UPDATE and ALTER connections, using MyISAM engine.
I will use a patched version of the server, with a crude fix for Bug#46272, to
demonstrate the problem.

The diff against the latest 5.4:

=== modified file 'sql/sql_base.cc'
--- sql/sql_base.cc     2009-07-06 08:38:21 +0000
+++ sql/sql_base.cc     2009-07-20 09:36:14 +0000
@@ -2078,12 +2078,6 @@ bool wait_while_table_is_used(THD *thd, 
                        table->db_stat, table->s->version));
 
   /* Ensure no one can reopen table before it's removed */
-  pthread_mutex_lock(&LOCK_open);
-  table->s->version= 0;
-  pthread_mutex_unlock(&LOCK_open);
-
-  old_lock_type= table->reginfo.lock_type;
-  mysql_lock_abort(thd, table, TRUE);  /* end threads waiting on lock */
 
   if (table->mdl_ticket->upgrade_shared_lock_to_exclusive())
   {
@@ -2092,6 +2086,15 @@ bool wait_while_table_is_used(THD *thd, 
   }
 
   pthread_mutex_lock(&LOCK_open);
+  table->s->version= 0;
+  pthread_mutex_unlock(&LOCK_open);
+
+  old_lock_type= table->reginfo.lock_type;
+#if 0
+  mysql_lock_abort(thd, table, TRUE);  /* end threads waiting on lock */
+#endif
+
+  pthread_mutex_lock(&LOCK_open);
   tdc_remove_table(thd, TDC_RT_REMOVE_NOT_OWN,
                    table->s->db.str, table->s->table_name.str);
   pthread_mutex_unlock(&LOCK_open);

The test case:

-- connection 1:

mysql> create table t1 (c1 int primary key, c2 int, c3 int) engine=myisam;
Query OK, 0 rows affected (0.12 sec)

mysql> insert into t1 values (1,1,0),(2,2,0),(3,3,0),(4,4,0),(5,5,0);
Query OK, 5 rows affected (0.01 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> set autocommit=0;
Query OK, 0 rows affected (0.00 sec)

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> update t1 set c3=c3+1 where c2=3;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> update t1 set c3=c3+1 where c2=3;
Query OK, 1 row affected (6.14 sec)
Rows matched: 1  Changed: 1  Warnings: 0

-- connection 2:

mysql> alter table t1 engine=myisam;

-- (waits)

-- connection 1:

mysql> update t1 set c3=c3+1 where c2=4;

-- (waits)

Voila, deadlock.
In this situation connection 2 (ALTER) is waiting on connection 1 to
give away its metadata lock on t1. Whereas connection 2 (UPDATE) is
waiting on table-level data lock on t1, since ALTER holds TL_WRITE_ALLOW_READ,
which conflicts with TL_WRITE needed for connection 1.
This is a deadlock that will never time out, since there are no
timeouts currently in MDL or thr_lock locks. 
Introducing MDL timeouts is subject of Bug#45225.

But a solution with DDL timeout is a partial one -- during waiting on DDL to timeout the
UPDATE transaction could very well commit, which is prevented
by ALTER holding a WRITE data lock on the table. So a better solution
is, for ALTER, to not block the UPDATE transaction and let it finish,
while preventing new UPDATE transactions from acquiring the lock on t1.

This is the issue reported here, in this report (Bug#37346). Note, 
that the only way I see to fix it is to extend metadata locking subsystem
to become more aware of data locks, such as MDL_READ_METADATA_UPDATE_DATA and
MDL_READ_METADATA_READ_DATA, so that MDL subsystem can wait on an upgradable
shared metadata lock without holding connections that are doing UPDATEs.
It's a major effort and a major extension of MDL functionality (two man-months of
development time).
[21 Jul 11:07] Andrii Nikitin
When this bug happens on PARTITIONED table it leaves files in corrupted state (bug #45961)
or crashes mysqld if innodb_file_per_table is OFF (bug #45808).