Bug #45143 All connections hang on concurrent ALTER TABLE
Submitted: 27 May 2009 18:27 Modified: 12 Nov 2009 19:50
Reporter: Philip Stoev Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Locking Severity:S3 (Non-critical)
Version:5.1,5.4,6.0 OS:Any
Assigned to: Dmitry Lenev CPU Architecture:Any

[27 May 2009 18:27] Philip Stoev
Description:
When executing a concurrent workload containing ALTER statements, all statements hanged. CPU usage is 0% and no statements are allowed to proceed.

Though such concurrent ALTERs are unlikely in real life, fixing this situation is important for stress testing the new locking mechanisms in 5.4/6.0

How to repeat:
A test case will be uploaded shortly.

Suggested fix:
* At least one of a set of conflicting ALTERs should be allowed to proceed.

* ALTERs should be subject to the table-lock-wait-timeout value and should not hang forever
[27 May 2009 18:27] Philip Stoev
kill -HUP output:

Status information:

Current dir: /build/bzr/mysql-5.1/mysql-test/var/master-data/
Running threads: 101  Stack size: 262144
Current locks:
lock: 0x261f668:

lock: 0x7f4be40353f0:

lock: 0x7f4be40b2a70:

lock: 0x26fe218: write write_wait read_wait
write     : 0x2c34f30 (55:12);
write_wait: 0x2675690 (64:12); 0x2c1a3b0 (46:7); 0x7f4be4001860 (84:12); 0x7f4be405cfa0 (47:7); 0x7f4be4038d60 (7:12); 0x2c02fc0 (87:12); 0x7f4be402fa20 (105:11); 0x7f4be4022440 (98:11); 0x7f4be00a8070 (53:12); 0x7f4be40513a0 (9:11);
read_wait : 0x7f4be0017110 (62:2); 0x7f4be009d370 (94:3); 0x7f4be800aca0 (88:3);

lock: 0x2a929e0: write write_wait read_wait
write     : 0x265d2e0 (40:12);
write_wait: 0x7f4be80bd660 (78:12); 0x7f4be409a570 (31:12); 0x7f4be007b1e0 (68:12); 0x27d5a80 (33:12); 0x7f4be4083b00 (29:7); 0x7f4be409cf40 (24:7); 0x29b3f80 (35:12); 0x7f4be4030c60 (36:12); 0x2c1ede0 (79:7); 0x7f4be80bda00 (8:7); 0x25ff7f0 (39:12); 0x7f4be009ef30 (58:12); 0x7f4be80a2580 (49:11); 0x7f4be000dd00 (48:7); 0x7f4be8023220 (103:7); 0x7f4be007e2b0 (104:7); 0x7f4be8069be0 (75:12); 0x7f4be80b4e90 (85:12); 0x7f4be006ace0 (92:12); 0x7f4be00e6360 (97:12); 0x25f4fd0 (70:7);
read_wait : 0x2ad84b0 (81:4); 0x2b7a880 (55:2); 0x7f4be4060250 (57:2); 0x7f4be805ff50 (11:3);

lock: 0x7f4be0047c60:

lock: 0x2ac5758: write write_wait read_wait
write     : 0x7f4be80da1d0 (28:12);
write_wait: 0x7f4be806e590 (21:7); 0x7f4be006b7f0 (54:7); 0x7f4be40992b0 (83:11); 0x7f4be0068630 (12:12); 0x7f4be4017a00 (45:7); 0x7f4be40c33f0 (34:12); 0x7f4be8067330 (95:7); 0x7f4be0069250 (91:12); 0x7f4be40965e0 (93:7); 0x7f4be407f2b0 (44:7); 0x7f4be0060a90 (51:12); 0x7f4be0004850 (14:7); 0x7f4be8041140 (43:7); 0x2a32fc0 (17:12); 0x7f4be808aac0 (59:12);
read_wait : 0x7f4be80b3f20 (20:3); 0x7f4be80a7b60 (89:4); 0x7f4be00c32f0 (15:2); 0x28b09a0 (19:2); 0x7f4be0085010 (50:4); 0x7f4be40d3e30 (32:2);

lock: 0x2b8a9a0:

lock: 0x7f4be002ae70:

lock: 0x2636b20: write write_wait read
write     : 0x7f4be8027c80 (75:6); 0x7f4be408ad80 (57:6);
write_wait: 0x7f4be40823d0 (101:7); 0x2c35ea0 (61:6); 0x7f4be4093080 (77:7); 0x7f4be401e7f0 (26:6); 0x2862580 (30:6); 0x7f4be8017b90 (69:6); 0x292a4b0 (80:6); 0x7f4be00c02b0 (65:6); 0x2b87740 (41:6); 0x7f4be000e1c0 (56:6); 0x7f4be808c860 (6:6);
read      : 0x7f4be00c88f0 (94:3); 0x7f4be40a12a0 (32:3);

lock: 0x2acd1a0: write write_wait
write     : 0x7f4be00d7060 (28:6);
write_wait: 0x27aed50 (52:7); 0x7f4be8081820 (38:6); 0x2b8ece0 (40:6); 0x7f4be00c8500 (96:7); 0x7f4be401c760 (67:6); 0x7f4be4057030 (76:6); 0x7f4be403b970 (16:7); 0x7f4be0017cb0 (18:6); 0x7f4be0090e80 (102:7); 0x7f4be005f310 (28:6); 0x27ced00 (73:7); 0x7f4be8009150 (13:6); 0x7f4be8083ca0 (27:6); 0x7f4be8072340 (63:6); 0x7f4be80367f0 (66:7); 0x7f4be8017f80 (72:6); 0x7f4be4098be0 (42:7); 0x7f4be807c950 (71:6);

lock: 0x25d87e0: write write_wait read
write     : 0x27f5c90 (17:6);
write_wait: 0x7f4be0011760 (74:7); 0x7f4be803bce0 (86:7); 0x2928ef0 (23:7); 0x7f4be80b1d90 (99:6); 0x7f4be8016790 (82:7); 0x265c6d0 (90:6); 0x7f4be0064d90 (37:6); 0x7f4be808d420 (22:7); 0x7f4be80de0c0 (25:6); 0x27f0720 (10:7); 0x7f4be00ac760 (60:6); 0x7f4be8023de0 (100:6);
read      : 0x7f4be80841e0 (20:3); 0x7f4be80d0fb0 (89:4);

lock: 0x7f4be00257d0:

Key caches:
default
Buffer_size:       1048576
Block_size:           1024
Division_limit:        100
Age_limit:             300
blocks used:           128
not flushed:             0
w_requests:          72247
writes:              10920
r_requests:         106616
reads:                 831

handler status:
read_key:         4584
read_next:        1263
read_rnd            47
read_first:       1012
write:          114133
delete             493
update:            927

Table status:
Opened tables:      76067
Open tables:          174
Open files:            54
Open streams:           0

Alarm status:
Active alarms:   1
Max used alarms: 96
Next alarm time: 22937

Thread database.table_name          Locked/Waiting        Lock_type

17      test.B                      Locked - write        High priority write lock
20      test.BB                     Waiting - read        Shared read lock
20      test.B                      Locked - read         Shared read lock
21      test.BB                     Waiting - write       Write lock, but allow reading
28      test.BB                     Locked - write        High priority write lock
28      test.CC                     Locked - write        High priority write lock
40      test.AA                     Locked - write        High priority write lock
52      test.CC                     Waiting - write       Write lock, but allow reading
55      test.DD                     Locked - write        High priority write lock
62      test.DD                     Waiting - read        Low priority read lock
64      test.DD                     Waiting - write       High priority write lock
74      test.B                      Waiting - write       Write lock, but allow reading
75      test.A                      Locked - write        Low priority write lock
78      test.AA                     Waiting - write       High priority write lock
81      test.AA                     Waiting - read        High priority read lock
94      test.A                      Locked - read         Shared read lock
101     test.A                      Waiting - write       Write lock, but allow reading
[27 May 2009 18:29] Philip Stoev
bug 45143 processlist

Attachment: bug45143.processlist.txt (text/plain), 25.87 KiB.

[27 May 2009 18:30] Philip Stoev
Thread stacks for bug 45143

Attachment: bug45143.stacks.txt (text/plain), 96.97 KiB.

[1 Jul 2009 16:25] Matthias Leich
Grammar of creation of objects

Attachment: bug45143_data.zz (application/octet-stream, text), 165 bytes.

[1 Jul 2009 16:26] Matthias Leich
Grammar for stress testing

Attachment: bug45143_sql.yy (application/octet-stream, text), 969 bytes.

[1 Jul 2009 17:06] Matthias Leich
I uploaded two files with simplified grammar for
replaying the current bug.
There are some maybe important comments in
bug45143_sql.yy.
How to run the test:
1. copy bug45143_sql.yy and bug45143_data.zz
   to the subdirectory "conf" of the RQG.
2. Run something like
   perl runall.pl --mem \
   --basedir=<path to MySQL toplevel directory> \
   --threads=10 --queries=10000 --debug \
   --mysqld=--table-lock-wait-timeout=5 \
   --mysqld=--innodb-lock-wait-timeout=5 \
   --gendata=conf/bug45143_data.zz \
   --grammar=conf/bug45143_sql.yy \
   --seed=0
I get an "endless" deadlock (all 10 stress threads
are state "Table lock") after ~ 10 seconds.
Example:
SHOW PROCESSLIST (a bit edited):
Id db Command Time State Info
1  test Sleep 68 NULL       NULL
6  test Query 66 Table lock UPDATE table1_int_autoinc SET `int` = 1 WHERE `pk` > 2
7  test Query 66 Table lock ALTER  TABLE table1_int_autoinc ENGINE = InnoDB
8  test Query 66 Table lock ALTER  TABLE table1_int_autoinc ENGINE = InnoDB
9  test Query 66 Table lock ALTER  TABLE table1_int_autoinc ENGINE = MEMORY
10 test Query 66 Table lock UPDATE table1_int_autoinc SET `int` = 8 WHERE `pk` > 5
11 test Query 66 Table lock ALTER ONLINE TABLE table1_int_autoinc ENGINE = InnoDB
12 test Query 66 Table lock SELECT A . `int_key` FROM table1_int_autoinc AS A LEFT JOIN table1_int_autoinc AS B USING (`pk`) FOR
13 test Query 66 Table lock SELECT A . `int` FROM table1_int_autoinc AS A LEFT JOIN table1_int_autoinc AS B USING (`pk`) FOR UPD
14 test Query 66 Table lock ALTER ONLINE TABLE table1_int_autoinc ENGINE = MEMORY
15 test Query 66 Table lock ALTER  TABLE table1_int_autoinc ENGINE = MEMORY
16 NULL Query 0  NULL       SHOW PROCESSLIST

What's in my eyes especially suspicious?
ALTER ONLINE TABLE gets all time rejected with
  "failed: 1235 This version of MySQL doesn't yet support
  'ALTER ONLINE TABLE table%d_int_autoinc ENGINE = MEMORY'
independent of the storage engine
- just used for the table
- assigned in ALTER ....
So why
- does this not yet supported statement need any lock
- disappear the deadlocks if I remove the "ONLINE"
?
[4 Sep 2009 9:53] Dmitry Lenev
After investigation it has turned out that this bug is not specific to ALTER TABLE and can be reproduced using e.g. LOCK TABLES. To do this one should to apply the following patch, which introduces a new DEBUG_SYNC point, to the server source:

=== modified file 'mysys/thr_lock.c'
--- mysys/thr_lock.c    2009-03-11 17:17:00 +0000
+++ mysys/thr_lock.c    2009-09-04 08:04:49 +0000
@@ -1023,6 +1023,7 @@ thr_multi_lock(THR_LOCK_DATA **data, uin
       thr_multi_unlock(data,(uint) (pos-data));
       DBUG_RETURN(result);
     }
+    DEBUG_SYNC_C("thr_multi_lock_after_thr_lock");
 #ifdef MAIN
     printf("Thread: %s  Got lock: 0x%lx  type: %d\n",my_thread_name(),
           (long) pos[0]->lock, pos[0]->type); fflush(stdout);

And then run the following test case using mysqltest tool:

--source include/have_innodb.inc
--source include/have_debug_sync.inc

connect (con1,localhost,root,,test,,);
connect (con3,localhost,root,,test,,);
connect (con2,localhost,root,,test,,);
connection default;

SET DEBUG_SYNC= 'RESET';
SET GLOBAL GENERAL_LOG=OFF;

create table t1 (i int) engine=InnoDB;
insert into t1 values (1);

select get_lock("lock_bug45143_wait", 0);

connection con1;
--send insert into t1 values (get_lock("lock_bug45143_wait", 100));

connection con2;
--sleep 2
set debug_sync='thr_multi_lock_after_thr_lock SIGNAL parked WAIT_FOR go';
--send select count(*) from t1 as a, t1 as b for update;

connection con3;
set debug_sync= 'now WAIT_FOR parked';
--send lock table t1 write;

connection default;
--sleep 2
set debug_sync= 'now SIGNAL go';
--sleep 2
select release_lock("lock_bug45143_wait");

connection con1;
--reap

# The below statement will show that both SELECT FOR UPDATE and LOCK TABLES
# are waiting for table lock.
show processlist;

connection con2;
--reap

connection con3;
--reap
[7 Sep 2009 14:32] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/82598

2810 Dmitry Lenev	2009-09-07
      Fix for  bug #45143 "All connections hang on concurrent ALTER TABLE".
       
      Concurrent execution of statements which require non-table-level
      write locks on several instances of the same table (such as 
      SELECT ... FOR UPDATE which uses same InnoDB table twice or a DML 
      statement which invokes trigger which tries to update same InnoDB 
      table directly and through stored function) and statements which 
      required table-level locks on this table (e.g. LOCK TABLE ... WRITE,
      ALTER TABLE, ...) might have resulted in a deadlock.
       
      The problem occured when a thread tried to acquire write lock 
      (TL_WRITE_ALLOW_WRITE) on the table but had to wait since there was 
      a pending write lock (TL_WRITE, TL_WRITE_ALLOW_READ) on this table
      and we failed to detect that this thread already had another instance
      of write lock on it because there was also another thread holding 
      write lock on the table (also TL_WRITE_ALLOW_WRITE).
      When the latter thread released its lock neither the first thread
      nor the thread trying to acquire TL_WRITE/TL_WRITE_ALLOW_READ were
      woken up (as table was still write locked by the first thread) 
      so we ended up with a deadlock.
       
      This patch solves this problem by ensuring that thread which 
      already has write lock on the table won't wait when it tries
      to acquire second write lock on the same table.
      
      Questions for reviewer are marked by QQ.
      
      QQ should not we fix it in 5.1 as well?
     @ mysql-test/r/lock_sync.result
        Added test case for bug #45143 "All connections hang on concurrent
        ALTER TABLE".
     @ mysql-test/t/lock_sync.test
        Added test case for bug #45143 "All connections hang on concurrent
        ALTER TABLE".
     @ mysys/thr_lock.c
        Ensured that thread can acquire write lock on the table without 
        waiting if it already has write lock on it even if there are other 
        threads holding write locks on this table (this is normal situation 
        for, e.g., TL_WRITE_ALLOW_WRITE type of lock).
        
        Adjusted comments to better explain why it is OK to do so and added 
        asserts to prevent introduction of scenarios in which this can cause 
        problems.
[15 Sep 2009 8:22] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/83238

2816 Dmitry Lenev	2009-09-15
      Fix for  bug #45143 "All connections hang on concurrent ALTER TABLE".
       
      Concurrent execution of statements which require non-table-level
      write locks on several instances of the same table (such as 
      SELECT ... FOR UPDATE which uses same InnoDB table twice or a DML 
      statement which invokes trigger which tries to update same InnoDB 
      table directly and through stored function) and statements which 
      required table-level locks on this table (e.g. LOCK TABLE ... WRITE,
      ALTER TABLE, ...) might have resulted in a deadlock.
       
      The problem occured when a thread tried to acquire write lock 
      (TL_WRITE_ALLOW_WRITE) on the table but had to wait since there was 
      a pending write lock (TL_WRITE, TL_WRITE_ALLOW_READ) on this table
      and we failed to detect that this thread already had another instance
      of write lock on it (so in fact we were trying to acquire recursive
      lock) because there was also another thread holding write lock on the
      table (also TL_WRITE_ALLOW_WRITE). When the latter thread released
      its lock neither the first thread nor the thread trying to acquire
      TL_WRITE/TL_WRITE_ALLOW_READ were woken up (as table was still write
      locked by the first thread) so we ended up with a deadlock.
       
      This patch solves this problem by ensuring that thread which 
      already has write lock on the table won't wait when it tries
      to acquire second write lock on the same table.
     @ mysql-test/r/lock_sync.result
        Added test case for bug #45143 "All connections hang on concurrent
        ALTER TABLE".
     @ mysql-test/t/lock_sync.test
        Added test case for bug #45143 "All connections hang on concurrent
        ALTER TABLE".
     @ mysys/thr_lock.c
        Ensured that thread can acquire write lock on the table without 
        waiting if it already has write lock on it even if there are other 
        threads holding write locks on this table (this is normal situation 
        for, e.g., TL_WRITE_ALLOW_WRITE type of lock).
        
        Adjusted comments to better explain why it is OK to do so and added 
        asserts to prevent introduction of scenarios in which this can cause 
        problems.
[15 Sep 2009 8:33] Dmitry Lenev
During developing fix for this bug it became clear that not only scenarios including ALTER are affected.

More precisely set of affected cases can be described as "concurrent execution of statements which require non-table-level write locks on several instances of the same table (such as SELECT ... FOR UPDATE which uses same InnoDB table twice or a DML statement which invokes trigger which tries to update same InnoDB table directly and through stored function) and statements which required table-level locks on this table (e.g. LOCK TABLE ... WRITE,ALTER TABLE, ...)".

Given that this bug also affects 5.1 I am asking for its re-triage.
[26 Oct 2009 19:26] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/88218

2921 Dmitry Lenev	2009-10-26
      Fix for  bug #45143 "All connections hang on concurrent ALTER TABLE".
      
      Concurrent execution of statements which require non-table-level
      write locks on several instances of the same table (such as
      SELECT ... FOR UPDATE which uses same InnoDB table twice or a DML
      statement which invokes trigger which tries to update same InnoDB
      table directly and through stored function) and statements which
      required table-level locks on this table (e.g. LOCK TABLE ... WRITE,
      ALTER TABLE, ...) might have resulted in a deadlock.
      
      The problem occured when a thread tried to acquire write lock
      (TL_WRITE_ALLOW_WRITE) on the table but had to wait since there was
      a pending write lock (TL_WRITE, TL_WRITE_ALLOW_READ) on this table
      and we failed to detect that this thread already had another instance
      of write lock on it (so in fact we were trying to acquire recursive
      lock) because there was also another thread holding write lock on the
      table (also TL_WRITE_ALLOW_WRITE). When the latter thread released
      its lock neither the first thread nor the thread trying to acquire
      TL_WRITE/TL_WRITE_ALLOW_READ were woken up (as table was still write
      locked by the first thread) so we ended up with a deadlock.
      
      This patch solves this problem by ensuring that thread which
      already has write lock on the table won't wait when it tries
      to acquire second write lock on the same table.
     @ mysql-test/r/lock_sync.result
        Added test case for bug #45143 "All connections hang on concurrent
        ALTER TABLE".
     @ mysql-test/t/lock_sync.test
        Added test case for bug #45143 "All connections hang on concurrent
        ALTER TABLE".
     @ mysys/thr_lock.c
        Ensured that thread can acquire write lock on the table without
        waiting if it already has write lock on it even if there are other
        threads holding write locks on this table (this is normal situation
        for, e.g., TL_WRITE_ALLOW_WRITE type of lock).
        
        Adjusted comments to better explain why it is OK to do so and added
        asserts to prevent introduction of scenarios in which this can cause
        problems.
[26 Oct 2009 19:38] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/88219

2921 Dmitry Lenev	2009-10-26
      Fix for  bug #45143 "All connections hang on concurrent ALTER TABLE".
      
      Concurrent execution of statements which require non-table-level
      write locks on several instances of the same table (such as
      SELECT ... FOR UPDATE which uses same InnoDB table twice or a DML
      statement which invokes trigger which tries to update same InnoDB
      table directly and through stored function) and statements which
      required table-level locks on this table (e.g. LOCK TABLE ... WRITE,
      ALTER TABLE, ...) might have resulted in a deadlock.
      
      The problem occured when a thread tried to acquire write lock
      (TL_WRITE_ALLOW_WRITE) on the table but had to wait since there was
      a pending write lock (TL_WRITE, TL_WRITE_ALLOW_READ) on this table
      and we failed to detect that this thread already had another instance
      of write lock on it (so in fact we were trying to acquire recursive
      lock) because there was also another thread holding write lock on the
      table (also TL_WRITE_ALLOW_WRITE). When the latter thread released
      its lock neither the first thread nor the thread trying to acquire
      TL_WRITE/TL_WRITE_ALLOW_READ were woken up (as table was still write
      locked by the first thread) so we ended up with a deadlock.
      
      This patch solves this problem by ensuring that thread which
      already has write lock on the table won't wait when it tries
      to acquire second write lock on the same table.
     @ mysql-test/r/lock_sync.result
        Added test case for bug #45143 "All connections hang on concurrent
        ALTER TABLE".
     @ mysql-test/t/lock_sync.test
        Added test case for bug #45143 "All connections hang on concurrent
        ALTER TABLE".
     @ mysys/thr_lock.c
        Ensured that thread can acquire write lock on the table without
        waiting if it already has write lock on it even if there are other
        threads holding write locks on this table (this is normal situation
        for, e.g., TL_WRITE_ALLOW_WRITE type of lock).
        
        Adjusted comments to better explain why it is OK to do so and added
        asserts to prevent introduction of scenarios in which this can cause
        problems.
[26 Oct 2009 19:56] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/88220

3674 Dmitry Lenev	2009-10-26 [merge]
      Manual merge of fix for bug #45143 "All connections hang on concurrent
      ALTER TABLE" into mysql-6.0-codebase based tree.
[27 Oct 2009 6:15] Dmitry Lenev
Fix for this bug was queued into mysql-next-mr-bugfixing and mysql-6.0-codebase-bugfixing trees.
[31 Oct 2009 8:19] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091031081410-qkxmjsdzjmj840aq) (version source revid:dlenev@mysql.com-20091026195437-86wtt274rxi662ys) (merge vers: 6.0.14-alpha) (pib:13)
[2 Nov 2009 21:04] Paul DuBois
Noted in 6.0.14 changelog.

Concurrent execution of statements requiring a table-level lock and
statements requiring a non-table-level write lock for a table could
deadlock.

Setting report to NDI pending push to 5.5.x.
[12 Nov 2009 8:20] Bugs System
Pushed into 5.5.0-beta (revid:alik@sun.com-20091110093229-0bh5hix780cyeicl) (version source revid:mikael@mysql.com-20091102100915-a2nbfxaqprpgptfw) (merge vers: 5.5.0-beta) (pib:13)
[12 Nov 2009 19:50] Paul DuBois
Noted in 5.5.0 changelog.