Bug #45143 | All connections hang on concurrent ALTER TABLE | ||
---|---|---|---|
Submitted: | 27 May 2009 18:27 | Modified: | 12 Nov 2009 19:50 |
Reporter: | Philip Stoev | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Locking | Severity: | S3 (Non-critical) |
Version: | 5.1,5.4,6.0 | OS: | Any |
Assigned to: | Dmitry Lenev | CPU Architecture: | Any |
[27 May 2009 18:27]
Philip Stoev
[27 May 2009 18:27]
Philip Stoev
kill -HUP output: Status information: Current dir: /build/bzr/mysql-5.1/mysql-test/var/master-data/ Running threads: 101 Stack size: 262144 Current locks: lock: 0x261f668: lock: 0x7f4be40353f0: lock: 0x7f4be40b2a70: lock: 0x26fe218: write write_wait read_wait write : 0x2c34f30 (55:12); write_wait: 0x2675690 (64:12); 0x2c1a3b0 (46:7); 0x7f4be4001860 (84:12); 0x7f4be405cfa0 (47:7); 0x7f4be4038d60 (7:12); 0x2c02fc0 (87:12); 0x7f4be402fa20 (105:11); 0x7f4be4022440 (98:11); 0x7f4be00a8070 (53:12); 0x7f4be40513a0 (9:11); read_wait : 0x7f4be0017110 (62:2); 0x7f4be009d370 (94:3); 0x7f4be800aca0 (88:3); lock: 0x2a929e0: write write_wait read_wait write : 0x265d2e0 (40:12); write_wait: 0x7f4be80bd660 (78:12); 0x7f4be409a570 (31:12); 0x7f4be007b1e0 (68:12); 0x27d5a80 (33:12); 0x7f4be4083b00 (29:7); 0x7f4be409cf40 (24:7); 0x29b3f80 (35:12); 0x7f4be4030c60 (36:12); 0x2c1ede0 (79:7); 0x7f4be80bda00 (8:7); 0x25ff7f0 (39:12); 0x7f4be009ef30 (58:12); 0x7f4be80a2580 (49:11); 0x7f4be000dd00 (48:7); 0x7f4be8023220 (103:7); 0x7f4be007e2b0 (104:7); 0x7f4be8069be0 (75:12); 0x7f4be80b4e90 (85:12); 0x7f4be006ace0 (92:12); 0x7f4be00e6360 (97:12); 0x25f4fd0 (70:7); read_wait : 0x2ad84b0 (81:4); 0x2b7a880 (55:2); 0x7f4be4060250 (57:2); 0x7f4be805ff50 (11:3); lock: 0x7f4be0047c60: lock: 0x2ac5758: write write_wait read_wait write : 0x7f4be80da1d0 (28:12); write_wait: 0x7f4be806e590 (21:7); 0x7f4be006b7f0 (54:7); 0x7f4be40992b0 (83:11); 0x7f4be0068630 (12:12); 0x7f4be4017a00 (45:7); 0x7f4be40c33f0 (34:12); 0x7f4be8067330 (95:7); 0x7f4be0069250 (91:12); 0x7f4be40965e0 (93:7); 0x7f4be407f2b0 (44:7); 0x7f4be0060a90 (51:12); 0x7f4be0004850 (14:7); 0x7f4be8041140 (43:7); 0x2a32fc0 (17:12); 0x7f4be808aac0 (59:12); read_wait : 0x7f4be80b3f20 (20:3); 0x7f4be80a7b60 (89:4); 0x7f4be00c32f0 (15:2); 0x28b09a0 (19:2); 0x7f4be0085010 (50:4); 0x7f4be40d3e30 (32:2); lock: 0x2b8a9a0: lock: 0x7f4be002ae70: lock: 0x2636b20: write write_wait read write : 0x7f4be8027c80 (75:6); 0x7f4be408ad80 (57:6); write_wait: 0x7f4be40823d0 (101:7); 0x2c35ea0 (61:6); 0x7f4be4093080 (77:7); 0x7f4be401e7f0 (26:6); 0x2862580 (30:6); 0x7f4be8017b90 (69:6); 0x292a4b0 (80:6); 0x7f4be00c02b0 (65:6); 0x2b87740 (41:6); 0x7f4be000e1c0 (56:6); 0x7f4be808c860 (6:6); read : 0x7f4be00c88f0 (94:3); 0x7f4be40a12a0 (32:3); lock: 0x2acd1a0: write write_wait write : 0x7f4be00d7060 (28:6); write_wait: 0x27aed50 (52:7); 0x7f4be8081820 (38:6); 0x2b8ece0 (40:6); 0x7f4be00c8500 (96:7); 0x7f4be401c760 (67:6); 0x7f4be4057030 (76:6); 0x7f4be403b970 (16:7); 0x7f4be0017cb0 (18:6); 0x7f4be0090e80 (102:7); 0x7f4be005f310 (28:6); 0x27ced00 (73:7); 0x7f4be8009150 (13:6); 0x7f4be8083ca0 (27:6); 0x7f4be8072340 (63:6); 0x7f4be80367f0 (66:7); 0x7f4be8017f80 (72:6); 0x7f4be4098be0 (42:7); 0x7f4be807c950 (71:6); lock: 0x25d87e0: write write_wait read write : 0x27f5c90 (17:6); write_wait: 0x7f4be0011760 (74:7); 0x7f4be803bce0 (86:7); 0x2928ef0 (23:7); 0x7f4be80b1d90 (99:6); 0x7f4be8016790 (82:7); 0x265c6d0 (90:6); 0x7f4be0064d90 (37:6); 0x7f4be808d420 (22:7); 0x7f4be80de0c0 (25:6); 0x27f0720 (10:7); 0x7f4be00ac760 (60:6); 0x7f4be8023de0 (100:6); read : 0x7f4be80841e0 (20:3); 0x7f4be80d0fb0 (89:4); lock: 0x7f4be00257d0: Key caches: default Buffer_size: 1048576 Block_size: 1024 Division_limit: 100 Age_limit: 300 blocks used: 128 not flushed: 0 w_requests: 72247 writes: 10920 r_requests: 106616 reads: 831 handler status: read_key: 4584 read_next: 1263 read_rnd 47 read_first: 1012 write: 114133 delete 493 update: 927 Table status: Opened tables: 76067 Open tables: 174 Open files: 54 Open streams: 0 Alarm status: Active alarms: 1 Max used alarms: 96 Next alarm time: 22937 Thread database.table_name Locked/Waiting Lock_type 17 test.B Locked - write High priority write lock 20 test.BB Waiting - read Shared read lock 20 test.B Locked - read Shared read lock 21 test.BB Waiting - write Write lock, but allow reading 28 test.BB Locked - write High priority write lock 28 test.CC Locked - write High priority write lock 40 test.AA Locked - write High priority write lock 52 test.CC Waiting - write Write lock, but allow reading 55 test.DD Locked - write High priority write lock 62 test.DD Waiting - read Low priority read lock 64 test.DD Waiting - write High priority write lock 74 test.B Waiting - write Write lock, but allow reading 75 test.A Locked - write Low priority write lock 78 test.AA Waiting - write High priority write lock 81 test.AA Waiting - read High priority read lock 94 test.A Locked - read Shared read lock 101 test.A Waiting - write Write lock, but allow reading
[27 May 2009 18:29]
Philip Stoev
bug 45143 processlist
Attachment: bug45143.processlist.txt (text/plain), 25.87 KiB.
[27 May 2009 18:30]
Philip Stoev
Thread stacks for bug 45143
Attachment: bug45143.stacks.txt (text/plain), 96.97 KiB.
[1 Jul 2009 16:25]
Matthias Leich
Grammar of creation of objects
Attachment: bug45143_data.zz (application/octet-stream, text), 165 bytes.
[1 Jul 2009 16:26]
Matthias Leich
Grammar for stress testing
Attachment: bug45143_sql.yy (application/octet-stream, text), 969 bytes.
[1 Jul 2009 17:06]
Matthias Leich
I uploaded two files with simplified grammar for replaying the current bug. There are some maybe important comments in bug45143_sql.yy. How to run the test: 1. copy bug45143_sql.yy and bug45143_data.zz to the subdirectory "conf" of the RQG. 2. Run something like perl runall.pl --mem \ --basedir=<path to MySQL toplevel directory> \ --threads=10 --queries=10000 --debug \ --mysqld=--table-lock-wait-timeout=5 \ --mysqld=--innodb-lock-wait-timeout=5 \ --gendata=conf/bug45143_data.zz \ --grammar=conf/bug45143_sql.yy \ --seed=0 I get an "endless" deadlock (all 10 stress threads are state "Table lock") after ~ 10 seconds. Example: SHOW PROCESSLIST (a bit edited): Id db Command Time State Info 1 test Sleep 68 NULL NULL 6 test Query 66 Table lock UPDATE table1_int_autoinc SET `int` = 1 WHERE `pk` > 2 7 test Query 66 Table lock ALTER TABLE table1_int_autoinc ENGINE = InnoDB 8 test Query 66 Table lock ALTER TABLE table1_int_autoinc ENGINE = InnoDB 9 test Query 66 Table lock ALTER TABLE table1_int_autoinc ENGINE = MEMORY 10 test Query 66 Table lock UPDATE table1_int_autoinc SET `int` = 8 WHERE `pk` > 5 11 test Query 66 Table lock ALTER ONLINE TABLE table1_int_autoinc ENGINE = InnoDB 12 test Query 66 Table lock SELECT A . `int_key` FROM table1_int_autoinc AS A LEFT JOIN table1_int_autoinc AS B USING (`pk`) FOR 13 test Query 66 Table lock SELECT A . `int` FROM table1_int_autoinc AS A LEFT JOIN table1_int_autoinc AS B USING (`pk`) FOR UPD 14 test Query 66 Table lock ALTER ONLINE TABLE table1_int_autoinc ENGINE = MEMORY 15 test Query 66 Table lock ALTER TABLE table1_int_autoinc ENGINE = MEMORY 16 NULL Query 0 NULL SHOW PROCESSLIST What's in my eyes especially suspicious? ALTER ONLINE TABLE gets all time rejected with "failed: 1235 This version of MySQL doesn't yet support 'ALTER ONLINE TABLE table%d_int_autoinc ENGINE = MEMORY' independent of the storage engine - just used for the table - assigned in ALTER .... So why - does this not yet supported statement need any lock - disappear the deadlocks if I remove the "ONLINE" ?
[4 Sep 2009 9:53]
Dmitry Lenev
After investigation it has turned out that this bug is not specific to ALTER TABLE and can be reproduced using e.g. LOCK TABLES. To do this one should to apply the following patch, which introduces a new DEBUG_SYNC point, to the server source: === modified file 'mysys/thr_lock.c' --- mysys/thr_lock.c 2009-03-11 17:17:00 +0000 +++ mysys/thr_lock.c 2009-09-04 08:04:49 +0000 @@ -1023,6 +1023,7 @@ thr_multi_lock(THR_LOCK_DATA **data, uin thr_multi_unlock(data,(uint) (pos-data)); DBUG_RETURN(result); } + DEBUG_SYNC_C("thr_multi_lock_after_thr_lock"); #ifdef MAIN printf("Thread: %s Got lock: 0x%lx type: %d\n",my_thread_name(), (long) pos[0]->lock, pos[0]->type); fflush(stdout); And then run the following test case using mysqltest tool: --source include/have_innodb.inc --source include/have_debug_sync.inc connect (con1,localhost,root,,test,,); connect (con3,localhost,root,,test,,); connect (con2,localhost,root,,test,,); connection default; SET DEBUG_SYNC= 'RESET'; SET GLOBAL GENERAL_LOG=OFF; create table t1 (i int) engine=InnoDB; insert into t1 values (1); select get_lock("lock_bug45143_wait", 0); connection con1; --send insert into t1 values (get_lock("lock_bug45143_wait", 100)); connection con2; --sleep 2 set debug_sync='thr_multi_lock_after_thr_lock SIGNAL parked WAIT_FOR go'; --send select count(*) from t1 as a, t1 as b for update; connection con3; set debug_sync= 'now WAIT_FOR parked'; --send lock table t1 write; connection default; --sleep 2 set debug_sync= 'now SIGNAL go'; --sleep 2 select release_lock("lock_bug45143_wait"); connection con1; --reap # The below statement will show that both SELECT FOR UPDATE and LOCK TABLES # are waiting for table lock. show processlist; connection con2; --reap connection con3; --reap
[7 Sep 2009 14:32]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/82598 2810 Dmitry Lenev 2009-09-07 Fix for bug #45143 "All connections hang on concurrent ALTER TABLE". Concurrent execution of statements which require non-table-level write locks on several instances of the same table (such as SELECT ... FOR UPDATE which uses same InnoDB table twice or a DML statement which invokes trigger which tries to update same InnoDB table directly and through stored function) and statements which required table-level locks on this table (e.g. LOCK TABLE ... WRITE, ALTER TABLE, ...) might have resulted in a deadlock. The problem occured when a thread tried to acquire write lock (TL_WRITE_ALLOW_WRITE) on the table but had to wait since there was a pending write lock (TL_WRITE, TL_WRITE_ALLOW_READ) on this table and we failed to detect that this thread already had another instance of write lock on it because there was also another thread holding write lock on the table (also TL_WRITE_ALLOW_WRITE). When the latter thread released its lock neither the first thread nor the thread trying to acquire TL_WRITE/TL_WRITE_ALLOW_READ were woken up (as table was still write locked by the first thread) so we ended up with a deadlock. This patch solves this problem by ensuring that thread which already has write lock on the table won't wait when it tries to acquire second write lock on the same table. Questions for reviewer are marked by QQ. QQ should not we fix it in 5.1 as well? @ mysql-test/r/lock_sync.result Added test case for bug #45143 "All connections hang on concurrent ALTER TABLE". @ mysql-test/t/lock_sync.test Added test case for bug #45143 "All connections hang on concurrent ALTER TABLE". @ mysys/thr_lock.c Ensured that thread can acquire write lock on the table without waiting if it already has write lock on it even if there are other threads holding write locks on this table (this is normal situation for, e.g., TL_WRITE_ALLOW_WRITE type of lock). Adjusted comments to better explain why it is OK to do so and added asserts to prevent introduction of scenarios in which this can cause problems.
[15 Sep 2009 8:22]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/83238 2816 Dmitry Lenev 2009-09-15 Fix for bug #45143 "All connections hang on concurrent ALTER TABLE". Concurrent execution of statements which require non-table-level write locks on several instances of the same table (such as SELECT ... FOR UPDATE which uses same InnoDB table twice or a DML statement which invokes trigger which tries to update same InnoDB table directly and through stored function) and statements which required table-level locks on this table (e.g. LOCK TABLE ... WRITE, ALTER TABLE, ...) might have resulted in a deadlock. The problem occured when a thread tried to acquire write lock (TL_WRITE_ALLOW_WRITE) on the table but had to wait since there was a pending write lock (TL_WRITE, TL_WRITE_ALLOW_READ) on this table and we failed to detect that this thread already had another instance of write lock on it (so in fact we were trying to acquire recursive lock) because there was also another thread holding write lock on the table (also TL_WRITE_ALLOW_WRITE). When the latter thread released its lock neither the first thread nor the thread trying to acquire TL_WRITE/TL_WRITE_ALLOW_READ were woken up (as table was still write locked by the first thread) so we ended up with a deadlock. This patch solves this problem by ensuring that thread which already has write lock on the table won't wait when it tries to acquire second write lock on the same table. @ mysql-test/r/lock_sync.result Added test case for bug #45143 "All connections hang on concurrent ALTER TABLE". @ mysql-test/t/lock_sync.test Added test case for bug #45143 "All connections hang on concurrent ALTER TABLE". @ mysys/thr_lock.c Ensured that thread can acquire write lock on the table without waiting if it already has write lock on it even if there are other threads holding write locks on this table (this is normal situation for, e.g., TL_WRITE_ALLOW_WRITE type of lock). Adjusted comments to better explain why it is OK to do so and added asserts to prevent introduction of scenarios in which this can cause problems.
[15 Sep 2009 8:33]
Dmitry Lenev
During developing fix for this bug it became clear that not only scenarios including ALTER are affected. More precisely set of affected cases can be described as "concurrent execution of statements which require non-table-level write locks on several instances of the same table (such as SELECT ... FOR UPDATE which uses same InnoDB table twice or a DML statement which invokes trigger which tries to update same InnoDB table directly and through stored function) and statements which required table-level locks on this table (e.g. LOCK TABLE ... WRITE,ALTER TABLE, ...)". Given that this bug also affects 5.1 I am asking for its re-triage.
[26 Oct 2009 19:26]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/88218 2921 Dmitry Lenev 2009-10-26 Fix for bug #45143 "All connections hang on concurrent ALTER TABLE". Concurrent execution of statements which require non-table-level write locks on several instances of the same table (such as SELECT ... FOR UPDATE which uses same InnoDB table twice or a DML statement which invokes trigger which tries to update same InnoDB table directly and through stored function) and statements which required table-level locks on this table (e.g. LOCK TABLE ... WRITE, ALTER TABLE, ...) might have resulted in a deadlock. The problem occured when a thread tried to acquire write lock (TL_WRITE_ALLOW_WRITE) on the table but had to wait since there was a pending write lock (TL_WRITE, TL_WRITE_ALLOW_READ) on this table and we failed to detect that this thread already had another instance of write lock on it (so in fact we were trying to acquire recursive lock) because there was also another thread holding write lock on the table (also TL_WRITE_ALLOW_WRITE). When the latter thread released its lock neither the first thread nor the thread trying to acquire TL_WRITE/TL_WRITE_ALLOW_READ were woken up (as table was still write locked by the first thread) so we ended up with a deadlock. This patch solves this problem by ensuring that thread which already has write lock on the table won't wait when it tries to acquire second write lock on the same table. @ mysql-test/r/lock_sync.result Added test case for bug #45143 "All connections hang on concurrent ALTER TABLE". @ mysql-test/t/lock_sync.test Added test case for bug #45143 "All connections hang on concurrent ALTER TABLE". @ mysys/thr_lock.c Ensured that thread can acquire write lock on the table without waiting if it already has write lock on it even if there are other threads holding write locks on this table (this is normal situation for, e.g., TL_WRITE_ALLOW_WRITE type of lock). Adjusted comments to better explain why it is OK to do so and added asserts to prevent introduction of scenarios in which this can cause problems.
[26 Oct 2009 19:38]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/88219 2921 Dmitry Lenev 2009-10-26 Fix for bug #45143 "All connections hang on concurrent ALTER TABLE". Concurrent execution of statements which require non-table-level write locks on several instances of the same table (such as SELECT ... FOR UPDATE which uses same InnoDB table twice or a DML statement which invokes trigger which tries to update same InnoDB table directly and through stored function) and statements which required table-level locks on this table (e.g. LOCK TABLE ... WRITE, ALTER TABLE, ...) might have resulted in a deadlock. The problem occured when a thread tried to acquire write lock (TL_WRITE_ALLOW_WRITE) on the table but had to wait since there was a pending write lock (TL_WRITE, TL_WRITE_ALLOW_READ) on this table and we failed to detect that this thread already had another instance of write lock on it (so in fact we were trying to acquire recursive lock) because there was also another thread holding write lock on the table (also TL_WRITE_ALLOW_WRITE). When the latter thread released its lock neither the first thread nor the thread trying to acquire TL_WRITE/TL_WRITE_ALLOW_READ were woken up (as table was still write locked by the first thread) so we ended up with a deadlock. This patch solves this problem by ensuring that thread which already has write lock on the table won't wait when it tries to acquire second write lock on the same table. @ mysql-test/r/lock_sync.result Added test case for bug #45143 "All connections hang on concurrent ALTER TABLE". @ mysql-test/t/lock_sync.test Added test case for bug #45143 "All connections hang on concurrent ALTER TABLE". @ mysys/thr_lock.c Ensured that thread can acquire write lock on the table without waiting if it already has write lock on it even if there are other threads holding write locks on this table (this is normal situation for, e.g., TL_WRITE_ALLOW_WRITE type of lock). Adjusted comments to better explain why it is OK to do so and added asserts to prevent introduction of scenarios in which this can cause problems.
[26 Oct 2009 19:56]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/88220 3674 Dmitry Lenev 2009-10-26 [merge] Manual merge of fix for bug #45143 "All connections hang on concurrent ALTER TABLE" into mysql-6.0-codebase based tree.
[27 Oct 2009 6:15]
Dmitry Lenev
Fix for this bug was queued into mysql-next-mr-bugfixing and mysql-6.0-codebase-bugfixing trees.
[31 Oct 2009 8:19]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091031081410-qkxmjsdzjmj840aq) (version source revid:dlenev@mysql.com-20091026195437-86wtt274rxi662ys) (merge vers: 6.0.14-alpha) (pib:13)
[2 Nov 2009 21:04]
Paul DuBois
Noted in 6.0.14 changelog. Concurrent execution of statements requiring a table-level lock and statements requiring a non-table-level write lock for a table could deadlock. Setting report to NDI pending push to 5.5.x.
[12 Nov 2009 8:20]
Bugs System
Pushed into 5.5.0-beta (revid:alik@sun.com-20091110093229-0bh5hix780cyeicl) (version source revid:mikael@mysql.com-20091102100915-a2nbfxaqprpgptfw) (merge vers: 5.5.0-beta) (pib:13)
[12 Nov 2009 19:50]
Paul DuBois
Noted in 5.5.0 changelog.