| Bug #51134 | Crash in MDL_lock::destroy on a concurrent DDL workload | ||
|---|---|---|---|
| Submitted: | 12 Feb 2010 8:53 | Modified: | 7 Mar 2010 0:59 |
| Reporter: | Philip Stoev | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Locking | Severity: | S3 (Non-critical) |
| Version: | mysql-next-4284 | OS: | Any |
| Assigned to: | Dmitry Lenev | CPU Architecture: | Any |
[12 Feb 2010 9:29]
Philip Stoev
Core and binary: http://mysql-systemqa.s3.amazonaws.com/var-bug51134.zip

Source:
revision-id: dlenev@mysql.com-20100212070543-fu7ppkftm0g3sen1
date: 2010-02-12 10:05:43 +0300
build-date: 2010-02-12 11:29:49 +0200
revno: 3095
branch-nick: mysql-next-4284
[12 Feb 2010 10:21]
Philip Stoev
Repeatable without HANDLER -- again on RENAME TABLE
[12 Feb 2010 20:29]
Dmitry Lenev
This bug is repeatable with the following "simple" test case:

    create table t3 (i int primary key);
    connect (blocker, localhost, root, , );
    connect (dml, localhost, root, , );
    connect (ddl, localhost, root, , );
    --echo # Test for RENAME TABLE
    --echo # Switching to connection 'blocker'
    connection blocker;
    lock table t3 read;
    --echo # Switching to connection 'ddl'
    connection ddl;
    let $ID= `select connection_id()`;
    --send rename tables t1 to t2, t2 to t3;
    --echo # Switching to connection 'default'
    connection default;
    let $wait_condition= select count(*) = 1 from information_schema.processlist where state = "Waiting for table" and info = "rename tables t1 to t2, t2 to t3";
    --source include/wait_condition.inc
    --replace_result $ID ID
    eval kill query $ID;
    --echo # Switching to connection 'ddl'
    connection ddl;
    --error ER_QUERY_INTERRUPTED
    --reap
[14 Feb 2010 21:11]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/100316

3096 Dmitry Lenev 2010-02-15

Fix for bug #51134 "Crash in MDL_lock::destroy on a concurrent DDL workload".

When a RENAME TABLE or LOCK TABLE ... WRITE statement that mentioned the same table several times was aborted while acquiring metadata locks (because a deadlock was discovered or because of a KILL statement), the server might have crashed.

When an attempt to acquire all requested locks failed, we went through the list of requests and released, one by one, the locks we had managed to acquire by that moment. Since in the scenario described above the list of requests contained duplicates, this led to releasing the same ticket twice and, as a result, a crash.

This patch solves the problem by employing a different approach to releasing locks when acquiring all requested locks fails. Now we take an MDL savepoint before starting to acquire locks and simply roll back to it if things go bad.

@ mysql-test/r/lock_multi.result
Updated test results (see lock_multi.test).

@ mysql-test/t/lock_multi.test
Added test case for bug #51134 "Crash in MDL_lock::destroy on a concurrent DDL workload".

@ sql/mdl.cc
MDL_context::acquire_locks(): When an attempt to acquire all requested locks has failed, do not go through the list of requests and release, one by one, the locks we have managed to acquire. Since the list of requests can contain duplicates, such an approach may lead to releasing the same ticket twice and, as a result, a crash. Instead, use the following approach: take an MDL savepoint before starting to acquire locks and simply roll back to it if things go bad.
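The double release described in the commit message can be illustrated with a small, self-contained C++ model. This is only a sketch under stated assumptions: `ToyLockContext`, `cleanup_by_request_list` and `acquire_all_or_rollback` are hypothetical names invented for illustration and are not the classes or functions in sql/mdl.cc. The model assumes that a repeated request for an already locked table reuses the existing ticket, and it reports an invalid release by returning false where the real server crashed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: hypothetical names, not the MDL classes from sql/mdl.cc.
class ToyLockContext {
public:
  // Grant a lock on `table`. A repeated request for a table that is already
  // held reuses the existing ticket instead of creating a new one.
  void acquire(const std::string &table) {
    if (!holds(table)) tickets_.push_back(table);
  }

  // Release the ticket for `table`. Returns false if no such ticket exists,
  // which stands in for the double release that crashed the server.
  bool release(const std::string &table) {
    auto it = std::find(tickets_.begin(), tickets_.end(), table);
    if (it == tickets_.end()) return false;
    tickets_.erase(it);
    return true;
  }

  // In this model a savepoint is the number of tickets held right now.
  std::size_t savepoint() const { return tickets_.size(); }

  // Drop every ticket acquired after the savepoint, each exactly once.
  void rollback_to(std::size_t sp) {
    assert(sp <= tickets_.size());
    tickets_.resize(sp);
  }

  bool holds(const std::string &table) const {
    return std::find(tickets_.begin(), tickets_.end(), table) != tickets_.end();
  }

  std::size_t ticket_count() const { return tickets_.size(); }

private:
  std::vector<std::string> tickets_;  // granted tickets, in acquisition order
};

// Old approach: after a failure at request index `failed_at`, walk the
// requests processed so far and release them one by one.
// Returns the number of invalid (repeated) releases.
int cleanup_by_request_list(ToyLockContext &ctx,
                            const std::vector<std::string> &requests,
                            std::size_t failed_at) {
  int invalid_releases = 0;
  for (std::size_t i = 0; i < failed_at; ++i)
    if (!ctx.release(requests[i])) ++invalid_releases;
  return invalid_releases;
}

// New approach: take a savepoint first and roll back to it on failure.
// `failed_at` simulates the request at which acquisition is aborted;
// pass requests.size() to simulate success.
bool acquire_all_or_rollback(ToyLockContext &ctx,
                             const std::vector<std::string> &requests,
                             std::size_t failed_at) {
  const std::size_t sp = ctx.savepoint();
  for (std::size_t i = 0; i < requests.size(); ++i) {
    if (i == failed_at) {
      ctx.rollback_to(sp);
      return false;
    }
    ctx.acquire(requests[i]);
  }
  return true;
}
```

With the request list {t1, t2, t2, t3} (the tables named by `rename tables t1 to t2, t2 to t3`) and a failure while waiting for t3, the request-list cleanup in this model tries to release the t2 ticket twice. The savepoint rollback drops each ticket acquired after the savepoint exactly once and leaves earlier tickets untouched.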
[15 Feb 2010 10:24]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/100336

3096 Dmitry Lenev 2010-02-15

Fix for bug #51134 "Crash in MDL_lock::destroy on a concurrent DDL workload".

When a RENAME TABLE or LOCK TABLE ... WRITE statement that mentioned the same table several times was aborted while acquiring metadata locks (because a deadlock was discovered or because of a KILL statement), the server might have crashed.

When an attempt to acquire all requested locks failed, we went through the list of requests and released, one by one, the locks we had managed to acquire by that moment. Since in the scenario described above the list of requests contained duplicates, this led to releasing the same ticket twice and, as a result, a crash.

This patch solves the problem by employing a different approach to releasing locks when acquiring all requested locks fails. Now we take an MDL savepoint before starting to acquire locks and simply roll back to it if things go bad.

@ mysql-test/r/lock_multi.result
Updated test results (see lock_multi.test).

@ mysql-test/t/lock_multi.test
Added test case for bug #51134 "Crash in MDL_lock::destroy on a concurrent DDL workload".

@ sql/mdl.cc
MDL_context::acquire_locks(): When an attempt to acquire all requested locks has failed, do not go through the list of requests and release, one by one, the locks we have managed to acquire. Since the list of requests can contain duplicates, such an approach may lead to releasing the same ticket twice and, as a result, a crash. Instead, use the following approach: take an MDL savepoint before starting to acquire locks and simply roll back to it if things go bad.
[15 Feb 2010 13:59]
Dmitry Lenev
The fix for this bug was pushed into the mysql-next-4284 tree. Since this problem is not repeatable outside of this non-public tree, there is nothing to document, so I am simply closing this bug report.
[16 Feb 2010 16:47]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100216101445-2ofzkh48aq2e0e8o) (version source revid:alik@sun.com-20100215140849-b9fal65nwvrzczh4) (merge vers: 6.0.14-alpha) (pib:16)
[16 Feb 2010 16:56]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100216101208-33qkfwdr0tep3pf2) (version source revid:alik@sun.com-20100215120405-o1osx2k0nme27tx9) (pib:16)
[6 Mar 2010 11:02]
Bugs System
Pushed into 5.5.3-m3 (revid:alik@sun.com-20100306103849-hha31z2enhh7jwt3) (version source revid:vvaintroub@mysql.com-20100216221947-luyhph0txl2c5tc8) (merge vers: 5.5.99-m3) (pib:16)
[7 Mar 2010 0:59]
Paul DuBois
No changelog entry needed.

Description:
When executing the full RQG DDL workload including HANDLER, mysqld crashed as follows:

    #2 0x0000000000642639 in handle_segfault (sig=11) at mysqld.cc:2727
    #3 <signal handler called>
    #4 0x0000000002bbf4c0 in ?? ()
    #5 0x000000000086eb2b in MDL_lock::destroy (lock=0x2bbf4d0) at mdl.cc:690
    #6 0x000000000086d439 in MDL_map::remove (this=0x1069f00, lock=0x2bbf4d0) at mdl.cc:531
    #7 0x000000000086d490 in MDL_lock::remove_ticket (this=0x2bbf4d0, list=&MDL_lock::m_granted, ticket=0x2bb8140) at mdl.cc:1064
    #8 0x000000000086d5cf in MDL_context::release_lock (this=0x7fb14c09a848, ticket=0x2bb8140) at mdl.cc:1924
    #9 0x000000000086e580 in MDL_context::acquire_locks (this=0x7fb14c09a848, mdl_requests=0x7fb152858fc0, lock_wait_timeout=1) at mdl.cc:1572
    #10 0x0000000000637501 in lock_table_names (thd=0x7fb14c09a778, table_list=0x2be2db0) at lock.cc:987
    #11 0x00000000007c465d in mysql_rename_tables (thd=0x7fb14c09a778, table_list=0x2be2db0, silent=false) at sql_rename.cc:137
    #12 0x0000000000656ac0 in mysql_execute_command (thd=0x7fb14c09a778) at sql_parse.cc:2684
    #13 0x000000000065c862 in mysql_parse (thd=0x7fb14c09a778, inBuf=0x2be2bb8 "RENAME TABLE testdb_N . t1_merge1_N TO testdb_N . t1_merge1_N , testdb_N . t1_temp1_N TO testdb_S . t1_temp1_N", length=113, found_semicolon=0x7fb15285aee0) at sql_parse.cc:5581
    #14 0x000000000065d47b in dispatch_command (command=COM_QUERY, thd=0x7fb14c09a778, packet=0x7fb14c0c1b99 "RENAME TABLE testdb_N . t1_merge1_N TO testdb_N . t1_merge1_N , testdb_N . t1_temp1_N TO testdb_S . t1_temp1_N ", packet_length=114) at sql_parse.cc:1023
    #15 0x000000000065e91b in do_command (thd=0x7fb14c09a778) at sql_parse.cc:709
    #16 0x000000000064cafb in do_handle_one_connection (thd_arg=0x7fb14c09a778) at sql_connect.cc:1174
    #17 0x000000000064cbca in handle_one_connection (arg=0x7fb14c09a778) at sql_connect.cc:1113
    #18 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0
    #19 0x000000315a4e627d in clone () from /lib64/libc.so.6

How to repeat:
If this is repeatable, a simplified test case will be provided. In the meantime, the core and the binary will be made available.