Bug #37433 | Deadlock between open_table, close_open_tables, get_table_share, drop_open_table | ||
---|---|---|---|
Submitted: | 16 Jun 2008 18:58 | Modified: | 12 Nov 2009 19:49 |
Reporter: | Philip Stoev | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Locking | Severity: | S1 (Critical) |
Version: | 6.0 | OS: | Any |
Assigned to: | Magne Mæhre | CPU Architecture: | Any |
[16 Jun 2008 18:58]
Philip Stoev
[16 Jun 2008 18:58]
Philip Stoev
Output from kill -HUP : Current locks: lock: 0x9722750: lock: 0x9758258: lock: 0x9757600: lock: 0x955a2f0: write write : 0x95e0e70 (34:5); lock: 0x9517f48: read read : 0x9666a20 (27:1); 0x975f448 (30:1); 0x9591e80 (34:1); lock: 0x953b048: lock: 0x9537548: write write : 0x95e6c18 (34:5); 0x95c61c0 (34:5); lock: 0x9536050: lock: 0x951b3a8: lock: 0x95104b8: lock: 0x9505328: lock: 0x94f5770: lock: 0x94e7870: lock: 0x94e36a8: lock: 0x94dd958: lock: 0x94d66d0:
[16 Jun 2008 19:00]
Philip Stoev
Stack dumps for bug 37433
Attachment: bug37433.stacks.txt (text/plain), 32.55 KiB.
[16 Jun 2008 20:40]
Philip Stoev
Test case for bug 37433
Attachment: bug37433.zip (application/x-zip-compressed, text), 1.58 KiB.
[16 Jun 2008 20:45]
Philip Stoev
To reproduce, please place the .txt files in mysql-test and the .test files in mysql-test/t . Then run: $ perl ./mysql-test-run.pl --stress --stress-init-file=bug37433_init.txt \ --stress-test-file=bug37433_run.txt --stress-test-duration=7200 \ --stress-threads=10 --skip-ndb --mysqld=--innodb \ --mysqld=--log-output=file --mysqld=--skip-falcon \ --mysqld=--innodb_lock_wait_timeout=5 You should see a hang longer than 5 seconds shortly after takeoff. It appears that following is required to reproduce the issue: * CREATE ... SELECT from a partitioned table into a partitioned table * INSERT into a partitioned table with triggers * ROLLBACK(?) I would like to make a very wild guess by saying that the locks for the individual parts of the partitioned table may have been aquired in different order by the different threads.
[17 Jun 2008 14:35]
Philip Stoev
Setting appropriate subcategory.
[25 Jun 2008 16:53]
J Jorgenson
This issue also occurs in 5.0.52, OS=Solaris 10. Thread 1: Create [temporary] Table z SELECT .. FROM q; Thread 2: Flush table x; Thread 3: Select .. From y;
[25 Jun 2008 17:01]
Philip Stoev
J Jorgenson: Thank you very much for your comment. If your setup did not involve partitions (only available in 5.1 and later), it is likely a different bug. Can you please open a new bug report with your test case. If you can also get stack backtraces with gdb we will be extremely grateful.
[17 Nov 2008 11:31]
Magne Mæhre
In the partition code there are some cases where the LOCK_open and the share->LOCK_ha_data mutexes are acquired in the wrong order. This leads to a deadlock condition occurring occasionally. The correct order is that LOCK_open must never be acquired while we hold LOCK_ha_data. When doing a multi-row INSERT into a partitioned table with autoincrement columns (an INSERT..SELECT), we need to ensure that we get a sequence of autoincrement values without holes, for the statement replication to work. To ensure this, LOCK_ha_data is acquired and held for the duration of the query. A problem occurs when the table has triggers that operate on other tables. Opening these tables needs to acquire LOCK_open, resulting in a potential deadlock situation. How can we solve this? Here are a couple of ideas, but I need some feedback on which path to choose... Option 1 : We can materialize the SELECT before doing the INSERT, thus knowing before-hand how many autoincrement keys we need, and get them in advance. Option 2 : I guess we can open ALL tables, including the ones referenced in triggers before starting to execute the INSERT statement ? If I understand this correctly, we then don't need to acquire the LOCK_open anymore after we start the execution ?
[13 Jan 2009 11:51]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/63096 2783 Magne Mahre 2009-01-13 Bug #37433 Deadlock between open_table, close_open_tables, get_table_share, drop_open_table In the partition handler code, LOCK_open and share->LOCK_ha_data are acquired in the wrong order in certain cases. When doing a multi-row INSERT (i.e a INSERT..SELECT) in a table with auto- increment column(s). the increments must be in a monotonically continuous increasing sequence (i.e it can't have "holes"). To achieve this, a lock is held for the duration of the operation. share->LOCK_ha_data was used for this purpose. Whenever there was a need to open a view _during_ the operation (views are not currently pre-opened the way tables are), and LOCK_open was grabbed, a deadlock could occur. share->LOCK_ha_data is other places used _while_ holding LOCK_open. A new mutex was introduced in the HA_DATA_PARTITION structure, for exclusive use of the autoincrement data fields, so we don't need to overload the use of LOCK_ha_data here. A module test case has not been supplied, since the problem occurs as a result of a race condition, and testing for this condition is thus not deterministic. Testing for it could be done by setting up a test case as described in the bug report.
[19 Feb 2009 11:23]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/66883 2729 Magne Mahre 2009-02-19 Bug #37433 Deadlock between open_table, close_open_tables, get_table_share, drop_open_table In the partition handler code, LOCK_open and share->LOCK_ha_data are acquired in the wrong order in certain cases. When doing a multi-row INSERT (i.e a INSERT..SELECT) in a table with auto- increment column(s). the increments must be in a monotonically continuous increasing sequence (i.e it can't have "holes"). To achieve this, a lock is held for the duration of the operation. share->LOCK_ha_data was used for this purpose. Whenever there was a need to open a view _during_ the operation (views are not currently pre-opened the way tables are), and LOCK_open was grabbed, a deadlock could occur. share->LOCK_ha_data is other places used _while_ holding LOCK_open. A new mutex was introduced in the HA_DATA_PARTITION structure, for exclusive use of the autoincrement data fields, so we don't need to overload the use of LOCK_ha_data here. A module test case has not been supplied, since the problem occurs as a result of a race condition, and testing for this condition is thus not deterministic. Testing for it could be done by setting up a test case as described in the bug report.
[26 Feb 2009 21:38]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/67740 2734 Konstantin Osipov 2009-02-27 Bug#37433 Deadlock between open_table, close_open_tables, get_table_share, drop_open_table. Additonal fix. Destroy the newly added mutex in ha_partition TABLE_SHARE::ha_data.
[26 Feb 2009 21:49]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/67746 2735 Konstantin Osipov 2009-02-27 Bug#37433 "Deadlock between open_table, close_open_tables, get_table_share, drop_open_table" Additional fix: implement and assign the optional destructor. (Missed to include a file into the changeset).
[6 Mar 2009 19:19]
Bugs System
Pushed into 6.0.11-alpha (revid:alik@sun.com-20090306190634-6s4zeti557q6stly) (version source revid:kostja@sun.com-20090226214852-vtstc3p31hy3be0k) (merge vers: 6.0.11-alpha) (pib:6)
[7 Mar 2009 0:03]
Paul DuBois
Noted in 6.0.11 changelog. A workload consisting of CREATE TABLE ... SELECT and DML operations could cause deadlock.
[10 Aug 2009 21:19]
Steve Beal
Will this be considered for a 5.1.x backport? I am encountering the trigger flavor of this problem in production and while there are potential workarounds we can do in our code, none are attractive since they involve yielding one of either: partitioning, auto_increment key, replication or the trigger. The 2nd and 4th options are the least offensive but will require a significant investment on our part. Thanks for considering this.
[15 Oct 2009 11:07]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/86937 2925 Magne Mahre 2009-10-15 Bug #37433 Deadlock between open_table, close_open_tables, get_table_share, drop_open_table In the partition handler code, LOCK_open and share->LOCK_ha_data are acquired in the wrong order in certain cases. When doing a multi-row INSERT (i.e a INSERT..SELECT) in a table with auto- increment column(s). the increments must be in a monotonically continuous increasing sequence (i.e it can't have "holes"). To achieve this, a lock is held for the duration of the operation. share->LOCK_ha_data was used for this purpose. Whenever there was a need to open a view _during_ the operation (views are not currently pre-opened the way tables are), and LOCK_open was grabbed, a deadlock could occur. share->LOCK_ha_data is other places used _while_ holding LOCK_open. A new mutex was introduced in the HA_DATA_PARTITION structure, for exclusive use of the autoincrement data fields, so we don't need to overload the use of LOCK_ha_data here. A module test case has not been supplied, since the problem occurs as a result of a race condition, and testing for this condition is thus not deterministic. Testing for it could be done by setting up a test case as described in the bug report.
[15 Oct 2009 11:11]
Magne Mæhre
Backported to 5.5.0 6.0-codebase-bugfixing revid: 2617.23.11 + 2617.23.16 + 2617.23.17
[15 Oct 2009 19:28]
Paul DuBois
Noted in 5.5.0 changelog.
[3 Nov 2009 7:16]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091102151658-j9o4wgro47m5v84d) (version source revid:alik@sun.com-20091023064702-2f8jdmny61bdl94u) (merge vers: 6.0.14-alpha) (pib:13)
[3 Nov 2009 15:25]
Paul DuBois
Already fixed in earlier 6.0.x release.
[12 Nov 2009 8:18]
Bugs System
Pushed into 5.5.0-beta (revid:alik@sun.com-20091110093229-0bh5hix780cyeicl) (version source revid:mikael@mysql.com-20091103113702-p61dlwc6ml6fxg18) (merge vers: 5.5.0-beta) (pib:13)