Bug #37433 Deadlock between open_table, close_open_tables, get_table_share, drop_open_table
Submitted: 16 Jun 2008 18:58 Modified: 12 Nov 2009 19:49
Reporter: Philip Stoev
Status: Closed
Category: MySQL Server: Locking  Severity: S1 (Critical)
Version: 6.0  OS: Any
Assigned to: Magne Mæhre  CPU Architecture: Any

[16 Jun 2008 18:58] Philip Stoev
Description:
When executing a query workload containing CREATE ... SELECT and DML operations, the server deadlocks with threads stalled in the functions described in the title of this bug.

How to repeat:
A simplified test case will be provided. A dump of the threads will be attached.
[16 Jun 2008 18:58] Philip Stoev
Output from kill -HUP:

Current locks:
lock: 0x9722750:

lock: 0x9758258:

lock: 0x9757600:

lock: 0x955a2f0: write
write     : 0x95e0e70 (34:5);

lock: 0x9517f48: read
read      : 0x9666a20 (27:1); 0x975f448 (30:1); 0x9591e80 (34:1);

lock: 0x953b048:

lock: 0x9537548: write
write     : 0x95e6c18 (34:5); 0x95c61c0 (34:5);

lock: 0x9536050:

lock: 0x951b3a8:

lock: 0x95104b8:

lock: 0x9505328:

lock: 0x94f5770:

lock: 0x94e7870:

lock: 0x94e36a8:

lock: 0x94dd958:

lock: 0x94d66d0:
[16 Jun 2008 19:00] Philip Stoev
Stack dumps for bug 37433

Attachment: bug37433.stacks.txt (text/plain), 32.55 KiB.

[16 Jun 2008 20:40] Philip Stoev
Test case for bug 37433

Attachment: bug37433.zip (application/x-zip-compressed, text), 1.58 KiB.

[16 Jun 2008 20:45] Philip Stoev
To reproduce, please place the .txt files in mysql-test and the .test files in mysql-test/t/. Then run:

$ perl ./mysql-test-run.pl --stress --stress-init-file=bug37433_init.txt \
--stress-test-file=bug37433_run.txt --stress-test-duration=7200 \
--stress-threads=10 --skip-ndb --mysqld=--innodb \
--mysqld=--log-output=file --mysqld=--skip-falcon \
--mysqld=--innodb_lock_wait_timeout=5

You should see a hang longer than 5 seconds shortly after takeoff. It appears that the following is required to reproduce the issue:

* CREATE ... SELECT from a partitioned table into a partitioned table
* INSERT into a partitioned table with triggers
* ROLLBACK(?)

I would like to venture a very wild guess: the locks for the individual partitions of the partitioned table may have been acquired in different orders by the different threads.
[17 Jun 2008 14:35] Philip Stoev
Setting appropriate subcategory.
[25 Jun 2008 16:53] J Jorgenson
This issue also occurs in 5.0.52, OS=Solaris 10.

Thread 1: Create [temporary] Table z SELECT .. FROM q; 
Thread 2: Flush table x;
Thread 3: Select .. From y;
[25 Jun 2008 17:01] Philip Stoev
J Jorgenson:

Thank you very much for your comment. If your setup did not involve partitions (only available in 5.1 and later), it is likely a different bug. Can you please open a new bug report with your test case? If you can also get stack backtraces with gdb, we will be extremely grateful.
[17 Nov 2008 11:31] Magne Mæhre
In the partition code there are some cases where the LOCK_open and
share->LOCK_ha_data mutexes are acquired in the wrong order, which
occasionally leads to a deadlock.

The correct ordering rule is that LOCK_open must never be acquired
while we hold LOCK_ha_data.

When doing a multi-row INSERT into a partitioned table with
autoincrement columns (an INSERT..SELECT), we need to ensure that
we get a sequence of autoincrement values without holes, for
statement-based replication to work.  To ensure this, LOCK_ha_data
is acquired and held for the duration of the query.

A problem occurs when the table has triggers that operate on
other tables. Opening these tables needs to acquire LOCK_open,
resulting in a potential deadlock situation.
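
For illustration, here is a minimal sketch of the inverted acquisition
order (not actual server code; plain pthread mutexes stand in for
LOCK_open and share->LOCK_ha_data):

  #include <pthread.h>

  pthread_mutex_t LOCK_open    = PTHREAD_MUTEX_INITIALIZER;
  pthread_mutex_t LOCK_ha_data = PTHREAD_MUTEX_INITIALIZER;

  /* Thread A: multi-row INSERT into the partitioned table.  It holds
     LOCK_ha_data for the whole statement; a trigger then forces it to
     open another table, which requires LOCK_open. */
  void *insert_thread(void *arg)
  {
    pthread_mutex_lock(&LOCK_ha_data);   /* autoinc sequence lock     */
    pthread_mutex_lock(&LOCK_open);      /* open a table for trigger  */
    /* ... execute the statement ... */
    pthread_mutex_unlock(&LOCK_open);
    pthread_mutex_unlock(&LOCK_ha_data);
    return 0;
  }

  /* Thread B: any code path that follows the documented order:
     LOCK_open first, then share->LOCK_ha_data. */
  void *other_thread(void *arg)
  {
    pthread_mutex_lock(&LOCK_open);
    pthread_mutex_lock(&LOCK_ha_data);
    /* ... */
    pthread_mutex_unlock(&LOCK_ha_data);
    pthread_mutex_unlock(&LOCK_open);
    return 0;
  }

If thread A has taken LOCK_ha_data and thread B has taken LOCK_open,
each then blocks forever on the mutex the other holds (the classic
ABBA deadlock).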

How can we solve this?  Here are a couple of ideas, but I need
some feedback on which path to choose...

Option 1: We can materialize the SELECT before doing the INSERT,
thus knowing beforehand how many autoincrement keys we need, and
reserving them in advance (see the sketch after Option 2).

Option 2: I guess we can open ALL tables, including the ones
referenced in triggers, before starting to execute the INSERT
statement?  If I understand this correctly, we then don't need
to acquire LOCK_open anymore after we start the execution?
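
To make Option 1 concrete, here is a rough sketch (hypothetical names,
not the server's actual autoincrement API) of reserving the whole
range up front once the SELECT has been materialized and the row count
is known, so that no lock has to span the statement:

  #include <pthread.h>

  typedef unsigned long long ulonglong;

  static pthread_mutex_t autoinc_mutex = PTHREAD_MUTEX_INITIALIZER;
  static ulonglong next_auto_inc_val = 1;

  /* Reserve n_rows consecutive autoincrement values in one short
     critical section.  The caller uses first .. first + n_rows - 1,
     so the sequence stays hole-free without a statement-long lock. */
  ulonglong reserve_autoinc_range(ulonglong n_rows)
  {
    pthread_mutex_lock(&autoinc_mutex);
    ulonglong first = next_auto_inc_val;
    next_auto_inc_val += n_rows;
    pthread_mutex_unlock(&autoinc_mutex);
    return first;
  }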
[13 Jan 2009 11:51] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/63096

2783 Magne Mahre	2009-01-13
      Bug #37433  	Deadlock between open_table, close_open_tables, 
                      get_table_share, drop_open_table
      
      In the partition handler code, LOCK_open and share->LOCK_ha_data
      are acquired in the wrong order in certain cases.  When doing a
      multi-row INSERT (i.e. an INSERT..SELECT) into a table with auto-
      increment column(s), the increments must form a monotonically
      increasing, continuous sequence (i.e. it can't have "holes").  To
      achieve this, a lock is held for the duration of the operation;
      share->LOCK_ha_data was used for this purpose.

      Whenever there was a need to open a view _during_ the operation
      (views are not currently pre-opened the way tables are), and
      LOCK_open was grabbed, a deadlock could occur.  share->LOCK_ha_data
      is used in other places _while_ holding LOCK_open.

      A new mutex was introduced in the HA_DATA_PARTITION structure,
      for exclusive use by the autoincrement data fields, so we don't
      need to overload the use of LOCK_ha_data here.

      A regression test case has not been supplied, since the problem
      occurs as a result of a race condition, and testing for it is
      thus not deterministic.  Testing could be done by setting up a
      stress run as described in the bug report.
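
A rough sketch of the structure the fix describes, with illustrative
field names (the actual patch is authoritative):

  #include <pthread.h>

  typedef unsigned long long ulonglong;

  /* The partition share gets its own mutex for the autoincrement
     fields, so taking it no longer interacts with the LOCK_open /
     LOCK_ha_data ordering rule.  Field names are illustrative. */
  typedef struct st_ha_data_partition
  {
    ulonglong       next_auto_inc_val;   /* first free autoinc value */
    bool            auto_inc_initialized;
    pthread_mutex_t LOCK_auto_inc;       /* guards the fields above  */
  } HA_DATA_PARTITION;

  /* Taken only around autoinc bookkeeping, never around table opens,
     so it cannot participate in the LOCK_open deadlock cycle. */
  static void lock_auto_increment(HA_DATA_PARTITION *ha_data)
  {
    pthread_mutex_lock(&ha_data->LOCK_auto_inc);
  }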
[19 Feb 2009 11:23] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/66883

2729 Magne Mahre	2009-02-19
      Bug #37433  	Deadlock between open_table, close_open_tables, 
                      get_table_share, drop_open_table
            
      In the partition handler code, LOCK_open and share->LOCK_ha_data
      are acquired in the wrong order in certain cases.  When doing a
      multi-row INSERT (i.e. an INSERT..SELECT) into a table with auto-
      increment column(s), the increments must form a monotonically
      increasing, continuous sequence (i.e. it can't have "holes").  To
      achieve this, a lock is held for the duration of the operation;
      share->LOCK_ha_data was used for this purpose.

      Whenever there was a need to open a view _during_ the operation
      (views are not currently pre-opened the way tables are), and
      LOCK_open was grabbed, a deadlock could occur.  share->LOCK_ha_data
      is used in other places _while_ holding LOCK_open.

      A new mutex was introduced in the HA_DATA_PARTITION structure,
      for exclusive use by the autoincrement data fields, so we don't
      need to overload the use of LOCK_ha_data here.

      A regression test case has not been supplied, since the problem
      occurs as a result of a race condition, and testing for it is
      thus not deterministic.  Testing could be done by setting up a
      stress run as described in the bug report.
[26 Feb 2009 21:38] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/67740

2734 Konstantin Osipov	2009-02-27
      Bug#37433 Deadlock between open_table, close_open_tables,
      get_table_share, drop_open_table.
      
      Additional fix.
      Destroy the newly added mutex in ha_partition TABLE_SHARE::ha_data.
[26 Feb 2009 21:49] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/67746

2735 Konstantin Osipov	2009-02-27
      Bug#37433 "Deadlock between open_table, close_open_tables, 
      get_table_share, drop_open_table"
      Additional fix: implement and assign the optional destructor.
      (A file was missing from the earlier changeset.)
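
In other words, the mutex that the main patch initializes must also be
destroyed when the share's ha_data is released.  A minimal sketch of
that cleanup, reusing the illustrative HA_DATA_PARTITION structure from
the 13 Jan 2009 entry (the callback name and signature are assumptions,
not the actual server API):

  #include <pthread.h>

  typedef unsigned long long ulonglong;

  typedef struct st_ha_data_partition   /* as sketched earlier */
  {
    ulonglong       next_auto_inc_val;
    bool            auto_inc_initialized;
    pthread_mutex_t LOCK_auto_inc;
  } HA_DATA_PARTITION;

  /* Destructor callback assigned into the share so that the mutex
     added by the main patch is destroyed, not leaked, when ha_data
     goes away.  Name and signature are illustrative. */
  static void ha_data_partition_destroy(void *arg)
  {
    HA_DATA_PARTITION *ha_data = (HA_DATA_PARTITION *) arg;
    pthread_mutex_destroy(&ha_data->LOCK_auto_inc);
  }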
[6 Mar 2009 19:19] Bugs System
Pushed into 6.0.11-alpha (revid:alik@sun.com-20090306190634-6s4zeti557q6stly) (version source revid:kostja@sun.com-20090226214852-vtstc3p31hy3be0k) (merge vers: 6.0.11-alpha) (pib:6)
[7 Mar 2009 0:03] Paul DuBois
Noted in 6.0.11 changelog.

A workload consisting of CREATE TABLE ... SELECT and DML operations
could cause deadlock.
[10 Aug 2009 21:19] Steve Beal
Will this be considered for a 5.1.x backport?

I am encountering the trigger flavor of this problem in production, and while there are potential workarounds we can do in our code, none are attractive, since each involves giving up one of: partitioning, the auto_increment key, replication, or the trigger.  The 2nd and 4th options are the least offensive, but will require a significant investment on our part.  Thanks for considering this.
[15 Oct 2009 11:07] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/86937

2925 Magne Mahre	2009-10-15
      Bug #37433  	Deadlock between open_table, close_open_tables, 
                      get_table_share, drop_open_table
                  
      In the partition handler code, LOCK_open and share->LOCK_ha_data
      are acquired in the wrong order in certain cases.  When doing a
      multi-row INSERT (i.e. an INSERT..SELECT) into a table with auto-
      increment column(s), the increments must form a monotonically
      increasing, continuous sequence (i.e. it can't have "holes").  To
      achieve this, a lock is held for the duration of the operation;
      share->LOCK_ha_data was used for this purpose.

      Whenever there was a need to open a view _during_ the operation
      (views are not currently pre-opened the way tables are), and
      LOCK_open was grabbed, a deadlock could occur.  share->LOCK_ha_data
      is used in other places _while_ holding LOCK_open.

      A new mutex was introduced in the HA_DATA_PARTITION structure,
      for exclusive use by the autoincrement data fields, so we don't
      need to overload the use of LOCK_ha_data here.

      A regression test case has not been supplied, since the problem
      occurs as a result of a race condition, and testing for it is
      thus not deterministic.  Testing could be done by setting up a
      stress run as described in the bug report.
[15 Oct 2009 11:11] Magne Mæhre
Backported to 5.5.0

6.0-codebase-bugfixing revid: 2617.23.11 + 2617.23.16 + 2617.23.17
[15 Oct 2009 19:28] Paul DuBois
Noted in 5.5.0 changelog.
[3 Nov 2009 7:16] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091102151658-j9o4wgro47m5v84d) (version source revid:alik@sun.com-20091023064702-2f8jdmny61bdl94u) (merge vers: 6.0.14-alpha) (pib:13)
[3 Nov 2009 15:25] Paul DuBois
Already fixed in earlier 6.0.x release.
[12 Nov 2009 8:18] Bugs System
Pushed into 5.5.0-beta (revid:alik@sun.com-20091110093229-0bh5hix780cyeicl) (version source revid:mikael@mysql.com-20091103113702-p61dlwc6ml6fxg18) (merge vers: 5.5.0-beta) (pib:13)