Bug #46496 falcon_recovery fails due to deadlock in Falcon's transaction handling
Submitted: 31 Jul 2009 14:03
Modified: 26 May 2010 17:48
Reporter: Olav Sandstå
Status: Unsupported
Category: MySQL Server: Falcon storage engine
Severity: S2 (Serious)
Version: 6.0.12-alpha
OS: Any
Assigned to: Kevin Lewis
CPU Architecture: Any
Tags: F_THREADS

[31 Jul 2009 14:03] Olav Sandstå
Description:
The Random Query Generator test falcon_recovery fails with the following error:

# 16:57:41 10 stalled queries detected, declaring deadlock at DSN dbi:mysql:host=127.0.0.1:port=19306:user=root:database=test.

This indicates that multiple queries are not making progress, which the test interprets as a deadlock.

I will attach the call stacks produced by the test at the time of test failure.

Note: This failure started to occur after the fake deadlock produced by the falcon_recovery test was fixed as part of Bug#46400 "RQG falcon_recovery fails due to fake deadlock".

How to repeat:
Run RQG test falcon_recovery using the latest version of Falcon from the mysql-6.0-falcon tree (or the mysql-6.0-falcon-team tree).
[31 Jul 2009 14:04] Olav Sandstå
Stack trace for all threads when the deadlock occurred

Attachment: callstacks.txt (text/plain), 34.08 KiB.

[3 Aug 2009 12:57] Olav Sandstå
This test failure seems to be caused by a Falcon deadlock. Judging from the attached call stacks, there is a potential deadlock involving at least the following four threads:

1. The Gopher (thread 28) is:
    -waiting to get a shared lock on the active transaction list
    -holds a shared lock on Transaction::syncRecords

2. A user thread running TRUNCATE table (thread 4):
    -waiting for an exclusive lock on the committed transaction list
    -holds a shared lock on the active transaction list

3. A user thread running DROP table (thread 8):
    -waiting for an exclusive lock on Transaction::syncRecords
    -holds a shared lock on the committed transaction list

4. A user thread running Transaction::rollback (thread 2):
    -waiting to get an EXCLUSIVE lock on the active transaction list

This deadlock might be caused by the recently introduced shared lock on the active transaction list in Transaction::thawAll(). Changing the order in which Transaction::thawAll() acquires its locks might solve this problem.
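
The four threads listed above can be modelled as a wait-for graph to check that they really form a cycle. The following is a minimal sketch, not Falcon code: the thread labels and the short lock names ("active", "committed", "syncRecords") are shorthand invented here. It also rests on one assumption that the call stacks do not confirm: that a new shared request queues behind an already pending exclusive request on the same lock. Under that assumption the Gopher is blocked by thread 2, which closes the cycle.

```python
# Locks each thread holds and the lock it is waiting for, taken from the
# list of four threads above. Names are shorthand, not Falcon identifiers.
HOLDS = {
    "gopher_28": {"syncRecords": "shared"},
    "truncate_4": {"active": "shared"},
    "drop_8": {"committed": "shared"},
    "rollback_2": {},
}
WAITS = {
    "gopher_28": ("active", "shared"),
    "truncate_4": ("committed", "exclusive"),
    "drop_8": ("syncRecords", "exclusive"),
    "rollback_2": ("active", "exclusive"),
}


def wait_for_graph(writers_block_new_readers=True):
    """Map each thread to the set of threads it is blocked by.

    writers_block_new_readers is an ASSUMPTION about the lock's fairness
    policy: a shared request waits behind a pending exclusive request.
    """
    graph = {thread: set() for thread in WAITS}
    for waiter, (lock, mode) in WAITS.items():
        for other in WAITS:
            if other == waiter:
                continue
            # Blocked by a holder when at least one side is exclusive.
            held_mode = HOLDS[other].get(lock)
            if held_mode and "exclusive" in (mode, held_mode):
                graph[waiter].add(other)
            # Blocked by a queued writer (assumed policy, see docstring).
            other_lock, other_mode = WAITS[other]
            if (writers_block_new_readers and mode == "shared"
                    and other_lock == lock and other_mode == "exclusive"):
                graph[waiter].add(other)
    return graph


def find_cycle(graph):
    """Return one cycle as a list of threads, or None if there is none."""
    def visit(node, path):
        if node in path:
            return path[path.index(node):]
        for nxt in sorted(graph[node]):
            found = visit(nxt, path + [node])
            if found:
                return found
        return None

    for start in sorted(graph):
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None


cycle_with_queueing = find_cycle(wait_for_graph(True))
cycle_without_queueing = find_cycle(wait_for_graph(False))
```

In this model, `cycle_with_queueing` contains all four threads, while `cycle_without_queueing` is `None`. That would suggest thread 2's pending exclusive request is what turns three otherwise compatible shared locks into a deadlock, but this conclusion depends entirely on the assumed queueing policy.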
[18 Aug 2009 22:02] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/81025

2766 Kevin Lewis	2009-08-18
      Bug#46496 - There are a number of places in the Falcon code where 
      TransactionManager::committedTransactions.syncObject is held while 
      locking Transaction::syncRecords.    So Transaction::thawAll() should 
      not hold Transaction::syncRecords while locking 
      TransactionManager::activeTransactions.syncObject,
      which is often held while locking committedTransactions.syncObject.  
      It can cause a deadlock.
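
The reasoning in the commit message amounts to a lock hierarchy: activeTransactions is taken before committedTransactions, which is taken before syncRecords. The sketch below illustrates that rule with a simple rank check. The numeric ranks and the `first_violation` helper are hypothetical, inferred from the commit message rather than taken from the Falcon source, and the "reordered" sequence is only one possible way to satisfy the rule, not necessarily what the patch does.

```python
# Hypothetical ranks inferred from the commit message: a lock with a lower
# rank must be acquired before a lock with a higher rank.
RANK = {
    "activeTransactions": 1,
    "committedTransactions": 2,
    "syncRecords": 3,
}


def first_violation(acquisitions):
    """Return (held, requested) for the first out-of-order acquisition.

    acquisitions is the order in which one thread takes locks, with every
    earlier lock still held. Returns None when the order respects RANK.
    """
    held = []
    for lock in acquisitions:
        for earlier in held:
            if RANK[earlier] > RANK[lock]:
                return (earlier, lock)
        held.append(lock)
    return None


# Order described in the commit message for the old Transaction::thawAll().
old_thaw_all = first_violation(["syncRecords", "activeTransactions"])

# Orders the commit message describes as common elsewhere in the code.
common_path = first_violation(
    ["activeTransactions", "committedTransactions", "syncRecords"])

# One possible reordering that satisfies the hierarchy (an assumption).
reordered_thaw_all = first_violation(["activeTransactions", "syncRecords"])
```

Here `old_thaw_all` reports the inversion between syncRecords and activeTransactions, while the other two sequences pass the check.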
[24 Aug 2009 13:53] Olav Sandstå
Patch looks correct and solves the problem.