Bug #38748 Deadlock in Falcon when running simple SELECT queries
Submitted: 12 Aug 2008 11:26 Modified: 9 Jan 2009 13:58
Reporter: Olav Sandstå Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S3 (Non-critical)
Version:6.0.6 OS:Solaris (Solaris 10 on x64)
Assigned to: Olav Sandstå CPU Architecture:Any
Tags: F_TRANSACTION
Triage: Triaged: D1 (Critical)

[12 Aug 2008 11:26] Olav Sandstå
Description:
During work on bug #35932 I have several times observed that Falcon blocks due to a deadlock. 

The deadlock occurs when running 30 clients each repeatedly running
SELECT statements for a "random" record from a single table.

After starting MySQL with the --falcon-debug-mask=2 I get the following written to the error file:

Stalled threads
  Thread 90dc080 (11) sleep=0, grant=0, locks=1, who 0, parent=90c4170
    Pending Transaction::commit(3) state 0 (1) syncObject 90cb24c
Stalled synchronization objects:
  SyncObject 90cb24c: state 29, readers 0, monitor 0, waiters 1
    Waiting thread 90dc080 (11), type 1; Transaction::commit(3)
------------------------------------
Stalled threads
  Thread 9184208 (34) sleep=0, grant=0, locks=1, who 0, parent=0
    Pending TransactionManager::findTransaction(2) state 0 (2) syncObject 90cb2b8
  Thread 90dc080 (11) sleep=1, grant=0, locks=1, who 0, parent=90c4170
    Pending Transaction::commit(3) state 0 (1) syncObject 90cb24c
  Thread 9184a18 (42) sleep=0, grant=0, locks=1, who 0, parent=0
    Pending TransactionManager::findTransaction(2) state 0 (2) syncObject 90cb2b8
  Thread 91879e0 (35) sleep=1, grant=0, locks=1, who 0, parent=0
    Pending TransactionManager::findTransaction(2) state 0 (2) syncObject 90cb2b8
  Thread 9182de8 (49) sleep=1, grant=0, locks=1, who 0, parent=0
    Pending TransactionManager::findTransaction(2) state 0 (2) syncObject 90cb2b8
  Thread 918a518 (39) sleep=1, grant=0, locks=1, who 0, parent=0
    Pending TransactionManager::findTransaction(2) state 0 (2) syncObject 90cb2b8
  Thread 9189638 (25) sleep=1, grant=0, locks=1, who 0, parent=0
    Pending TransactionManager::findTransaction(2) state 0 (2) syncObject 90cb2b8
  Thread 9188808 (24) sleep=1, grant=0, locks=1, who 0, parent=0
    Pending TransactionManager::findTransaction(2) state 0 (2) syncObject 90cb2b8
  Thread 9189e48 (46) sleep=1, grant=0, locks=1, who 0, parent=0
    Pending TransactionManager::findTransaction(2) state 0 (2) syncObject 90cb2b8
  Thread 9185050 (47) sleep=1, grant=0, locks=1, who 0, parent=0
    Pending TransactionManager::findTransaction(2) state 0 (2) syncObject 90cb2b8
  Thread 9184768 (36) sleep=1, grant=0, locks=1, who 0, parent=0
    Pending TransactionManager::findTransaction(2) state 0 (2) syncObject 90cb2b8
  Thread 91826f8 (50) sleep=1, grant=0, locks=1, who 0, parent=0
    Pending TransactionManager::findTransaction(2) state 0 (2) syncObject 90cb2b8
  Thread 9189388 (33) sleep=1, grant=0, locks=1, who 0, parent=0
    Pending TransactionManager::findTransaction(2) state 0 (2) syncObject 90cb2b8
  Thread 9183538 (29) sleep=1, grant=0, locks=1, who 0, parent=0
    Pending TransactionManager::findTransaction(2) state 0 (2) syncObject 90cb2b8
  Thread 9185860 (27) sleep=1, grant=0, locks=1, who 0, parent=0
    Pending TransactionManager::findTransaction(2) state 0 (2) syncObject 90cb2b8
  Thread 91844b8 (28) sleep=1, grant=0, locks=1, who 0, parent=0
    Pending TransactionManager::findTransaction(2) state 0 (2) syncObject 90cb2b8
  Thread 91824a8 (51) sleep=1, grant=0, locks=1, who 0, parent=0
    Pending TransactionManager::findTransaction(2) state 0 (2) syncObject 90cb2b8
  Thread 9189b98 (26) sleep=1, grant=0, locks=1, who 0, parent=0
    Pending TransactionManager::findTransaction(2) state 0 (2) syncObject 90cb2b8
  Thread 9182b98 (40) sleep=1, grant=0, locks=1, who 0, parent=0
    Pending TransactionManager::findTransaction(2) state 0 (2) syncObject 90cb2b8
  Thread 9185300 (31) sleep=1, grant=0, locks=1, who 0, parent=0
    Pending TransactionManager::findTransaction(2) state 0 (2) syncObject 90cb2b8
  Thread 9183038 (38) sleep=1, grant=0, locks=1, who 0, parent=0
    Pending TransactionManager::findTransaction(2) state 0 (2) syncObject 90cb2b8
  Thread 91888c8 (30) sleep=1, grant=0, locks=1, who 0, parent=0
    Pending TransactionManager::findTransaction(2) state 0 (2) syncObject 90cb2b8
  Thread 9182948 (44) sleep=1, grant=0, locks=1, who 0, parent=0
    Pending TransactionManager::findTransaction(2) state 0 (2) syncObject 90cb2b8
  Thread 9188e28 (43) sleep=1, grant=0, locks=1, who 0, parent=0
    Pending TransactionManager::findTransaction(2) state 0 (2) syncObject 90cb2b8
  Thread 91890d8 (45) sleep=1, grant=0, locks=1, who 0, parent=0
    Pending TransactionManager::findTransaction(2) state 0 (2) syncObject 90cb2b8
  Thread 9183288 (53) sleep=1, grant=0, locks=1, who 0, parent=0
    Pending TransactionManager::findTransaction(2) state 0 (2) syncObject 90cb2b8
  Thread 91898e8 (48) sleep=1, grant=0, locks=1, who 0, parent=0
    Pending TransactionManager::findTransaction(2) state 0 (2) syncObject 90cb2b8
  Thread 91855b0 (52) sleep=1, grant=0, locks=1, who 0, parent=0
    Pending TransactionManager::findTransaction(2) state 0 (2) syncObject 90cb2b8
  Thread 9184da0 (41) sleep=1, grant=0, locks=1, who 0, parent=0
    Pending TransactionManager::findTransaction(2) state 0 (2) syncObject 90cb2b8
Stalled synchronization objects:
  SyncObject 90cb2b8: state -1, readers 0, monitor 0, waiters 28
    Exclusive thread 90dc080 (11), type 1; Transaction::commit(3)
    Waiting thread 9184a18 (42), type 2; TransactionManager::findTransaction(2)
    Waiting thread 9184208 (34), type 2; TransactionManager::findTransaction(2)
    Waiting thread 91879e0 (35), type 2; TransactionManager::findTransaction(2)
    Waiting thread 9182de8 (49), type 2; TransactionManager::findTransaction(2)
    Waiting thread 918a518 (39), type 2; TransactionManager::findTransaction(2)
    Waiting thread 9189638 (25), type 2; TransactionManager::findTransaction(2)
    Waiting thread 9188808 (24), type 2; TransactionManager::findTransaction(2)
    Waiting thread 9189e48 (46), type 2; TransactionManager::findTransaction(2)
    Waiting thread 9185050 (47), type 2; TransactionManager::findTransaction(2)
    Waiting thread 9184768 (36), type 2; TransactionManager::findTransaction(2)
    Waiting thread 91826f8 (50), type 2; TransactionManager::findTransaction(2)
    Waiting thread 9189388 (33), type 2; TransactionManager::findTransaction(2)
    Waiting thread 9183538 (29), type 2; TransactionManager::findTransaction(2)
    Waiting thread 9185860 (27), type 2; TransactionManager::findTransaction(2)
    Waiting thread 91844b8 (28), type 2; TransactionManager::findTransaction(2)
    Waiting thread 91824a8 (51), type 2; TransactionManager::findTransaction(2)
    Waiting thread 9189b98 (26), type 2; TransactionManager::findTransaction(2)
    Waiting thread 9182b98 (40), type 2; TransactionManager::findTransaction(2)
    Waiting thread 9185300 (31), type 2; TransactionManager::findTransaction(2)
    Waiting thread 9183038 (38), type 2; TransactionManager::findTransaction(2)
    Waiting thread 91888c8 (30), type 2; TransactionManager::findTransaction(2)
    Waiting thread 9182948 (44), type 2; TransactionManager::findTransaction(2)
    Waiting thread 9188e28 (43), type 2; TransactionManager::findTransaction(2)
    Waiting thread 91890d8 (45), type 2; TransactionManager::findTransaction(2)
    Waiting thread 9183288 (53), type 2; TransactionManager::findTransaction(2)
    Waiting thread 91898e8 (48), type 2; TransactionManager::findTransaction(2)
    Waiting thread 91855b0 (52), type 2; TransactionManager::findTransaction(2)
    Waiting thread 9184da0 (41), type 2; TransactionManager::findTransaction(2)
  SyncObject 90cb24c: state 28, readers 0, monitor 0, waiters 1
    Waiting thread 90dc080 (11), type 1; Transaction::commit(3)

How to repeat:
I use the same client as I use for reproducing bug #35932. This client
run with 30 concurrent client threads that all do single select
statement for a "random" record from a single table.
[12 Aug 2008 12:20] Olav Sandstå
Using pstack and attaching with a debugger shows that:

1 thread (the scavenger thread) is block with the following call stack:

current thread: t@11
  [1] __lwp_park(0x0, 0x0), at 0xfec23dcb 
  [2] cond_sleep_queue(0x90dc090, 0x90dc0a0, 0x0), at 0xfec1e4fc 
  [3] cond_wait_queue(0x90dc090, 0x90dc0a0, 0x0, 0x0), at 0xfec1e60e 
  [4] _cond_wait(0x90dc090, 0x90dc0a0), at 0xfec1eb07 
  [5] cond_wait(0x90dc090, 0x90dc0a0), at 0xfec1eb49 
  [6] _pthread_cond_wait(0x90dc090, 0x90dc0a0), at 0xfec1eb82 
=>[7] Synchronize::sleep(this = 0x90dc080), line 123 in "Synchronize.cpp"
  [8] SyncObject::wait(this = 0x90cb24c, type = Exclusive, thread = 0x90dc080, sync = 0xfe1d6d98, timeout = 0), line 661 in "SyncObject.cpp"
  [9] SyncObject::lock(this = 0x90cb24c, sync = 0xfe1d6d98, type = Exclusive, timeout = 0), line 438 in "SyncObject.cpp"
  [10] Sync::lock(this = 0xfe1d6d98, type = Exclusive), line 58 in "Sync.cpp"
  [11] Transaction::commit(this = 0x913a0c0), line 297 in "Transaction.cpp"
  [12] Connection::commit(this = 0x90c51c0), line 272 in "Connection.cpp"
  [13] Database::commitSystemTransaction(this = 0x8ec41f0), line 1255 in "Database.cpp"
  [14] Database::updateCardinalities(this = 0x8ec41f0), line 2394 in "Database.cpp"
  [15] Database::scavenge(this = 0x8ec41f0), line 1702 in "Database.cpp"
  [16] Scavenger::scavenge(this = 0x90cac68), line 58 in "Scavenger.cpp"
  [17] Scavenger::execute(this = 0x90cac68, scheduler = 0x90cabd0), line 68 in "Scavenger.cpp"
  [18] Scheduler::schedule(this = 0x90cabd0), line 137 in "Scheduler.cpp"
  [19] Scheduler::schedule(lpParameter = 0x90cabd0), line 159 in "Scheduler.cpp"
  [20] Thread::thread(this = 0x90dc080), line 167 in "Thread.cpp"
  [21] Thread::thread(parameter = 0x90dc080), line 146 in "Thread.cpp"
  [22] _thr_setup(0xfe9d4a00), at 0xfec23a81 
  [23] _lwp_start(0x0, 0x0, 0x0, 0x90dc0a0, 0xfe9d4a00, 0xfec4d000), at 0xfec23d70 

29 threads are block with the following call stack:

  [1] __lwp_park(0x0, 0x0), at 0xfec23dcb 
  [2] cond_sleep_queue(0x9188818, 0x9188828, 0x0), at 0xfec1e4fc 
  [3] cond_wait_queue(0x9188818, 0x9188828, 0x0, 0x0), at 0xfec1e60e 
  [4] _cond_wait(0x9188818, 0x9188828), at 0xfec1eb07 
  [5] cond_wait(0x9188818, 0x9188828), at 0xfec1eb49 
  [6] _pthread_cond_wait(0x9188818, 0x9188828), at 0xfec1eb82 
=>[7] Synchronize::sleep(this = 0x9188808), line 123 in "Synchronize.cpp"
  [8] SyncObject::wait(this = 0x90cb2b8, type = Shared, thread = 0x9188808, sync = 0xfd87ba3c, timeout = 0), line 661 in "SyncObject.cpp"
  [9] SyncObject::lock(this = 0x90cb2b8, sync = 0xfd87ba3c, type = Shared, timeout = 0), line 438 in "SyncObject.cpp"
  [10] Sync::lock(this = 0xfd87ba3c, type = Shared), line 58 in "Sync.cpp"
  [11] TransactionManager::findTransaction(this = 0x90cb1c8, transactionId = 114122U), line 386 in "TransactionManager.cpp"
  [12] Transaction::releaseDependencies(this = 0x9117ce8), line 775 in "Transaction.cpp"
  [13] Transaction::commitNoUpdates(this = 0x9117ce8), line 348 in "Transaction.cpp"
  [14] Transaction::commit(this = 0x9117ce8), line 246 in "Transaction.cpp"
  [15] Connection::commit(this = 0x90e1bb8), line 272 in "Connection.cpp"
  [16] StorageConnection::commit(this = 0x90ca370), line 146 in "StorageConnection.cpp"
  [17] StorageHandler::commit(this = 0x8ec3bd8, mySqlThread = 0x2e99f338), line 268 in "StorageHandler.cpp"
  [18] StorageInterface::commit(hton = 0x92dea78, thd = 0x2e99f338, all = true), line 1157 in "ha_falcon.cpp"
  [19] ha_commit_one_phase(thd = 0x2e99f338, all = true), line 1173 in "handler.cc"
  [20] ha_commit_trans(thd = 0x2e99f338, all = true), line 1142 in "handler.cc"
  [21] end_trans(thd = 0x2e99f338, completion = COMMIT), line 668 in "sql_parse.cc"
  [22] mysql_execute_command(thd = 0x2e99f338), line 3933 in "sql_parse.cc"
  [23] mysql_parse(thd = 0x2e99f338, inBuf = 0x2ebc83f0 "commit", length = 6U, found_semicolon = 0xfd87dd4c), line 5841 in "sql_parse.cc"
  [24] dispatch_command(command = COM_QUERY, thd = 0x2e99f338, packet = 0x2e9ad5a9 "commit", packet_length = 6U), line 1120 in "sql_parse.cc"
  [25] do_command(thd = 0x2e99f338), line 807 in "sql_parse.cc"
  [26] handle_one_connection(arg = 0x2e99f338), line 1153 in "sql_connect.cc"
  [27] _thr_setup(0xfe9da200), at 0xfec23a81 
  [28] _lwp_start(0x0, 0x0, 0x0, 0x9188828, 0xfe9da200, 0xfec4d000), at 0xfec23d70

On interesting observation: I run this test with 30 client threads. 29 of them are blocked but it seems that the last one actually is running and is able to make progress (strange...)
[5 Sep 2008 16:38] Kevin Lewis
Olav,  TransactionManager::findTransaction is trying to lock committedTransactions.syncObject.  But higher up the stack, in commitNoUpdates(), it already has activeTransactions.syncObject locked.

The other thread is in Transaction::commit.  There was a time when Transaction::commit locked these two in the opposite order.  But now that Transaction::commit locks activeTransactions.syncObject first, and then committedTransactions.syncObject, I do not think this deadlock will occur any more.  

The follow-on question is whether  this code is needed in Transaction::releaseDependencies;

      if (transaction->transactionId != state->transactionId)
         {
         Transaction *transaction = database->transactionManager->findTransaction(state->transactionId);
         ASSERT(transaction == NULL);
         }

It calls provides a second chance to find the transaction when transaction recycling hits some unspecified race condition.  I think you should leave it as long as the deadlock does not happen.
[8 Sep 2008 13:04] Olav Sandstå
Kevin, thanks for your comments about the deadlock. I think you are right about that this deadlock will no longer occur. When I reported this issue I got it in once out of every three or four runs but for the last few weeks I have not seen it. 

One comment to your comment about this code in Transaction::releaseDependencies;

      if (transaction->transactionId != state->transactionId)
         {
         Transaction *transaction =
database->transactionManager->findTransaction(state->transactionId);
         ASSERT(transaction == NULL);
         }

I do not agree that this provides a second chance for finding the transaction. Maybe it finds the transaction but given that it stores the transaction pointer in in a very local variable that is not used for anything than potentially asserting it does not provide much.

Given that it no longer causes an assert I have no problem leaving the code in.
[8 Sep 2008 15:09] Kevin Lewis
Woops, you are right.  That is a local variable and good for nothing more than asserting.  Now I do not see the need for this code.
[10 Oct 2008 12:59] Olav Sandstå
Although the deadlock no longer happens we have decided to remove this debug code since it requires setting a lock and potentially might participate in future deadlocks.
[10 Oct 2008 18:43] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/56054

2862 Olav Sandstaa	2008-10-10
      Fix for Bug#38748 Deadlock in Falcon when running simple SELECT queries
      
      Remove debug code that locks the transaction manager. This code have participated in deadlocks.
[14 Oct 2008 15:49] Kevin Lewis
The latest patch is OK to push.

It which removes a call to findTransaction() and asserts that something was found is not needed.  if the dependent transaction that we have a pointer to was actually reused and has a different transactionId, we still have a pointer to it which must be set to NULL and that transaction would still have a dependency counter that would need to be decremented.  There is no chance that a different transaction would be tracking this transactionID, so findTransaction si not needed.  The whole if section can be deleted as it is in the patch.
[28 Oct 2008 8:10] Bugs System
Pushed into 6.0.8-alpha  (revid:olav@sun.com-20081010184244-t25j7c3r1r7uy8lk) (version source revid:olav@sun.com-20081015201310-1eprw26cyb9lrq8v) (pib:5)
[9 Jan 2009 13:58] MC Brown
A note has been added to the 6.0.8 changelog: 

Running multiple SELECT operations on the same Falcon table could lead to an assertion within the Transaction::initialize. The same operation could also lead to a deadlock situation on the specified table.