Bug #37080 Falcon deadlock on concurrent insert and truncate
Submitted: 29 May 2008 23:59 Modified: 8 Jan 11:32
Reporter: Vladislav Vaintroub
Status: Closed
Category:Server: Falcon Severity:S3 (Non-critical)
Version:mysql-6.0-falcon-team OS:Any
Assigned to: Vladislav Vaintroub Target Version:6.0.6
Triage: D2 (Serious)

[29 May 2008 23:59] Vladislav Vaintroub
Description:
Got a deadlock while running a test with one client doing insert and another client doing
truncate on the same table.

The threads in question (the "truncate" thread and the gopher thread) acquire the same
locks, log::syncSections and Table::syncObject, in different order, as shown below, and
this is the reason for the deadlock.

Annotated Callstacks

Truncate thread

SyncObject::wait
SyncObject::lock
Sync::lock
SRLDropTable::append <-- lock (Exclusive) on log::syncSections, SRLDropTable.cpp, line 50
Dbb::deleteSection
Table::expunge
Table::truncate
Database::truncateTable <-- lock (Exclusive) on Table::syncObject, Database.cpp, line 1480
StorageDatabase::truncateTable
StorageTableShare::truncateTable
StorageTable::truncateTable
StorageInterface::delete_all_rows

Gopher thread

SyncObject::wait
SyncObject::lock
Table::treeFetch  <-- lock (Shared) on Table::syncObject, Table.cpp, line 915
Table::validateUpdate
Section::updateRecord
Dbb::updateRecord
SRLUpdateRecords::commit
SerialLogTransaction::commit <-- lock (Shared) on log::syncSections, SRLUpdateRecords.cpp, line 346
SerialLogTransaction::doAction
Gopher::gopherThread
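
The two call stacks above can be reduced to a small model that checks for the lock-order cycle. This is only a sketch: `Acquire`, `LockPath`, `conflicts` and `canDeadlock` are hypothetical names, not Falcon code, and the model assumes each thread still holds its first lock while it waits for the second.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Mode { Shared, Exclusive };

// One lock acquisition: which lock, and in which mode.
struct Acquire {
    std::string lock;
    Mode mode;
};

// A call stack reduced to the ordered list of locks it takes.
using LockPath = std::vector<Acquire>;

// Two requests on the same lock block each other unless both are shared.
bool conflicts(const Acquire& x, const Acquire& y)
{
    return x.lock == y.lock
        && (x.mode == Mode::Exclusive || y.mode == Mode::Exclusive);
}

// True if path a can hold a[i] and wait for a[j] while path b holds b[m]
// and waits for b[n], with each held lock blocking the other's request.
bool canDeadlock(const LockPath& a, const LockPath& b)
{
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = i + 1; j < a.size(); ++j)
            for (std::size_t m = 0; m < b.size(); ++m)
                for (std::size_t n = m + 1; n < b.size(); ++n)
                    if (conflicts(a[j], b[m]) && conflicts(b[n], a[i]))
                        return true;
    return false;
}

// Lock order taken from the truncate call stack in this report.
LockPath truncatePath()
{
    return {{"Table::syncObject", Mode::Exclusive},
            {"log::syncSections", Mode::Exclusive}};
}

// Lock order taken from the gopher call stack in this report.
LockPath gopherPath()
{
    return {{"log::syncSections", Mode::Shared},
            {"Table::syncObject", Mode::Shared}};
}

// Sanity check: the reported pair of stacks forms a cycle in this model.
void checkReportedStacks()
{
    assert(canDeadlock(truncatePath(), gopherPath()));
}
```

Two threads that both follow the gopher order do not form a cycle in this model, since shared requests on the same lock do not block each other.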

How to repeat:
Will attach the testcase shortly

Suggested fix:
It seems to me that the best option is to tell the MySQL Server to disallow all
concurrent operations on the table while truncate is running, i.e. uncomment the line
  //  &&  (sql_command != SQLCOM_TRUNCATE)
in StorageInterface::store_lock

This way we would not need the Exclusive lock on the table while truncate is running
(Database.cpp, line 1480).
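
A rough sketch of the effect of that condition, as I understand it. The real body of store_lock is not shown in this report, so everything here is an assumption: the enums are stand-ins named after the server's, `chooseLock` is a hypothetical helper, and `serializeTruncate` stands for "the commented-out line is active".

```cpp
#include <cassert>

// Stand-ins for the server's enums; names mirror MySQL's, values are illustrative.
enum SqlCommand { SQLCOM_INSERT, SQLCOM_UPDATE, SQLCOM_TRUNCATE };
enum LockType { TL_READ, TL_WRITE_ALLOW_WRITE, TL_WRITE };

// Hypothetical reduction of the decision: the engine normally downgrades a
// full table write lock so that other writers may run concurrently. With the
// extra condition enabled, TRUNCATE is excluded from the downgrade, so the
// server keeps the full table lock for it.
LockType chooseLock(LockType requested, SqlCommand cmd, bool serializeTruncate)
{
    bool downgrade = requested == TL_WRITE
        && (!serializeTruncate || cmd != SQLCOM_TRUNCATE);
    return downgrade ? TL_WRITE_ALLOW_WRITE : requested;
}
```

In this model, chooseLock(TL_WRITE, SQLCOM_TRUNCATE, true) stays at TL_WRITE, while an INSERT with the same arguments is still downgraded to TL_WRITE_ALLOW_WRITE.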

I also found that TRUNCATE synchronization within Falcon via Table::syncObject is not 100%
effective. In fact, the test in the related Bug#35991 crashes because of a race condition
(the updating thread tries to get a section page that has just been freed by truncate).
[30 May 2008 0:14] Vladislav Vaintroub
mysql-test-run.pl friendly test case

Attachment: falcon_bug_37080.test (application/octet-stream, text), 1.35 KiB.

[30 May 2008 0:17] Vladislav Vaintroub
To run the test case, place the attached file into the mysql-test/t directory and run
perl mysql-test-run.pl falcon_bug_37080

No corresponding result file is provided. I have never seen the test complete; it hangs
within a couple of seconds after the start.
[4 Jun 2008 20:06] Kevin Lewis
Vlad,  Neither Chris, Ann, nor I have any conceptual problem with serializing the truncate
operation.  To the best of our memory, the reason it is allowed in Falcon now is that
Chris spent a lot of time fixing a bug to make it happen.  For a short while,
StorageInterface::store_lock() actually allowed the server to do a table lock on
SQLCOM_TRUNCATE.  But we commented that line out again once we thought we had it working,
as evidenced by falcon_bug_22173a.  However, looking back at the number of intermittent
failures for falcon_bug_22173a, it is evident that concurrent truncate still has
problems, which you have found yourself.

Please note that if we uncomment the line
  //  &&  (sql_command != SQLCOM_TRUNCATE)
in StorageInterface::store_lock
then it is very likely that falcon_bug_22173a will take too long and time out in
pushbuild.  It did before.

That said, the deadlock that you found was between a client thread doing
Database::truncateTable and a gopher thread completing an update that had previously been
committed.  Serializing truncateTable may have an effect on the likelihood of it
occurring, but does not exclude this possible conflict.

We need answers to these questions:
1) Why is SerialLog::syncSections used?  Is it needed at all?
2) Why is the gopher thread still completing changes in a table that is currently being
dropped with Dbb::deleteSection?  Where is the protection there?
3) If we serialize truncates, (which seems like something that only changes the timing of
the problems you identified), what changes do you propose, specifically?  Can you provide
falcon-private a proposed patch for review?
[6 Jun 2008 13:07] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/47525

2692 Vladislav Vaintroub	2008-06-06
      Bug#37080, Bug#35991
      - Serialize truncate with any other table operation via MySQL server.
      - Remove Falcon's own truncate serialization mechanism, which results
      in deadlocks
[11 Jun 2008 20:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/47752

2699 Vladislav Vaintroub	2008-06-11
      Bug#37080 - Falcon deadlock on concurrent insert and truncate
      
      Problem:
      Two threads, one processing TRUNCATE and the gopher processing INSERT, acquire
      the same locks, SerialLog::syncSections and Table::syncObject, in different
      order.

      Solution:
      Rearrange locks, so that syncSections is always locked before
      Table::syncObject. For this, move the lock on syncSections from
      SRLDropTable::append() up the stack, into Database::dropTable() and
      Database::truncateTable()
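
The ordering described in the commit message above can be sketched as follows. This is a minimal model, not the committed patch: it uses std::shared_mutex in place of Falcon's SyncObject, and `Model`, `truncateTable`, `gopherRead` and `runConcurrently` are hypothetical names. Both paths take syncSections first and the table lock second.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Stand-ins for SerialLog::syncSections and Table::syncObject.
struct Model {
    std::shared_mutex syncSections;
    std::shared_mutex tableSync;
    int rows = 0;  // guarded by tableSync
};

// Truncate path with the rearranged order: syncSections is taken in the
// caller, before the table lock, instead of deep inside the drop/append step.
void truncateTable(Model& m)
{
    std::unique_lock<std::shared_mutex> sections(m.syncSections);  // exclusive
    std::unique_lock<std::shared_mutex> table(m.tableSync);        // exclusive
    m.rows = 0;
}

// Gopher path: same order as before, syncSections (shared) then table (shared).
int gopherRead(Model& m)
{
    std::shared_lock<std::shared_mutex> sections(m.syncSections);
    std::shared_lock<std::shared_mutex> table(m.tableSync);
    return m.rows;
}

// Runs both paths concurrently and returns the number of completed operations.
int runConcurrently(int iterations)
{
    assert(iterations >= 0);
    Model m;
    std::atomic<int> done{0};
    std::thread truncater([&m, &done, iterations] {
        for (int i = 0; i < iterations; ++i) { truncateTable(m); ++done; }
    });
    std::thread gopher([&m, &done, iterations] {
        for (int i = 0; i < iterations; ++i) { gopherRead(m); ++done; }
    });
    truncater.join();
    gopher.join();
    return done.load();
}
```

Because every thread requests the locks in the same global order, no thread can hold the table lock while waiting for syncSections, so the cycle from the original call stacks cannot form in this model.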
[11 Jun 2008 21:50] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/47756

2699 Vladislav Vaintroub	2008-06-11
      Bug#37080 - Falcon deadlock on concurrent insert and truncate
      
      Problem:
      Two threads, one processing TRUNCATE and the gopher processing INSERT, acquire
      the same locks, SerialLog::syncSections and Table::syncObject, in different
      order.

      Solution:
      Rearrange locks, so that syncSections is always locked before
      Table::syncObject. For this, move the lock on syncSections from
      SRLDropTable::append() up the stack, into Database::dropTable() and
      Database::truncateTable()
[22 Aug 2008 22:06] Kevin Lewis
This fix is in version 6.0.6
[8 Jan 11:32] MC Brown
A note has been added to the 6.0.6 changelog: 

When performing operations on a table in one client while a different client is
performing a TRUNCATE operation on the same FALCON table, a deadlock could occur.