Bug #36438 | Falcon crash in Record::poke, line 823 | ||
---|---|---|---|
Submitted: | 30 Apr 2008 18:54 | Modified: | 3 Oct 2008 11:48 |
Reporter: | Philip Stoev | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S1 (Critical) |
Version: | 6.0.5 | OS: | Any |
Assigned to: | Kevin Lewis | CPU Architecture: | Any |
[30 Apr 2008 18:54]
Philip Stoev
[21 May 2008 14:46]
Kevin Lewis
Is there any reason for releaseDependency to call commit?
[14 Jun 2008 4:54]
Kevin Lewis
There is a goo reason for releaseDependency to call Transaction::commitRecords which calls in RecordVersion::commit. This is done to finally release records from a previously committed transaction. This was not called from Transaction::writeComplete because the dependency existed.
[14 Jun 2008 4:56]
Kevin Lewis
The problem occurs in Record::Poke, which is setting the record to the current generation so that it will stay in memory longer. Record::Poke is being called by RecordVersion::commit which is called by Transaction::commitRecords. The record has previously been committed, however. It is only now being pulled out of the transaction list because there were dependencies on that transaction at the time it was committed. In other words, a list of changed records attached to a transaction will not get taken off until the transaction is not only write complete but there are also no longer any dependencies. This thread is doing a rollback of a different transaction, releasing its dependency on the transaction that holds this record. This record belongs to a table that has been dropped. So the pointer in this record for format->table->database->currentGeneration is bad, probably the format pointer or the table pointer. How did the table get dropped while a transaction exists that contains a record in that table? Database::DropTable looks into the active transaction list AND the committed transaction list. It first calls function hasUncommittedRecords to check if there are any active transactions with records in that table. If not, it then calls transactionManager->dropTable which deletes all records for that table from committed transactions. Database::DropTable ought to serialize access to the table it is dropping. But there does not seem to be anything to prevent a commit of changes to a table from occurring between the time it calls Database::hasUncommittedRecords() and the time it calls transactionManager->dropTable. In order to prove this, I added a stall loop in Database::DropTable right after it calls hasUncommittedRecords(), where it searched the active transaction list. The stall sleeps 1 second, write a message, and does this 5 times. After this stall, DropTable cleans up all records for that table from the committed transaction list. Then I added the same kind of 5 second stall in Transaction::Commit after the transaction is taken out of the active transaction list and before being put onto the committed transaction list. At this point, the transaction is on neither list and cannot be found by DropTable. These two pieces of code should not be running concurrently. But the following interleaved print statements show that they actually can occur concurrently; Client #1; Commit; # committing an insert to table T1 Client #2; Drop Table T1; # issued right after the commit. --- cut from mysqld console --- Transaction::commit after activeTransactions.remove Transaction::commit after activeTransactions.remove Drop Table after hasUncommittedRecords Transaction::commit after activeTransactions.remove Drop Table after hasUncommittedRecords Transaction::commit after activeTransactions.remove Drop Table after hasUncommittedRecords Transaction::commit after activeTransactions.remove Drop Table after hasUncommittedRecords Drop Table after hasUncommittedRecords So it is possible for a DropTable to completely miss a pending transaction as it is being committed and not remove those records, so that they remain in the record cache. Later, when the transaction no longer has dependencies, removing those records will attempt to access the pointer to format->table in order to get to table->database->currentGeneration. And there are other places where Record::format is assumed to be correct after the table has been dropped. I will now try to determine the best way to coordinate Transactions::commit and Database::DropTable.
[16 Jun 2008 15:00]
Kevin Lewis
It seems that the gap in Transaction::commit where a transaction has been taken off of the active transaction list but not yet put onto the committed list is a gap that needs to be closed. The follwoing functions search for a transaction in both lists; -- The following functions are reporting functions and it is no big deal if a transaction is missed. TransactionManager::print TransactionManager::getTransactionInfo TransactionManager::getSummaryInfo <- This already gets an exclusive lock on both before accessing either. TransactionManager::removeTransaction <- Checks the state first, then accesses only one list -- These functions access one list after another. It would be a problem if a transaction is not found because it is between lists when they are searched. TransactionManager::findTransaction TransactionManager::validateDependencies Database::dropTable Each of these last three functions access the list in the same order, active first, then committed. So if Transaction::commit gets both locks first before transferring a transaction from one to another, then these other three functions will be sure to find what it is looking for without the need to lock both first. So I will put together a patch that changes Transaction::commit so that the transfer of a transaction from one list to another is done atomically, hold both locks initially. Interestingly, if the commit list is locked while the transaction is taken off the active list, then the active list can be unlocked before adding the transaction to the commit list. This works because these two lists are always accessed in that same order.
[17 Jun 2008 21:34]
Kevin Lewis
Submitted a patch. My sendmail was not working when I committed this... And the message contains the wrong bug number. ------------------------------------------------------------ revno: 2708 revision-id: klewis@mysql.com-20080617183446-r6ufkyxojk79v3gx committer: Kevin Lewis <klewis@mysql.com> timestamp: Tue 2008-06-17 13:34:46 -0500 message: Bug#36468 - Make sure that Transaction::commit gets a lock on both the active and committed transaction lists before it moves a transaction from on to the other. ------------------------------------------------------------
[22 Aug 2008 18:42]
Kevin Lewis
The fix is in version 6.0.6
[1 Oct 2008 12:06]
Jon Stephens
Need a description of this issue as it would apply to a user of the software. Thanks.
[1 Oct 2008 22:24]
Kevin Lewis
This crash could occur when one client commits a transaction of changes to a table at the same time another client drops that table. If the timing is just right, the Drop Table did not see that transaction. Later, one of the records affected by that transaction is accessed in a way that refers to the now deleted table. A crash occurred accessing the bad memory address.
[3 Oct 2008 11:48]
Jon Stephens
Documented in the 6.0.6 changelog as follows: When one MySQL client application committed a transaction affecting a Falcon table at the same time that another client dropped this table, the DROP TABLE statement did not see that transaction. This led to a situation such that a row affected by the transaction was later accessed in a manner that referred to the deleted table, resulting in a crash of the server.
[3 Oct 2008 11:49]
Jon Stephens
Kevin, Thanks for the clarification.