Bug #42842 | Falcon crash in Transaction::thaw() ; this = 0x0 | ||
---|---|---|---|
Submitted: | 13 Feb 2009 16:28 | Modified: | 15 May 2009 14:50 |
Reporter: | Philip Stoev | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S2 (Serious) |
Version: | 6.0-falcon-team | OS: | Any |
Assigned to: | Olav Sandstå | CPU Architecture: | Any |
Tags: | F_TRANSACTION |
[13 Feb 2009 16:28]
Philip Stoev
[7 Mar 2009 10:50]
Philip Stoev
To reproduce within 2 min with --falcon-index-chill-threshold=1 : $ perl runall.pl \ --engine=Falcon \ --mysqld=--loose-falcon-lock-wait-timeout=1 \ --mysqld=--log-output=none \ --mysqld=--skip-safemalloc \ --rows=10000 \ --threads=16 \ --mysqld=--falcon-index-chill-threshold=1 \ --varchar-length=2 \ --basedir=/build/bzr/6.0-falcon-team \ --mask=49998 \ --queries=100000000 \ --duration=900 \ --gendata=conf/varchar.zz \ --grammar=conf/varchar.yy
[7 Mar 2009 14:42]
Kevin Lewis
Olav, This is another bug associated with disappearing transactions. We discussed today a solution that involves a TransactionState object that is linked to each record and the real transaction and ha useCounts that keep it around until all the pointers are released. The associated bug is Bug#41357.
[30 Mar 2009 17:13]
Olav Sandstå
This bug seems not to be caused by "disappering transactions" (deleted transaction object). What seems to have happened it that before DeferredIndex::scanIndex() is being called there has been a previous (or concurrent?) call to DeferredIndex::detachTransaction(). This call sets the transaction pointer to NULL. By inspecting a core file from the crash shows that the Transaction still exists and is still in state Active.
[1 Apr 2009 10:41]
Olav Sandstå
What caused this crash to happen is that one thread A is getting access to the DeferredIndex just as another thread B is calling Transaction::releaseDeferredIndexes as part of commit/rollback (all crashes I have seen it is called from rollback). What happens is the following: Thread A is in Index::scanIndex and doing the following: 1. Locks the Index::deferredIndex sync object 2. Searches for deferredIndexes that it need to scan .... Thread B: calls DeferredIndex::detachTransaction() which: 1. Lock the DeferredIndex's sync object 2. sets the DeferredIndex::transaction pointer to NULL 3. releases the lock on the sync object 4. calls Index::detachDeferredIndex() -- which BLOCKs due to thread A have a shared lock on Index::deferredIndexes sync object Thread A continues.... 3. Find the DeferredIndex object and calls: 4. DeferredIndex::scanIndex() 5. Lock the DeferredIndex's sync object 6. Finds out that the index has been chilled (at least I think so) and calls: 7. transaction->thaw() - and crashes due to this now being a null pointer (since it was set to NULL in step B.2 above) To me it seems that this bug is caused by setting the transaction pointer to NULL before we have taken the DeferredIndex index out of the Index - or in another way: we start the "clean-up" of the deferredIndex object while it is still available for other threads to use. The simple solution to this problem seems to be to move step B.2 (transaction = NULL) above to after B.4 (the call to Index::detachDeferredIndex()). With this change the test case in the bug report runs without crashing. I plan to make a patch for this.
[1 Apr 2009 11:18]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/71055 3090 Olav Sandstaa 2009-04-01 Fix to Bug#42842 Falcon crash in Transaction::thaw() ; this = 0x0 Change to DeferredIndex::detachTransaction(): Move the code that sets the transaction pointer to NULL to after the DeferredIndex have been detached from the Index. This will avoid that other threads will be able to scan this DeferredIndex and use a transaction pointer that is NULL. @ storage/falcon/DeferredIndex.cpp In detachTransaction: Move the code that sets the transaction pointer to NULL to after the DeferredIndex have been detached from the Index. This will avoid that other threads will be able to scan this DeferredIndex and use a transaction pointer that is NULL.
[1 Apr 2009 14:39]
Kevin Lewis
This solution should solve the problem because it causes both the scanIndex thread and the detachTransaction thread to first read or set NULL the pointer from Index to DefferredIndex under the same Index::DefferredIndexes.syncObject. Then scanIndex uses the DeferredIndex::transaction pointer while still holding the upper lock. However, another approach would have been to check for the existence of a transaction pointer within Transaction::thaw(). In fact, I still think that is a good safe thing to do. If a deferredIndex does not have a transaction pointer, after getting a lock on its syncObject, then it has been completed and its index data will be found in the regular index pages. No need to search it.
[1 Apr 2009 16:51]
Olav Sandstå
Kevin, thanks for comments and reviewing the patch. I have now pushed this patch. When implementing this I considered both approaches you describe. The reason for selecting the one I implemented was that I felt that this was the "most correct" since it would detach the DeferredIndex from the Index before detaching it from the Transaction - making it unavailable for anyone scanning the index. Still, I agree with you that checking for the transaction pointer in DeferredIndex::scanIndex() will make the code more robust. I will make a follow-up patch for this. Setting status back to in progress.
[1 Apr 2009 17:04]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/71118 3091 Olav Sandstaa 2009-04-01 Follow-up fix to Bug #42842 Falcon crash in Transaction::thaw() ; this = 0x0 Make DeferredIndex::scanIndex() more robust by adding a check that the transaction is still available before starting the search. If the transaction has been detached there is no reason to search for data in this deferred index. @ storage/falcon/DeferredIndex.cpp Make DeferredIndex::scanIndex() more robust by adding a check that the transaction is still available before starting the search. If the transaction has been detached there is no reason to search for data in this deferred index.
[15 Apr 2009 16:59]
Bugs System
Pushed into 6.0.11-alpha (revid:hky@sun.com-20090415164923-9arx29ye5pzxd4zf) (version source revid:vvaintroub@mysql.com-20090406115634-vn2j0hre0dvnry8r) (merge vers: 6.0.11-alpha) (pib:6)
[15 May 2009 14:50]
MC Brown
Internal/test fix. No changelog entry required.