Bug #42842 Falcon crash in Transaction::thaw() ; this = 0x0
Submitted: 13 Feb 2009 16:28 Modified: 15 May 2009 14:50
Reporter: Philip Stoev Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S2 (Serious)
Version:6.0-falcon-team OS:Any
Assigned to: Olav Sandstå CPU Architecture:Any
Tags: F_TRANSACTION

[13 Feb 2009 16:28] Philip Stoev
Description:
When executing a random DML workload in 32 threads with --falcon-index-chill-threshold=1 , Falcon crashed as follows:

#2  0x00000000006c15e1 in handle_segfault (sig=11) at mysqld.cc:2685
#3  <signal handler called>
#4  0x00000000009a0bbd in Transaction::thaw (this=0x0,
    deferredIndex=0x7f64d1c4e490) at Transaction.cpp:573
#5  0x00000000009dd548 in DeferredIndex::scanIndex (this=0x7f64d1c4e490,
    lowKey=0x7f64bbb2a510, highKey=0x0, searchFlags=2, bitmap=0x7f64d1bc08a0)
    at DeferredIndex.cpp:722
#6  0x00000000009f518b in Index::scanIndex (this=0x7f64d1bcec60,
    lowKey=0x7f64bbb2a510, highKey=0x0, searchFlags=2,
    transaction=0x7f64bbba9d78, bitmap=0x7f64d1bc08a0) at Index.cpp:471
#7  0x0000000000976c93 in StorageDatabase::indexScan (this=0x7f64d1b99208,
    index=0x7f64d1bcec60, lower=0x7f64bbb2a508, upper=0x0, searchFlags=2,
    storageConnection=0x7f64d1c0a680, bitmap=0x7f64d1bc08a0)
    at StorageDatabase.cpp:782
#8  0x000000000097ef42 in StorageTable::indexScan (this=0x7f64bbb2a4b8,
    indexOrder=0) at StorageTable.cpp:259
#9  0x000000000096d95f in StorageInterface::scanRange (this=0x7f64ccfd5300,
    start_key=0x7f64ccfd53e0, end_key=0x0, eqRange=false) at ha_falcon.cpp:1793
#10 0x000000000096dc6a in StorageInterface::read_range_first (
    this=0x7f64ccfd5300, start_key=0x7f64ccfd53e0, end_key=0x0,
    eq_range_arg=false, sorted=true) at ha_falcon.cpp:1734
#11 0x00000000008126d8 in handler::multi_range_read_next (this=0x7f64ccfd5300,
    range_info=0x7f64aca191e0) at handler.cc:4330
#12 0x000000000096eb0a in StorageInterface::multi_range_read_next (
    this=0x7f64ccfd5300, rangeInfo=0x7f64aca191e0) at ha_falcon.cpp:1965
#13 0x00000000007f0157 in QUICK_RANGE_SELECT::get_next (this=0x3a846f0)
    at opt_range.cc:8558
#14 0x000000000080c02e in rr_quick (info=0x7f64aca194b0) at records.cc:322
#15 0x000000000079465f in mysql_update (thd=0x7f64ccf69c08,
    table_list=0x39ebb00, fields=@0x7f64ccf6bab0, values=@0x7f64ccf6beb8,
    conds=0x39ec928, order_num=0, order=0x0, limit=18446744073709551615,
    handle_duplicates=DUP_ERROR, ignore=false) at sql_update.cc:478
#16 0x00000000006d4dba in mysql_execute_command (thd=0x7f64ccf69c08)
    at sql_parse.cc:3014
#17 0x00000000006da911 in mysql_parse (thd=0x7f64ccf69c08,
    inBuf=0x39eb8e0 "UPDATE `table10_falcon_int_autoinc` SET `int` = `int` + 30, `int_key` = `int_key` - 30 WHERE `int_key` > 7", length=106,
    found_semicolon=0x7f64aca1af00) at sql_parse.cc:5751
#18 0x00000000006db4fc in dispatch_command (command=COM_QUERY,
    thd=0x7f64ccf69c08,
    packet=0x7f64ccf74509 " UPDATE `table10_falcon_int_autoinc` SET `int` = `int` + 30, `int_key` = `int_key` - 30 WHERE `int_key` > 7 ", packet_length=108)
    at sql_parse.cc:1009
#19 0x00000000006dca25 in do_command (thd=0x7f64ccf69c08) at sql_parse.cc:691
#20 0x00000000006ca9f5 in handle_one_connection (arg=0x7f64ccf69c08)
    at sql_connect.cc:1146
#21 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0
#22 0x000000315a4e627d in clone () from /lib64/libc.so.6

(gdb) list
568             return bytesRestored;
569     }
570
571     void Transaction::thaw(DeferredIndex * deferredIndex)
572     {
573             SerialLogControl control(database->dbb->serialLog);
574             control.updateIndex.thaw(deferredIndex);
575     }
576
577     void Transaction::addRecord(RecordVersion * record)

(gdb) print this
$1 = (Transaction * const) 0x0

This crash was only observed once, however a recent bug #42828 also involves a situation where transaction = 0 .

How to repeat:
The tablespace will be attached shortly.
[7 Mar 2009 10:50] Philip Stoev
To reproduce within 2 min with --falcon-index-chill-threshold=1 :

$ perl runall.pl  \
  --engine=Falcon  \
  --mysqld=--loose-falcon-lock-wait-timeout=1 \
  --mysqld=--log-output=none \
  --mysqld=--skip-safemalloc \
  --rows=10000 \
  --threads=16 \
  --mysqld=--falcon-index-chill-threshold=1 \
  --varchar-length=2 \
  --basedir=/build/bzr/6.0-falcon-team \
  --mask=49998 \
  --queries=100000000 \
  --duration=900 \
  --gendata=conf/varchar.zz \
  --grammar=conf/varchar.yy
[7 Mar 2009 14:42] Kevin Lewis
Olav, This is another bug associated with disappearing transactions.  We discussed today a solution that involves a TransactionState object that is linked to each record and the real transaction and ha useCounts that keep it around until all the pointers are released.  The associated bug is Bug#41357.
[30 Mar 2009 17:13] Olav Sandstå
This bug seems not to be caused by "disappering transactions" (deleted transaction object).

What seems to have happened it that before DeferredIndex::scanIndex() is being called there has been a previous (or concurrent?) call to DeferredIndex::detachTransaction(). This call sets the transaction pointer to NULL. By inspecting a core file from the crash shows that the Transaction still exists and is still in state Active.
[1 Apr 2009 10:41] Olav Sandstå
What caused this crash to happen is that one thread A is getting access to the DeferredIndex just as another thread B is calling Transaction::releaseDeferredIndexes as part of commit/rollback (all crashes I have seen it is called from rollback).

What happens is the following:

Thread A is in Index::scanIndex and doing the following:

  1. Locks the Index::deferredIndex sync object
  2. Searches for deferredIndexes that it need to scan ....

Thread B: calls DeferredIndex::detachTransaction() which:

       1. Lock the DeferredIndex's sync object
       2. sets the DeferredIndex::transaction pointer to NULL
       3. releases the lock on the sync object
       4. calls Index::detachDeferredIndex()  -- which BLOCKs due to thread A have a shared lock on Index::deferredIndexes sync object

Thread A continues....

 3. Find the DeferredIndex object and calls:
 4. DeferredIndex::scanIndex()
 5. Lock the DeferredIndex's sync object
 6. Finds out that the index has been chilled (at least I think so) and calls:
 7. transaction->thaw() - and crashes due to this now being a null pointer (since it was set to NULL in step B.2 above)

To me it seems that this bug is caused by setting the transaction pointer to NULL before we have taken the DeferredIndex index out of the Index - or in another way: we start the "clean-up" of the deferredIndex object while it is still available for other threads to use.

The simple solution to this problem seems to be to move step B.2 (transaction = NULL) above to after B.4 (the call to Index::detachDeferredIndex()). With this change the test case in the bug report runs without crashing. I plan to make a patch for this.
[1 Apr 2009 11:18] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/71055

3090 Olav Sandstaa	2009-04-01
      Fix to Bug#42842 Falcon crash in Transaction::thaw() ; this = 0x0
      
      Change to DeferredIndex::detachTransaction():
      
      Move the code that sets the transaction pointer to NULL to after the
      DeferredIndex have been detached from the Index. This will avoid that
      other threads will be able to scan this DeferredIndex and use a
      transaction pointer that is NULL.
     @ storage/falcon/DeferredIndex.cpp
        In detachTransaction: Move the code that sets the transaction pointer to NULL to after the
        DeferredIndex have been detached from the Index. This will avoid that
        other threads will be able to scan this DeferredIndex and use a
        transaction pointer that is NULL.
[1 Apr 2009 14:39] Kevin Lewis
This solution should solve the problem because it causes both the scanIndex thread and the detachTransaction thread to first read or set NULL the pointer from Index to DefferredIndex under the same Index::DefferredIndexes.syncObject. Then scanIndex uses the DeferredIndex::transaction pointer while still holding the upper lock.  

However, another approach would have been to check for the existence of a transaction pointer within Transaction::thaw().  In fact, I still think that is a good safe thing to do.  If a deferredIndex does not have a transaction pointer, after getting a lock on its syncObject, then it has been completed and its index data will be found in the regular index pages.  No need to search it.
[1 Apr 2009 16:51] Olav Sandstå
Kevin, thanks for comments and reviewing the patch. I have now pushed this patch.

When implementing this I considered both approaches you describe. The reason for selecting the one I implemented was that I felt that this was the "most correct" since it would detach the DeferredIndex from the Index before detaching it from the Transaction - making it unavailable for anyone scanning the index.

Still, I agree with you that checking for the transaction pointer in DeferredIndex::scanIndex() will make the code more robust. I will make a follow-up patch for this.

Setting status back to in progress.
[1 Apr 2009 17:04] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/71118

3091 Olav Sandstaa	2009-04-01
      Follow-up fix to Bug #42842 Falcon crash in Transaction::thaw() ; this = 0x0
      
      Make DeferredIndex::scanIndex() more robust by adding a check that the transaction is still
      available before starting the search. If the transaction has been detached there is no reason to 
      search for data in this deferred index.
     @ storage/falcon/DeferredIndex.cpp
        Make DeferredIndex::scanIndex() more robust by adding a check that the transaction is still
        available before starting the search. If the transaction has been detached there is no reason to 
        search for data in this deferred index.
[15 Apr 2009 16:59] Bugs System
Pushed into 6.0.11-alpha (revid:hky@sun.com-20090415164923-9arx29ye5pzxd4zf) (version source revid:vvaintroub@mysql.com-20090406115634-vn2j0hre0dvnry8r) (merge vers: 6.0.11-alpha) (pib:6)
[15 May 2009 14:50] MC Brown
Internal/test fix. No changelog entry required.