Bug #35692 Running falcon_record_cache_memory_leak2-big.test crashes Falcon
Submitted: 31 Mar 8:02 Modified: 22 Aug 19:48
Reporter: Hakan Kuecuekyilmaz
Status: Documenting
Category:Server: Falcon Severity:S3 (Non-critical)
Version:6.0-falcon, 6.0-falcon-team OS:Linux
Assigned to: Christopher Powers Target Version:6.0.6
Tags: triaged
Triage: D1 (Critical) / R3 (Medium) / E4 (High)

[31 Mar 8:02] Hakan Kuecuekyilmaz
Description:
Running falcon_record_cache_memory_leak2-big.test crashes Falcon.

How to repeat:
Run falcon_record_cache_memory_leak2-big.test with loop count of 1 million instead of
500000.

./mysql-test-run.pl --force --skip-ndb --gdb --suite=large_tests --big-test
--suite-timeout=6360 --testcase-timeout=795 falcon_record_cache_memory_leak2-big

Backtrace
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x42804950 (LWP 12203)]
MemMgr::release (object=0xeeeeeeeeeeeeeeee) at MemMgr.cpp:573
573                     if (block->pool == NULL)
(gdb) bt
#0  MemMgr::release (object=0xeeeeeeeeeeeeeeee) at MemMgr.cpp:573
#1  0x000000000087bdfe in MemMgr::releaseDebug (this=<value optimized out>, 
    object=0xeeeeeeeeeeeeeed6) at MemMgr.cpp:915
#2  0x000000000087be15 in MemMgrRecordDelete (
    record=0xeeeeeeeeeeeeeed6 <Address 0xeeeeeeeeeeeeeed6 out of bounds>)
    at MemMgr.cpp:155
#3  0x000000000088ae8f in Record::deleteData (this=0xeee7018) at Record.cpp:880
#4  0x000000000088aecc in ~Record (this=0xeee7018) at Record.cpp:167
#5  0x000000000088adc1 in Record::release (this=0xeee7018) at Record.cpp:529
#6  0x0000000000835129 in Table::validateUpdate (this=0x2aaaaacf1ec0, 
    recordNumber=1, transactionId=321184) at Table.cpp:3623
#7  0x000000000089b8b7 in Section::updateRecord (this=0x2aaaaacf2770, 
    recordNumber=1, stream=0x42803320, transId=321184, earlyWrite=false)
    at Section.cpp:500
#8  0x000000000085f31f in Dbb::updateRecord (this=0x2aaaaacebc58, 
    sectionId=<value optimized out>, recordId=1, stream=0x42803320, 
    transId=321184, earlyWrite=false) at Dbb.cpp:348
#9  0x00000000008b9609 in SRLUpdateRecords::commit (this=0x42803dc8)
    at SRLUpdateRecords.cpp:369
#10 0x00000000008a6ed2 in SerialLogTransaction::commit (this=0x4d5d948)
    at SerialLogTransaction.cpp:91
#11 0x00000000008a6f5b in SerialLogTransaction::doAction (
    this=0xeeeeeeeeeeeeeeee) at SerialLogTransaction.cpp:157
#12 0x00000000008cde04 in Gopher::gopherThread (this=0x2aaaaacb1158)
    at Gopher.cpp:69
#13 0x00000000008cdf0f in Gopher::gopherThread (arg=0xeeeeeeeeeeeeeeee)
    at Gopher.cpp:37
#14 0x000000000083a230 in Thread::thread (this=0x2aaaaacbbbf8)
    at Thread.cpp:161
#15 0x000000000083a39f in Thread::thread (parameter=0xeeeeeeeeeeeeeeee)
    at Thread.cpp:140
#16 0x00002acf7d28b3f7 in start_thread () from /lib/libpthread.so.0
#17 0x00002acf7e79297d in clone () from /lib/libc.so.6
#18 0x0000000000000000 in ?? ()

(gdb) p block->pool
Cannot access memory at address 0xeeeeeeeeeeeeeed6
[31 Mar 16:48] Kevin Lewis
Chris, This looks like a regression
[2 Apr 20:43] Kevin Lewis
This needs to be done for 6.0.5
[9 Apr 10:26] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/45109

ChangeSet@1.2632, 2008-04-09 03:23:51-05:00, cpowers@xeno.mysql.com +13 -0
  Bug#35692, "Running falcon_record_cache_memory_leak2-big.test crashes Falcon"
  
  Serialized acces to prior record versions using an array of syncObjects,
Table::syncPriorVersions[].
  
  Elements of syncPriorVersions[] are assigned according to record number.
  
  Methods that require read-only access to prior record versions now acquire
  a shared lock on the corresponding syncPriorVersion[] sync object. Methods
  requiring write access acquire an exclusive lock.
  
  RecordVersion::priorVersion is now accessed strictly through
RecordVersion::getPriorVersion().
[9 Apr 16:53] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/45139

ChangeSet@1.2633, 2008-04-09 09:52:11-05:00, klewis@klewis-mysql. +6 -0
  More changes for Bug#35692
  Now that priorVersion is a protected variable, we should have
  a separate routine for clearing it when the caller is going to 
  do the release.
  expungeRecordVersions() does not need to lock syncPrior since in all 
  possible call stacks the exclusive lock is already taken.  
  This causes a single thread deadlock on Linux where we do not use
  recursive mutexes.
[9 Apr 18:23] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/45146

ChangeSet@1.2634, 2008-04-09 11:21:20-05:00, klewis@klewis-mysql. +1 -0
  Bug#35692 - Don't use a pointer to a Sync object since it will 
  not get released when it goes out of scope.
[9 Apr 21:52] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/45158

ChangeSet@1.2635, 2008-04-09 14:49:19-05:00, cpowers@xeno.mysql.com +1 -0
  Bug#35692, "Running falcon_record_cache_memory_leak2-big.test crashes Falcon"
  
  Removed exclusive lock on Table::syncObject in Table::validateAndInsert(). This
  lock was added as part of the original fix for this bug, but appears to be
  causing deadlocks.
[9 Apr 22:51] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/45160

ChangeSet@1.2636, 2008-04-09 15:49:00-05:00, cpowers@xeno.mysql.com +1 -0
  Bug#35692, "Running falcon_record_cache_memory_leak2-big.test crashes Falcon"
  
  Table::validateUpdate() -- reverse lock order of syncPrior and RecordLeaf::syncObject.
[10 Apr 7:35] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/45170

ChangeSet@1.2637, 2008-04-10 00:32:33-05:00, cpowers@xeno.mysql.com +2 -0
  Bug#35692, "Running falcon_record_cache_memory_leak2-big crashes Falcon"
  
  Removed deadlock. Table::rollbackRecord() locked syncPrior(Shared) before
  calling Table::insert(), but Table::insert() locks Table::syncObject. The
  Table lock must always be taken before the prior record version lock. 
  
  Now Table::rollbackRecord() holds syncPrior(Shared) just long enough to
  get the prior record and addRef, and unlocks syncPrior before calling ::insert().
  
  RecordVersion::scavenge(): Loop through prior versions with shared lock. Change
  prior version linkage under exclusive lock.
[10 Apr 21:14] Bugs System
Pushed into 6.0.5-alpha
[11 Apr 23:26] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/45304

ChangeSet@1.2640, 2008-04-11 16:26:15-05:00, klewis@klewis-mysql. +5 -0
  Bug#35692 - Protect fetchVersion with a syncPrior.
  Also, prevent a potential deadlock in deleteRecord
  if syncPrior is help too long.  It is now only held 
  during fetvhVersion.
[13 Apr 1:20] Christopher Powers
Background
----------

This bug describes memory corruption in the Falcon memory manager, and is a follow-on to
Bug#34778,"Possible memory leak running UPDATEs in tight loop".

The test case generates thousands of prior record versions for three records. The surplus
of record versions eventually filled record cache memory, causing an "Memory Exhausted"
exception.

After Bug#34778 was fixed, the test case occasionally resulted in a crash in the memory
manager. Similar crashes had been reported prior to the fix, including Bug #35768,
"Running DBT2 crashes Falcon". These crashes appeared to be caused by pointer references
to record data memory that had been prematurely released.

Design Details
--------------

Every record update creates a new record version. The old record version is pushed back on
a linked list of prior versions for that record. Prior versions that are not being
accessed by another transaction will have a use count of 1 and are candidates for deletion
from the record cache.

The record scavenger thread periodically scans the record cache for unused record
versions. For performance reasons, the design of the in-memory record tree presumes that
prior record chains will not be modified if the base record (the most recent version) has
a use count > 1. This design greatly facilitates Falcon's multi-version concurrency
control by allowing safe and very fast concurrent scans of prior record version chains.

Another aspect to this design is that a large number of rapid sequential updates to a
single record could theoretically cause the the prior record version chain for that record
to grow indefinitely. This speed-for-memory tradeoff was a known characteristic of the
implementation.

Scavenger Fix
-------------

For Bug#34778, the prior version chains grew into tens of thousands, but the old versions
were never scavenged and the record cache size continued to grow. All but just the first
few record versions were candidates for deletion, but the scavenger thread ignored them
because the base record was still in use and did not meet the scavenging criteria.

The scavenger process was modified to delete prior record versions even for base records
that were still in use, ensuring that the record cache size would be predictably reduced
during each scavenge cycle, and would remain below the maximum cache size.

Problem Summary
---------------

Prior to the fix for Bug#34778, the scavenger only scanned the prior versions chains of
inactive records that were old and ready to be taken out of cache completely.

After the fix, the scavenger also scanned the prior version chains of active records,
increasing the likelihood of old record versions being removed from record chains that
were being traversed by other threads.

Problem Detail
--------------

There are three types of record scavenging:

1) Scavenging at the end of the chain where old record versions are removed.
2) Scavenging the base record when it is old and no longer needed.
3) Scavenging pending records when there is a newer version within the same transaction.

The scavenger thread does scavenging Type 1 and Type 2. With the recent scavenger fix, it
does Type 2 more often, even for active records. 

No thread ever traverses the prior record chain beyond the record version that is visible
to the oldest active repeatable read transaction, so scavenging these old record versions
should never conflict with other threads.

However, Transaction::releaseSavepoint() does Type 3 scavenging, and removes unneeded
record versions from pending (i.e. active) records. It can potentially release record
versions being examined by other threads because it scavenges record versions that reside
between the base record and the visible committed record version.

As the number of threads traversing a record version chain increases, so does the
probability that one of those threads will encounter a stale record version pointer. The
scavenger fix for Bug#34778 further increases this probability.

Solution Summary
----------------

The solution was to serialize access to prior record versions chains. Though seemingly
obvious, it required that a central Falcon design element be changed in very short order.
The only alternative was to pull the scavenger fix, but that would only reduce the problem
occurrence.

Solution Detail
---------------

The serialization mechanism is simple:

Each table has a fixed-size array of "syncPrior" mutex objects. Each object is assigned
according to record number mod array size. Read-only access to prior version chains
require a shared lock on the corresponding syncPrior object. Write access to prior version
chains require an exclusive lock.

Additionally, prior version pointers are encapsulated such that all access occurs through
the access method, RecordVersion::getPriorVersion().

Two primary concerns with this change were the introduction of deadlocks and the impact on
performance.

To avoid potential deadlocks, syncPrior locks are always taken after a tree fetch and
after table locks. Deadlocks that escaped detection by the Falcon regression were quickly
exposed by running DBT2 with 32 clients. 

The performance impact of serializing the prior record version chains, as measured by DBT2
NOTPMs, was surprisingly small, even negligible, and fell within the range of statistical
noise.
[22 Apr 4:13] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/45797

ChangeSet@1.2633, 2008-04-21 21:12:08-05:00, klewis@klewis-mysql. +8 -0
  Bug#35692 - Use syncPrior in RecordVersion::fetchVersion.
  syncPrior is not needed in Table::rollbackRecord, deleteRecord
  nor validateAndInsert.
  Nothing need be returned from RecordVersion::rollback
[2 Jul 14:34] Kevin Lewis
This change is in the 6.0.5 release.
[2 Jul 15:52] Kevin Lewis
Sorry, it was pushed AFTER the 6.0.5 release. It is in 6.0.6.