MySQL Bugs: #35692: Running falcon_record_cache_memory

Bug #35692	Running falcon_record_cache_memory_leak2-big.test crashes Falcon
Submitted:	31 Mar 2008 6:02	Modified:	8 Jan 2009 9:59
Reporter:	Hakan Küçükyılmaz	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Falcon storage engine	Severity:	S3 (Non-critical)
Version:	6.0-falcon, 6.0-falcon-team	OS:	Linux
Assigned to:	Christopher Powers	CPU Architecture:	Any
Tags:	triaged

Description:
Running falcon_record_cache_memory_leak2-big.test crashes Falcon.

How to repeat:
Run falcon_record_cache_memory_leak2-big.test with loop count of 1 million instead of 500000.

./mysql-test-run.pl --force --skip-ndb --gdb --suite=large_tests --big-test --suite-timeout=6360 --testcase-timeout=795 falcon_record_cache_memory_leak2-big

Backtrace
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x42804950 (LWP 12203)]
MemMgr::release (object=0xeeeeeeeeeeeeeeee) at MemMgr.cpp:573
573                     if (block->pool == NULL)
(gdb) bt
#0  MemMgr::release (object=0xeeeeeeeeeeeeeeee) at MemMgr.cpp:573
#1  0x000000000087bdfe in MemMgr::releaseDebug (this=<value optimized out>, 
    object=0xeeeeeeeeeeeeeed6) at MemMgr.cpp:915
#2  0x000000000087be15 in MemMgrRecordDelete (
    record=0xeeeeeeeeeeeeeed6 <Address 0xeeeeeeeeeeeeeed6 out of bounds>)
    at MemMgr.cpp:155
#3  0x000000000088ae8f in Record::deleteData (this=0xeee7018) at Record.cpp:880
#4  0x000000000088aecc in ~Record (this=0xeee7018) at Record.cpp:167
#5  0x000000000088adc1 in Record::release (this=0xeee7018) at Record.cpp:529
#6  0x0000000000835129 in Table::validateUpdate (this=0x2aaaaacf1ec0, 
    recordNumber=1, transactionId=321184) at Table.cpp:3623
#7  0x000000000089b8b7 in Section::updateRecord (this=0x2aaaaacf2770, 
    recordNumber=1, stream=0x42803320, transId=321184, earlyWrite=false)
    at Section.cpp:500
#8  0x000000000085f31f in Dbb::updateRecord (this=0x2aaaaacebc58, 
    sectionId=<value optimized out>, recordId=1, stream=0x42803320, 
    transId=321184, earlyWrite=false) at Dbb.cpp:348
#9  0x00000000008b9609 in SRLUpdateRecords::commit (this=0x42803dc8)
    at SRLUpdateRecords.cpp:369
#10 0x00000000008a6ed2 in SerialLogTransaction::commit (this=0x4d5d948)
    at SerialLogTransaction.cpp:91
#11 0x00000000008a6f5b in SerialLogTransaction::doAction (
    this=0xeeeeeeeeeeeeeeee) at SerialLogTransaction.cpp:157
#12 0x00000000008cde04 in Gopher::gopherThread (this=0x2aaaaacb1158)
    at Gopher.cpp:69
#13 0x00000000008cdf0f in Gopher::gopherThread (arg=0xeeeeeeeeeeeeeeee)
    at Gopher.cpp:37
#14 0x000000000083a230 in Thread::thread (this=0x2aaaaacbbbf8)
    at Thread.cpp:161
#15 0x000000000083a39f in Thread::thread (parameter=0xeeeeeeeeeeeeeeee)
    at Thread.cpp:140
#16 0x00002acf7d28b3f7 in start_thread () from /lib/libpthread.so.0
#17 0x00002acf7e79297d in clone () from /lib/libc.so.6
#18 0x0000000000000000 in ?? ()

(gdb) p block->pool
Cannot access memory at address 0xeeeeeeeeeeeeeed6

Chris, This looks like a regression

This needs to be done for 6.0.5

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/45109

ChangeSet@1.2632, 2008-04-09 03:23:51-05:00, cpowers@xeno.mysql.com +13 -0
  Bug#35692, "Running falcon_record_cache_memory_leak2-big.test crashes Falcon"
  
  Serialized acces to prior record versions using an array of syncObjects, Table::syncPriorVersions[].
  
  Elements of syncPriorVersions[] are assigned according to record number.
  
  Methods that require read-only access to prior record versions now acquire
  a shared lock on the corresponding syncPriorVersion[] sync object. Methods
  requiring write access acquire an exclusive lock.
  
  RecordVersion::priorVersion is now accessed strictly through RecordVersion::getPriorVersion().

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/45139

ChangeSet@1.2633, 2008-04-09 09:52:11-05:00, klewis@klewis-mysql. +6 -0
  More changes for Bug#35692
  Now that priorVersion is a protected variable, we should have
  a separate routine for clearing it when the caller is going to 
  do the release.
  expungeRecordVersions() does not need to lock syncPrior since in all 
  possible call stacks the exclusive lock is already taken.  
  This causes a single thread deadlock on Linux where we do not use
  recursive mutexes.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/45146

ChangeSet@1.2634, 2008-04-09 11:21:20-05:00, klewis@klewis-mysql. +1 -0
  Bug#35692 - Don't use a pointer to a Sync object since it will 
  not get released when it goes out of scope.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/45158

ChangeSet@1.2635, 2008-04-09 14:49:19-05:00, cpowers@xeno.mysql.com +1 -0
  Bug#35692, "Running falcon_record_cache_memory_leak2-big.test crashes Falcon"
  
  Removed exclusive lock on Table::syncObject in Table::validateAndInsert(). This
  lock was added as part of the original fix for this bug, but appears to be
  causing deadlocks.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/45160

ChangeSet@1.2636, 2008-04-09 15:49:00-05:00, cpowers@xeno.mysql.com +1 -0
  Bug#35692, "Running falcon_record_cache_memory_leak2-big.test crashes Falcon"
  
  Table::validateUpdate() -- reverse lock order of syncPrior and RecordLeaf::syncObject.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/45170

ChangeSet@1.2637, 2008-04-10 00:32:33-05:00, cpowers@xeno.mysql.com +2 -0
  Bug#35692, "Running falcon_record_cache_memory_leak2-big crashes Falcon"
  
  Removed deadlock. Table::rollbackRecord() locked syncPrior(Shared) before
  calling Table::insert(), but Table::insert() locks Table::syncObject. The
  Table lock must always be taken before the prior record version lock. 
  
  Now Table::rollbackRecord() holds syncPrior(Shared) just long enough to
  get the prior record and addRef, and unlocks syncPrior before calling ::insert().
  
  RecordVersion::scavenge(): Loop through prior versions with shared lock. Change
  prior version linkage under exclusive lock.

Pushed into 6.0.5-alpha

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/45304

ChangeSet@1.2640, 2008-04-11 16:26:15-05:00, klewis@klewis-mysql. +5 -0
  Bug#35692 - Protect fetchVersion with a syncPrior.
  Also, prevent a potential deadlock in deleteRecord
  if syncPrior is help too long.  It is now only held 
  during fetvhVersion.

Background
----------

This bug describes memory corruption in the Falcon memory manager, and is a follow-on to Bug#34778,"Possible memory leak running UPDATEs in tight loop".

The test case generates thousands of prior record versions for three records. The surplus of record versions eventually filled record cache memory, causing an "Memory Exhausted" exception.

After Bug#34778 was fixed, the test case occasionally resulted in a crash in the memory manager. Similar crashes had been reported prior to the fix, including Bug #35768, "Running DBT2 crashes Falcon". These crashes appeared to be caused by pointer references to record data memory that had been prematurely released.

Design Details
--------------

Every record update creates a new record version. The old record version is pushed back on a linked list of prior versions for that record. Prior versions that are not being accessed by another transaction will have a use count of 1 and are candidates for deletion from the record cache.

The record scavenger thread periodically scans the record cache for unused record versions. For performance reasons, the design of the in-memory record tree presumes that prior record chains will not be modified if the base record (the most recent version) has a use count > 1. This design greatly facilitates Falcon's multi-version concurrency control by allowing safe and very fast concurrent scans of prior record version chains.

Another aspect to this design is that a large number of rapid sequential updates to a single record could theoretically cause the the prior record version chain for that record to grow indefinitely. This speed-for-memory tradeoff was a known characteristic of the implementation.

Scavenger Fix
-------------

For Bug#34778, the prior version chains grew into tens of thousands, but the old versions were never scavenged and the record cache size continued to grow. All but just the first few record versions were candidates for deletion, but the scavenger thread ignored them because the base record was still in use and did not meet the scavenging criteria.

The scavenger process was modified to delete prior record versions even for base records that were still in use, ensuring that the record cache size would be predictably reduced during each scavenge cycle, and would remain below the maximum cache size.

Problem Summary
---------------

Prior to the fix for Bug#34778, the scavenger only scanned the prior versions chains of inactive records that were old and ready to be taken out of cache completely.

After the fix, the scavenger also scanned the prior version chains of active records, increasing the likelihood of old record versions being removed from record chains that were being traversed by other threads.

Problem Detail
--------------

There are three types of record scavenging:

1) Scavenging at the end of the chain where old record versions are removed.
2) Scavenging the base record when it is old and no longer needed.
3) Scavenging pending records when there is a newer version within the same transaction.

The scavenger thread does scavenging Type 1 and Type 2. With the recent scavenger fix, it does Type 2 more often, even for active records.

No thread ever traverses the prior record chain beyond the record version that is visible to the oldest active repeatable read transaction, so scavenging these old record versions should never conflict with other threads.

However, Transaction::releaseSavepoint() does Type 3 scavenging, and removes unneeded record versions from pending (i.e. active) records. It can potentially release record versions being examined by other threads because it scavenges record versions that reside between the base record and the visible committed record version.

As the number of threads traversing a record version chain increases, so does the probability that one of those threads will encounter a stale record version pointer. The scavenger fix for Bug#34778 further increases this probability.

Solution Summary
----------------

The solution was to serialize access to prior record versions chains. Though seemingly obvious, it required that a central Falcon design element be changed in very short order. The only alternative was to pull the scavenger fix, but that would only reduce the problem occurrence.

Solution Detail
---------------

The serialization mechanism is simple:

Each table has a fixed-size array of "syncPrior" mutex objects. Each object is assigned according to record number mod array size. Read-only access to prior version chains require a shared lock on the corresponding syncPrior object. Write access to prior version chains require an exclusive lock.

Additionally, prior version pointers are encapsulated such that all access occurs through the access method, RecordVersion::getPriorVersion().

Two primary concerns with this change were the introduction of deadlocks and the impact on performance.

To avoid potential deadlocks, syncPrior locks are always taken after a tree fetch and after table locks. Deadlocks that escaped detection by the Falcon regression were quickly exposed by running DBT2 with 32 clients.

The performance impact of serializing the prior record version chains, as measured by DBT2 NOTPMs, was surprisingly small, even negligible, and fell within the range of statistical noise.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/45797

ChangeSet@1.2633, 2008-04-21 21:12:08-05:00, klewis@klewis-mysql. +8 -0
  Bug#35692 - Use syncPrior in RecordVersion::fetchVersion.
  syncPrior is not needed in Table::rollbackRecord, deleteRecord
  nor validateAndInsert.
  Nothing need be returned from RecordVersion::rollback

This change is in the 6.0.5 release.

Sorry, it was pushed AFTER the 6.0.5 release. It is in 6.0.6.

Internal change only. No documentation needed.