Bug #35692 | Running falcon_record_cache_memory_leak2-big.test crashes Falcon | ||
---|---|---|---|
Submitted: | 31 Mar 2008 6:02 | Modified: | 8 Jan 2009 9:59 |
Reporter: | Hakan Küçükyılmaz | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S3 (Non-critical) |
Version: | 6.0-falcon, 6.0-falcon-team | OS: | Linux |
Assigned to: | Christopher Powers | CPU Architecture: | Any |
Tags: | triaged |
[31 Mar 2008 6:02]
Hakan Küçükyılmaz
[31 Mar 2008 14:48]
Kevin Lewis
Chris, This looks like a regression
[2 Apr 2008 18:43]
Kevin Lewis
This needs to be done for 6.0.5
[9 Apr 2008 8:26]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/45109 ChangeSet@1.2632, 2008-04-09 03:23:51-05:00, cpowers@xeno.mysql.com +13 -0 Bug#35692, "Running falcon_record_cache_memory_leak2-big.test crashes Falcon" Serialized acces to prior record versions using an array of syncObjects, Table::syncPriorVersions[]. Elements of syncPriorVersions[] are assigned according to record number. Methods that require read-only access to prior record versions now acquire a shared lock on the corresponding syncPriorVersion[] sync object. Methods requiring write access acquire an exclusive lock. RecordVersion::priorVersion is now accessed strictly through RecordVersion::getPriorVersion().
[9 Apr 2008 14:53]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/45139 ChangeSet@1.2633, 2008-04-09 09:52:11-05:00, klewis@klewis-mysql. +6 -0 More changes for Bug#35692 Now that priorVersion is a protected variable, we should have a separate routine for clearing it when the caller is going to do the release. expungeRecordVersions() does not need to lock syncPrior since in all possible call stacks the exclusive lock is already taken. This causes a single thread deadlock on Linux where we do not use recursive mutexes.
[9 Apr 2008 16:23]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/45146 ChangeSet@1.2634, 2008-04-09 11:21:20-05:00, klewis@klewis-mysql. +1 -0 Bug#35692 - Don't use a pointer to a Sync object since it will not get released when it goes out of scope.
[9 Apr 2008 19:52]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/45158 ChangeSet@1.2635, 2008-04-09 14:49:19-05:00, cpowers@xeno.mysql.com +1 -0 Bug#35692, "Running falcon_record_cache_memory_leak2-big.test crashes Falcon" Removed exclusive lock on Table::syncObject in Table::validateAndInsert(). This lock was added as part of the original fix for this bug, but appears to be causing deadlocks.
[9 Apr 2008 20:51]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/45160 ChangeSet@1.2636, 2008-04-09 15:49:00-05:00, cpowers@xeno.mysql.com +1 -0 Bug#35692, "Running falcon_record_cache_memory_leak2-big.test crashes Falcon" Table::validateUpdate() -- reverse lock order of syncPrior and RecordLeaf::syncObject.
[10 Apr 2008 5:35]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/45170 ChangeSet@1.2637, 2008-04-10 00:32:33-05:00, cpowers@xeno.mysql.com +2 -0 Bug#35692, "Running falcon_record_cache_memory_leak2-big crashes Falcon" Removed deadlock. Table::rollbackRecord() locked syncPrior(Shared) before calling Table::insert(), but Table::insert() locks Table::syncObject. The Table lock must always be taken before the prior record version lock. Now Table::rollbackRecord() holds syncPrior(Shared) just long enough to get the prior record and addRef, and unlocks syncPrior before calling ::insert(). RecordVersion::scavenge(): Loop through prior versions with shared lock. Change prior version linkage under exclusive lock.
[10 Apr 2008 19:14]
Bugs System
Pushed into 6.0.5-alpha
[11 Apr 2008 21:26]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/45304 ChangeSet@1.2640, 2008-04-11 16:26:15-05:00, klewis@klewis-mysql. +5 -0 Bug#35692 - Protect fetchVersion with a syncPrior. Also, prevent a potential deadlock in deleteRecord if syncPrior is help too long. It is now only held during fetvhVersion.
[12 Apr 2008 23:20]
Christopher Powers
Background ---------- This bug describes memory corruption in the Falcon memory manager, and is a follow-on to Bug#34778,"Possible memory leak running UPDATEs in tight loop". The test case generates thousands of prior record versions for three records. The surplus of record versions eventually filled record cache memory, causing an "Memory Exhausted" exception. After Bug#34778 was fixed, the test case occasionally resulted in a crash in the memory manager. Similar crashes had been reported prior to the fix, including Bug #35768, "Running DBT2 crashes Falcon". These crashes appeared to be caused by pointer references to record data memory that had been prematurely released. Design Details -------------- Every record update creates a new record version. The old record version is pushed back on a linked list of prior versions for that record. Prior versions that are not being accessed by another transaction will have a use count of 1 and are candidates for deletion from the record cache. The record scavenger thread periodically scans the record cache for unused record versions. For performance reasons, the design of the in-memory record tree presumes that prior record chains will not be modified if the base record (the most recent version) has a use count > 1. This design greatly facilitates Falcon's multi-version concurrency control by allowing safe and very fast concurrent scans of prior record version chains. Another aspect to this design is that a large number of rapid sequential updates to a single record could theoretically cause the the prior record version chain for that record to grow indefinitely. This speed-for-memory tradeoff was a known characteristic of the implementation. Scavenger Fix ------------- For Bug#34778, the prior version chains grew into tens of thousands, but the old versions were never scavenged and the record cache size continued to grow. All but just the first few record versions were candidates for deletion, but the scavenger thread ignored them because the base record was still in use and did not meet the scavenging criteria. The scavenger process was modified to delete prior record versions even for base records that were still in use, ensuring that the record cache size would be predictably reduced during each scavenge cycle, and would remain below the maximum cache size. Problem Summary --------------- Prior to the fix for Bug#34778, the scavenger only scanned the prior versions chains of inactive records that were old and ready to be taken out of cache completely. After the fix, the scavenger also scanned the prior version chains of active records, increasing the likelihood of old record versions being removed from record chains that were being traversed by other threads. Problem Detail -------------- There are three types of record scavenging: 1) Scavenging at the end of the chain where old record versions are removed. 2) Scavenging the base record when it is old and no longer needed. 3) Scavenging pending records when there is a newer version within the same transaction. The scavenger thread does scavenging Type 1 and Type 2. With the recent scavenger fix, it does Type 2 more often, even for active records. No thread ever traverses the prior record chain beyond the record version that is visible to the oldest active repeatable read transaction, so scavenging these old record versions should never conflict with other threads. However, Transaction::releaseSavepoint() does Type 3 scavenging, and removes unneeded record versions from pending (i.e. active) records. It can potentially release record versions being examined by other threads because it scavenges record versions that reside between the base record and the visible committed record version. As the number of threads traversing a record version chain increases, so does the probability that one of those threads will encounter a stale record version pointer. The scavenger fix for Bug#34778 further increases this probability. Solution Summary ---------------- The solution was to serialize access to prior record versions chains. Though seemingly obvious, it required that a central Falcon design element be changed in very short order. The only alternative was to pull the scavenger fix, but that would only reduce the problem occurrence. Solution Detail --------------- The serialization mechanism is simple: Each table has a fixed-size array of "syncPrior" mutex objects. Each object is assigned according to record number mod array size. Read-only access to prior version chains require a shared lock on the corresponding syncPrior object. Write access to prior version chains require an exclusive lock. Additionally, prior version pointers are encapsulated such that all access occurs through the access method, RecordVersion::getPriorVersion(). Two primary concerns with this change were the introduction of deadlocks and the impact on performance. To avoid potential deadlocks, syncPrior locks are always taken after a tree fetch and after table locks. Deadlocks that escaped detection by the Falcon regression were quickly exposed by running DBT2 with 32 clients. The performance impact of serializing the prior record version chains, as measured by DBT2 NOTPMs, was surprisingly small, even negligible, and fell within the range of statistical noise.
[22 Apr 2008 2:13]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/45797 ChangeSet@1.2633, 2008-04-21 21:12:08-05:00, klewis@klewis-mysql. +8 -0 Bug#35692 - Use syncPrior in RecordVersion::fetchVersion. syncPrior is not needed in Table::rollbackRecord, deleteRecord nor validateAndInsert. Nothing need be returned from RecordVersion::rollback
[2 Jul 2008 12:34]
Kevin Lewis
This change is in the 6.0.5 release.
[2 Jul 2008 13:52]
Kevin Lewis
Sorry, it was pushed AFTER the 6.0.5 release. It is in 6.0.6.
[8 Jan 2009 9:59]
MC Brown
Internal change only. No documentation needed.