Bug #43344 | falcon_recovery assert "dup->state == recDeleted" in Table::checkUniqueRecordVer | ||
---|---|---|---|
Submitted: | 3 Mar 2009 22:14 | Modified: | 26 May 2010 17:47 |
Reporter: | Olav Sandstå | Email Updates: | |
Status: | Unsupported | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S2 (Serious) |
Version: | 6.0.10-alpa | OS: | Any |
Assigned to: | Kevin Lewis | CPU Architecture: | Any |
Tags: | F_RECORD TREE |
[3 Mar 2009 22:14]
Olav Sandstå
[13 Mar 2009 18:22]
Kevin Lewis
if (!dup->hasRecord()) { // If the record is a lock record, keep looking for a dup. if (dup->state == recLock) continue; // Next record version. if (dup->state == recRollback) continue; // Next record version. // The record has been deleted. ASSERT(dup->state == recDeleted); If this record is not recLock, recRollback or recDeleted, then what is it?
[6 Apr 2009 10:53]
Philip Stoev
(gdb) print *dup $3 = {_vptr.Record = 0x89ebda8, data = {record = 0x0}, useCount = 1, format = 0xb708eb38, recordNumber = 727, size = 88, generation = 31, highWater = 0, encoding = 204 'л', state = 0 '\0'} In other words dup->state = 0
[16 Apr 2009 16:32]
Kevin Lewis
This assert occurred because of a race condition between hasRecord and chill(). notice the order in which the Record::state and data.record variables are accessed inthe two functions below; inline int hasRecord(bool forceThaw = true) { if (state == recChilled && forceThaw) thaw(); return (data.record != NULL); } bool SRLUpdateRecords::chill(. . . { . . . record->deleteData(); record->state = recChilled; . . . The race is that client#1 checks for (state == recChilled) and finds that (state == 0). Then client#2 calls deleteData() but does not set the state yet. Then client#1 returns false for hasData() without thawing the record. The caller, checkUniqueRecordVersion(), hits the assert in the section of code shown earlier in this bug while the state is still 0 but the record is chilled. The fix is for hasRecord() to return whether data.record existed before it checked for recChilled. If next, (state == recChilled), then hasData() should thaw and return true. The new addition of data.record to the Cyclemanager when it is chilled will add consistency to any thread that reads data.record. But all other uses of hasRecord need to be re-evaluated with the knowledge that it could be chilled immediately after the check if that record is pending by another transaction. If hasrecord is called by the same transaction, then it would be safe since only the thread for that transaction can chill it.
[16 Apr 2009 16:33]
Kevin Lewis
Any reference to hasData in the previous note should be hasRecord.
[8 May 2009 19:43]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/73693 2702 Kevin Lewis 2009-05-08 Bug#43344 & Bug#44721 This series of changes closes many gaps in which a data buffer can get thawed and then re-chilled almost immediatly which can cause the use of a bad data buffer pointer. For 44344, it makes the act of chilling and thawing records an atomic operation which uses CAS. Each call to thaw returns a valid pointer to the buffer that was thawed so that it can be used during the current cycle as long as the caller has a CycleLock. Record.h, Record.cpp, RecordVersion.h & RecordVersion.cpp; The most significant change is that when a data buffer is chilled, it recieves the value (-1) and this happens atomically. Record::state == recChilled is no longer used. The record.data pointer is now always read once, checked, and returned up the call stack to where it is used. hasRecord() is changed from an inline function to a virtual function for both Record and RecordVersion. hasRecord() and isChilled() are added so that the record can check data.record itself. setEncodedRecord() and thaw() return the pointer to the buffer that they just created and put into data.record so that the caller does not have to rely on that buffer being there later. getAllocatedSize() is added so that the number of bytes in a buffer holding the record can be consistently known. It is used in several places. chillData() is similar to and replaces deleteData() during SRLUpdateRecords::chill isDeleted() is also added so that the common (state == recDeleted) check can be done by the record itself. This is not fully implemented yet because we need to get away from the over-use of these state flags. They are somewhat unreliable on multiple CPUs unless they are set with CAS. getRecordData() is also changed from an inline to a virtual function. findRecordData() is like getRecordData but without the thaw. It is used in three places along the call stack for thawing a record to check if another thread has already thawed that record. This is not likely now because of the use of syncThaw. But syncThaw may no longer be necessary. SRLUpdateRecords.h & SRLUpdateRecords.cpp; thaw() returns the data buffer thawed in addition to the number of bytes reallocated. chill() is not sent the dataLength anymore since it calculates that itself. SerialLog.h, SerialLog.cpp, transaction.h, transaction.cpp; Since the number of bytes chilled and thawed stored in the SerialLog and Transaction were being maintained but not used, I deleted those variables and code used to maintain them. The number of allocated bytes associated with a Transaction is stored in totalRecorddata and is still maintained. transaction.cpp Even though backlogging is not currently active, the choice whether to backlog any particular record needed to be enhanced. Along with replacing hasRecord(false) with isChilled(), it should avoid backlogging record chains that do not start at the base record. That is checked with isSuperceded(). Also, backlogging should occur for deleted records as well as newly chilled records.
[12 May 2009 14:36]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/73828 2702 Kevin Lewis 2009-05-12 Bug#43344 & Bug#44721 Second patch (comprehensive) with review changes This series of changes closes many gaps in which a data buffer can get thawed and then re-chilled almost immediatly which can cause the use of a bad data buffer pointer. For 44344, it makes the act of chilling and thawing records an atomic operation which uses CAS. Each call to thaw returns a valid pointer to the buffer that was thawed so that it can be used during the current cycle as long as the caller has a CycleLock. Record.h, Record.cpp, RecordVersion.h & RecordVersion.cpp; The most significant change is that when a data buffer is chilled, it recieves the value (-1) and this happens atomically. Record::state == recChilled is no longer used. The record.data pointer is now always read once, checked, and returned up the call stack to where it is used. hasRecord() is changed from an inline function to a virtual function for both Record and RecordVersion. hasRecord() and isChilled() are added so that the record can check data.record itself. setEncodedRecord() and thaw() return the pointer to the buffer that they just created and put into data.record so that the caller does not have to rely on that buffer being there later. getEncodedValue() uses the recordData pointer of the caller instead of getting it again from data.record. getAllocatedSize() is added so that the number of bytes in a buffer holding the record can be consistently known. It is used in several places. chillData() is similar to and replaces deleteData() during SRLUpdateRecords::chill isDeleted() is also added so that the common (state == recDeleted) check can be done by the record itself. This is not fully implemented yet because we need to get away from the over-use of these state flags. They are somewhat unreliable on multiple CPUs unless they are set with CAS. getRecordData() is also changed from an inline to a virtual function. findRecordData() is like getRecordData but without the thaw. It is used in three places along the call stack for thawing a record to check if another thread has already thawed that record. This is not likely now because of the use of syncThaw. But syncThaw may no longer be necessary. isAllocated() is an inline function used about 6 times to determine if a pointer is real or not. SRLUpdateRecords.h & SRLUpdateRecords.cpp; thaw() returns the data buffer thawed in addition to the number of bytes reallocated. chill() is not sent the dataLength anymore since it calculates that itself. SerialLog.h, SerialLog.cpp, transaction.h, transaction.cpp; Since the number of bytes chilled and thawed stored in the SerialLog and Transaction were being maintained but not used, I deleted those variables and code used to maintain them. The number of allocated bytes associated with a Transaction is stored in totalRecorddata and is still maintained. transaction.cpp Even though backlogging is not currently active, the choice whether to backlog any particular record needed to be enhanced. Along with replacing hasRecord(false) with isChilled(), it should avoid backlogging record chains that do not start at the base record. That is checked with isSuperceded(). Also, backlogging should occur for deleted records as well as newly chilled records.
[12 May 2009 22:59]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/73882 2704 Kevin Lewis 2009-05-12 Code Cleanup and compile warnings Fix the previous patch for Bug#43344 & Bug#44721 that made a call to getAllocatedSize() from inside a thaw. This caused a recursive loop and a stack overflow.
[26 May 2009 18:05]
Kevin Lewis
The assertion happenned again after the pathes above. Those patches helped to distinguish between a record with no data buffer and a chilled record. The recent crash log is here; [Falcon] Error: assertion (dup->state == recDeleted) failed at line 2589 in file Table.cpp http://pb2.norway.sun.com/web.py?action=archive_download&archive_id=537483&pretty=please Since Table::checkUniqueRecordVersion() can be called for any record along a prior record chain at almost any time, it is possible that valid lock or deleted record versions have other temporary states such as recNoChill, recDeleting & recEndChain. This assertion is not valid and should be deleted. The after "getRelativeState(dup, DO_NOT_WAIT)" and "if (!dup->hasRecord()" should be changed to only check for a deleted record if the relative state is Us or CommittedVisible.
[26 May 2009 18:12]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/74975 2712 Kevin Lewis 2009-05-26 Bug#43344 - This assert can still happen even after the chance of it being a chilled record is eliminated. Since Table::checkUniqueRecordVersion() can be called for any record along a prior record chain at almost any time, it is possible that valid lock or deleted record versions have other temporary states such as recNoChill, recDeleting & recEndChain. So this assertion is not valid and should be deleted. The code after "getRelativeState(dup, DO_NOT_WAIT)" and "if (!dup->hasRecord()" should be changed to only check the state for a deleted record if the relative state is Us or CommittedVisible.
[27 May 2009 20:02]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/75080 2712 Kevin Lewis 2009-05-27 Bug#43344 - Thsi patch is just like the previous one, except that a bug is fixed. Instead of checking if (dup->state != recDeleted) it correctly checks if (dup->state == recDeleted). The assert is that after hasRecord() returns false, the state is neither recLock, recRollback, nor recDeleted. It is not recChilled anymore since that state is no longer used. Since Table::checkUniqueRecordVersion() can be called for any record along a prior record chain at almost any time, it is possible that valid lock or deleted record versions have other temporary states such as recNoChill, recDeleting & recEndChain. So this assertion is not valid and should be deleted. The code after "getRelativeState(dup, DO_NOT_WAIT)" and "if (!dup->hasRecord()" should be changed to only check the state for a deleted record if the relative state is Us or CommittedVisible.
[3 Jun 2009 1:05]
Kevin Lewis
I have been messing with Table::checkUniqueRecordVersion() again for Bug#43344, which we thought was fixed, but still occurs. The assert is that after hasRecord() returns false, the state is neither recLock, recRollback, nor recDeleted. I suspect that the state is one of the 'temporary' states such as recNoChill or recDeleteing on a deleted record, or recEndChain which I think could happen on both recLock and recDeleted. The previous patch avoids this particular error, but there is a mor pervasive problem that should be addressed. The idea of changing the state of a record temporarily, hiding the intrinsic type of the record, is a bad idea. If multiple threads are going to be changing these states, then they should be done atomically, or else the algorithm of the threads must guarantee that only one thread changes the state at a time. Algorithmic thread synchronization is hard to grasp, much less document, much less prove. But instead of making these state changes atomic, I would like to first separate out information that is not needed in that state variable. For example; recChilled, recDeleted, and recLocked all refer to 'why doesn't this record have a data buffer?' Recently, I moved recChilled out of the state by making Record::data.record == recordIsChilled, which is -1. So I have worked this past few days on a patch that does same with recDeleted and recLocked. In this patch, the only reason for Record::data.record == NULL is that the record data is not yet allocated. And there will be no reason for that when a record is found on the tree or in the prior chain. With this patch, we quit checking the state for recLocked and recDeleted. It uses RecordVersion::isALock() and isDeleted() intead. The temporary states do not change what a RecordVersion object represents. It is easier to prove that the temporary states do not overwrite each other. The code is more deterministic. And many of the other states become useless. (If a state is not checked, why set it?) This patch will be done soon.
[3 Jun 2009 1:37]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/75493 2715 Kevin Lewis 2009-06-02 Bug#43344 - Reduce Falcon's dependency on Record::state since some of them are temporary states that overwrite other states. Record.cpp, Record.h, RecordVersion.cpp & RecordVersion.h - - - - - - - - - - - - - - - Replace states recDeleted and recLock with values for Record::data.record of recordIsALock (-2) and recordIsDeleted(-3). This makes it much more deterministic to tell what kind of record this is. Other states were found to be not needed since they were not being checked. They include recOnDisk, recRollback, recUnlocked, recDeleting,and recPruning. The initial state of zero previously called recData is now called recNormal. A new value is used for data.record when the data buffer has been deleted prior to the object being scavenged. It is called recordDataIsDeleted (-4). New functions are added to determine intrinsic record types and status including isAllocated, isALock(), findRecordData(), hasNoData(), setLock() & setDeleted(). ExchangeData is separated from deleteData() so that it can be used more flexibly. validateData is deleted since it is not used. In several places, deleteData() is used before putting something else into the data buffer. Now, it is rearranged so that the ne data buffer is created first and the two are exhanged atomically. This continues the effort to change data.record atomically, even where it contains the non-pointer values. In the process of analyzing this code, several new ASSERTS are added. These were tested as much as possible, but pushbuild and other DBT2 runs will need to completely verify them. Index.cpp, RecordScavenge.cpp, SRLUpdateRecords.cpp, StorageDatabase.cpp, Table.cpp, Transaction.cpp - - - - - - - - - - - - - - - The functions isALock() and isDeleted() are used everywhere the state was checked for recLocked and recDeleted. Table.cpp - - - - - - - - - - - - - - - One of the two flavors of Table::update() had two sections of code that tried to set 'oldRecord' These were combined into more compact code and comments were added.
[3 Jun 2009 18:16]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/75552 2716 Kevin Lewis 2009-06-03 Bug#43344 Code review fix. The previous patch lost the negative on isALock(). This bug would limit the effectiveness of pruning old versions. In addition, this patch limits the use of Record::state even further by asserting that it is always recNormal before setting it to something, and setting it back to recNormal when that state is completed. It assures that these temporary states do not overlap. And since recChilled overlaps with recInserting, recChilled is not used anymore. Instead recInserting and a new state, recUpdating, are checked to prevent the current record from being chilled.
[4 Jun 2009 12:56]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/75609 2719 Kevin Lewis 2009-06-04 Bug#43344 - One more cleanup. RecordVersion::hasData() should act like RecordVersion::getRecordData() by testing the recordData returned from thaw() instead of calling isChilled() which tests data.record again. The record can immediately get re-chilled. And that actually happened in a pushbuld test.