| Bug #44721 | falcon_chill_thaw_compare crashes in EncodedDataStream::decode | ||
|---|---|---|---|
| Submitted: | 7 May 2009 14:08 | Modified: | 26 May 2010 17:47 |
| Reporter: | Olav Sandstå | Email Updates: | |
| Status: | Unsupported | Impact on me: | |
| Category: | MySQL Server: Falcon storage engine | Severity: | S2 (Serious) |
| Version: | 6.0.12-alpha | OS: | Any |
| Assigned to: | Kevin Lewis | CPU Architecture: | Any |
| Tags: | F_CHILL THAW | ||
[7 May 2009 14:11]
Olav Sandstå
Note that this crash is very similar to the crash in bug#43146. The only difference in the call stack is that in this case the access to illegal data occurs in EncodedDataStream::decode() instead of in EncodedDataStream::skip().
[7 May 2009 15:18]
Kevin Lewis
I am in the process of making changes to the chill/thaw code so that data.record buffers are atomically identified as being thawed. In addition, becuase a newly thawed record can be chilled again immediately, the functions involved in thawing return the buffer that was thawed up the call stack to where it is needed. This seems to be the conflict going on here. A data buffer is not what it is expected to be.
[8 May 2009 19:44]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/73693 2702 Kevin Lewis 2009-05-08 Bug#43344 & Bug#44721 This series of changes closes many gaps in which a data buffer can get thawed and then re-chilled almost immediatly which can cause the use of a bad data buffer pointer. For 44344, it makes the act of chilling and thawing records an atomic operation which uses CAS. Each call to thaw returns a valid pointer to the buffer that was thawed so that it can be used during the current cycle as long as the caller has a CycleLock. Record.h, Record.cpp, RecordVersion.h & RecordVersion.cpp; The most significant change is that when a data buffer is chilled, it recieves the value (-1) and this happens atomically. Record::state == recChilled is no longer used. The record.data pointer is now always read once, checked, and returned up the call stack to where it is used. hasRecord() is changed from an inline function to a virtual function for both Record and RecordVersion. hasRecord() and isChilled() are added so that the record can check data.record itself. setEncodedRecord() and thaw() return the pointer to the buffer that they just created and put into data.record so that the caller does not have to rely on that buffer being there later. getAllocatedSize() is added so that the number of bytes in a buffer holding the record can be consistently known. It is used in several places. chillData() is similar to and replaces deleteData() during SRLUpdateRecords::chill isDeleted() is also added so that the common (state == recDeleted) check can be done by the record itself. This is not fully implemented yet because we need to get away from the over-use of these state flags. They are somewhat unreliable on multiple CPUs unless they are set with CAS. getRecordData() is also changed from an inline to a virtual function. findRecordData() is like getRecordData but without the thaw. It is used in three places along the call stack for thawing a record to check if another thread has already thawed that record. This is not likely now because of the use of syncThaw. But syncThaw may no longer be necessary. SRLUpdateRecords.h & SRLUpdateRecords.cpp; thaw() returns the data buffer thawed in addition to the number of bytes reallocated. chill() is not sent the dataLength anymore since it calculates that itself. SerialLog.h, SerialLog.cpp, transaction.h, transaction.cpp; Since the number of bytes chilled and thawed stored in the SerialLog and Transaction were being maintained but not used, I deleted those variables and code used to maintain them. The number of allocated bytes associated with a Transaction is stored in totalRecorddata and is still maintained. transaction.cpp Even though backlogging is not currently active, the choice whether to backlog any particular record needed to be enhanced. Along with replacing hasRecord(false) with isChilled(), it should avoid backlogging record chains that do not start at the base record. That is checked with isSuperceded(). Also, backlogging should occur for deleted records as well as newly chilled records.
[12 May 2009 14:36]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/73828 2702 Kevin Lewis 2009-05-12 Bug#43344 & Bug#44721 Second patch (comprehensive) with review changes This series of changes closes many gaps in which a data buffer can get thawed and then re-chilled almost immediatly which can cause the use of a bad data buffer pointer. For 44344, it makes the act of chilling and thawing records an atomic operation which uses CAS. Each call to thaw returns a valid pointer to the buffer that was thawed so that it can be used during the current cycle as long as the caller has a CycleLock. Record.h, Record.cpp, RecordVersion.h & RecordVersion.cpp; The most significant change is that when a data buffer is chilled, it recieves the value (-1) and this happens atomically. Record::state == recChilled is no longer used. The record.data pointer is now always read once, checked, and returned up the call stack to where it is used. hasRecord() is changed from an inline function to a virtual function for both Record and RecordVersion. hasRecord() and isChilled() are added so that the record can check data.record itself. setEncodedRecord() and thaw() return the pointer to the buffer that they just created and put into data.record so that the caller does not have to rely on that buffer being there later. getEncodedValue() uses the recordData pointer of the caller instead of getting it again from data.record. getAllocatedSize() is added so that the number of bytes in a buffer holding the record can be consistently known. It is used in several places. chillData() is similar to and replaces deleteData() during SRLUpdateRecords::chill isDeleted() is also added so that the common (state == recDeleted) check can be done by the record itself. This is not fully implemented yet because we need to get away from the over-use of these state flags. They are somewhat unreliable on multiple CPUs unless they are set with CAS. getRecordData() is also changed from an inline to a virtual function. findRecordData() is like getRecordData but without the thaw. It is used in three places along the call stack for thawing a record to check if another thread has already thawed that record. This is not likely now because of the use of syncThaw. But syncThaw may no longer be necessary. isAllocated() is an inline function used about 6 times to determine if a pointer is real or not. SRLUpdateRecords.h & SRLUpdateRecords.cpp; thaw() returns the data buffer thawed in addition to the number of bytes reallocated. chill() is not sent the dataLength anymore since it calculates that itself. SerialLog.h, SerialLog.cpp, transaction.h, transaction.cpp; Since the number of bytes chilled and thawed stored in the SerialLog and Transaction were being maintained but not used, I deleted those variables and code used to maintain them. The number of allocated bytes associated with a Transaction is stored in totalRecorddata and is still maintained. transaction.cpp Even though backlogging is not currently active, the choice whether to backlog any particular record needed to be enhanced. Along with replacing hasRecord(false) with isChilled(), it should avoid backlogging record chains that do not start at the base record. That is checked with isSuperceded(). Also, backlogging should occur for deleted records as well as newly chilled records.
[12 May 2009 22:59]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/73882 2704 Kevin Lewis 2009-05-12 Code Cleanup and compile warnings Fix the previous patch for Bug#43344 & Bug#44721 that made a call to getAllocatedSize() from inside a thaw. This caused a recursive loop and a stack overflow.

Description: When running the RQG test falcon_chill_thaw_compare the following crash occured: # 22:19:17 #4 <signal handler called> # 22:19:17 #5 0x085c5a51 in EncodedDataStream::decode ( # 22:19:17 ptr=0x17 <Address 0x17 out of bounds>, value=0xb4a0f3bc, copyFlag=false) # 22:19:17 at EncodedDataStream.cpp:311 # 22:19:17 #6 0x085f8306 in Record::getEncodedValue (this=0xa555f428, fieldId=3, # 22:19:17 value=0xb4a0f3bc) at Record.cpp:671 # 22:19:17 #7 0x085f9157 in Record::getRawValue (this=0xa555f428, fieldId=3, # 22:19:17 value=0xb4a0f3bc) at Record.cpp:376 # 22:19:17 #8 0x085f9571 in Record::getValue (this=0xa555f428, fieldId=3, # 22:19:17 value=0xb4a0f3bc) at Record.cpp:328 # 22:19:17 #9 0x085d44e6 in Index::makeKey (this=0xb6e4cef8, record=0xa555f428, # 22:19:17 key=0xb4a0f774) at Index.cpp:700 # 22:19:17 #10 0x085d4666 in Index::duplicateKey (this=0xb6e4cef8, key=0xb4a12208, # 22:19:17 record=0xa555f428) at Index.cpp:755 # 22:19:17 #11 0x085d4806 in Index::garbageCollect (this=) at Index.cpp:720 # 22:19:17 #12 0x085831ad in Table::garbageCollect (this=0xb7194108, leaving=0xa56ec5c8, # 22:19:17 staying=0xa555f428, transaction=0x0, quiet=false) at Table.cpp:2168 # 22:19:17 #13 0x085fa3f0 in RecordLeaf::pruneRecords (this=0xb6e4dbc0, table=0xb7194108, # 22:19:17 base=0, recordScavenge=0xb4a14ebc) at RecordLeaf.cpp:164 # 22:19:17 #14 0x08666467 in RecordGroup::pruneRecords (this=0xb6e51090, # 22:19:17 table=0xb7194108, base=0, recordScavenge=0xb4a14ebc) at RecordGroup.cpp:122 # 22:19:17 #15 0x0857b0d8 in Table::pruneRecords (this=0xb7194108, # 22:19:17 recordScavenge=0xb4a14ebc) at Table.cpp:1947 # 22:19:17 #16 0x085b0c06 in Database::pruneRecords (this=0xb701d648, # 22:19:17 recordScavenge=0xb4a14ebc) at Database.cpp:1937 # 22:19:17 #17 0x085b0d4a in Database::scavengeRecords (this=0xb701d648, forced=false) # 22:19:17 at Database.cpp:1876 # 22:19:17 #18 0x085b1111 in Database::scavenge (this=0xb701d648, forced=false) # 22:19:17 at Database.cpp:1814 # 22:19:17 #19 0x085b1214 in Database::scavengerThreadMain (this=0xb701d648) # 22:19:17 at Database.cpp:2014 # 22:19:17 #20 0x085b12df in Database::scavengerThreadMain (database=0xb701d648) # 22:19:17 at Database.cpp:2003 # 22:19:17 #21 0x0858a355 in Thread::thread (this=0xb6e24000) at Thread.cpp:167 # 22:19:17 #22 0x0858a565 in Thread::thread (parameter=0xb6e24000) at Thread.cpp:146 # 22:19:17 #23 0x0089a45b in start_thread () from /lib/libpthread.so.0 # 22:19:17 #24 0x007f1c4e in clone () from /lib/libc.so.6 The access for illegal data happened in the following code: const UCHAR* EncodedDataStream::decode(const UCHAR *ptr, Value *value, bool copyFlag) { const UCHAR *p = ptr; UCHAR code = *p++; <<<<==== Ups... How to repeat: Seen once while after running falcon_chill_thaw_compare. Suggested fix: Avoid reading illegal data.