Bug #44721 falcon_chill_thaw_compare crashes in EncodedDataStream::decode
Submitted: 7 May 2009 14:08 Modified: 26 May 2010 17:47
Reporter: Olav Sandstå Email Updates:
Status: Unsupported Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S2 (Serious)
Version:6.0.12-alpha OS:Any
Assigned to: Kevin Lewis CPU Architecture:Any
Tags: F_CHILL THAW

[7 May 2009 14:08] Olav Sandstå
Description:
When running the RQG test falcon_chill_thaw_compare the following crash occured:

# 22:19:17 #4  <signal handler called>
# 22:19:17 #5  0x085c5a51 in EncodedDataStream::decode (
# 22:19:17     ptr=0x17 <Address 0x17 out of bounds>, value=0xb4a0f3bc, copyFlag=false)
# 22:19:17     at EncodedDataStream.cpp:311
# 22:19:17 #6  0x085f8306 in Record::getEncodedValue (this=0xa555f428, fieldId=3, 
# 22:19:17     value=0xb4a0f3bc) at Record.cpp:671
# 22:19:17 #7  0x085f9157 in Record::getRawValue (this=0xa555f428, fieldId=3, 
# 22:19:17     value=0xb4a0f3bc) at Record.cpp:376
# 22:19:17 #8  0x085f9571 in Record::getValue (this=0xa555f428, fieldId=3, 
# 22:19:17     value=0xb4a0f3bc) at Record.cpp:328
# 22:19:17 #9  0x085d44e6 in Index::makeKey (this=0xb6e4cef8, record=0xa555f428, 
# 22:19:17     key=0xb4a0f774) at Index.cpp:700
# 22:19:17 #10 0x085d4666 in Index::duplicateKey (this=0xb6e4cef8, key=0xb4a12208, 
# 22:19:17     record=0xa555f428) at Index.cpp:755
# 22:19:17 #11 0x085d4806 in Index::garbageCollect (this=) at Index.cpp:720
# 22:19:17 #12 0x085831ad in Table::garbageCollect (this=0xb7194108, leaving=0xa56ec5c8, 
# 22:19:17     staying=0xa555f428, transaction=0x0, quiet=false) at Table.cpp:2168
# 22:19:17 #13 0x085fa3f0 in RecordLeaf::pruneRecords (this=0xb6e4dbc0, table=0xb7194108, 
# 22:19:17     base=0, recordScavenge=0xb4a14ebc) at RecordLeaf.cpp:164
# 22:19:17 #14 0x08666467 in RecordGroup::pruneRecords (this=0xb6e51090, 
# 22:19:17     table=0xb7194108, base=0, recordScavenge=0xb4a14ebc) at RecordGroup.cpp:122
# 22:19:17 #15 0x0857b0d8 in Table::pruneRecords (this=0xb7194108, 
# 22:19:17     recordScavenge=0xb4a14ebc) at Table.cpp:1947
# 22:19:17 #16 0x085b0c06 in Database::pruneRecords (this=0xb701d648, 
# 22:19:17     recordScavenge=0xb4a14ebc) at Database.cpp:1937
# 22:19:17 #17 0x085b0d4a in Database::scavengeRecords (this=0xb701d648, forced=false)
# 22:19:17     at Database.cpp:1876
# 22:19:17 #18 0x085b1111 in Database::scavenge (this=0xb701d648, forced=false)
# 22:19:17     at Database.cpp:1814
# 22:19:17 #19 0x085b1214 in Database::scavengerThreadMain (this=0xb701d648)
# 22:19:17     at Database.cpp:2014
# 22:19:17 #20 0x085b12df in Database::scavengerThreadMain (database=0xb701d648)
# 22:19:17     at Database.cpp:2003
# 22:19:17 #21 0x0858a355 in Thread::thread (this=0xb6e24000) at Thread.cpp:167
# 22:19:17 #22 0x0858a565 in Thread::thread (parameter=0xb6e24000) at Thread.cpp:146
# 22:19:17 #23 0x0089a45b in start_thread () from /lib/libpthread.so.0
# 22:19:17 #24 0x007f1c4e in clone () from /lib/libc.so.6

The access for illegal data happened in the following code:

const UCHAR* EncodedDataStream::decode(const UCHAR *ptr, Value *value, bool copyFlag)
{
	const UCHAR *p = ptr;
	UCHAR code = *p++;              <<<<==== Ups...

How to repeat:
Seen once while after running falcon_chill_thaw_compare.

Suggested fix:
Avoid reading illegal data.
[7 May 2009 14:11] Olav Sandstå
Note that this crash is very similar to the crash in bug#43146. The only difference in the call stack is that in this case the access to illegal data occurs in EncodedDataStream::decode() instead of in EncodedDataStream::skip().
[7 May 2009 15:18] Kevin Lewis
I am in the process of making changes to the chill/thaw code so that data.record buffers are atomically identified as being thawed.  In addition, becuase a newly thawed record can be chilled again immediately, the functions involved in thawing return the buffer that was thawed up the call stack to where it is needed.

This seems to be the conflict going on here.  A data buffer is not what it is expected to be.
[8 May 2009 19:44] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/73693

2702 Kevin Lewis	2009-05-08
      Bug#43344 & Bug#44721  
      This series of changes closes many gaps in which a data buffer can get 
      thawed and then re-chilled almost immediatly which can cause the use 
      of a bad data buffer pointer.  For 44344, it makes the act of chilling
      and thawing records an atomic operation which uses CAS.  Each call to 
      thaw returns a valid pointer to the buffer that was thawed so that it 
      can be used during the current cycle as long as the caller has a 
      CycleLock.
      
      Record.h, Record.cpp, RecordVersion.h & RecordVersion.cpp;
      
      The most significant change is that when a data buffer is chilled, 
      it recieves the value (-1) and this happens atomically.  
      Record::state == recChilled is no longer used.  
      The record.data pointer is now always read once, checked, and 
      returned up the call stack to where it is used.
      
      hasRecord() is changed from an inline function to a virtual function 
      for both Record and RecordVersion. hasRecord() and isChilled() are 
      added so that the record can check data.record itself. 
      
      setEncodedRecord() and thaw() return the pointer to the buffer
      that they just created and put into data.record so that the caller 
      does not have to rely on that buffer being there later.
      
      getAllocatedSize() is added so that the number of bytes in a buffer 
      holding the record can be consistently known.  It is used in several 
      places.  
      
      chillData() is similar to and replaces deleteData() during 
      SRLUpdateRecords::chill
      
      isDeleted() is also added so that the common (state == recDeleted) 
      check can be done by the record itself.  This is not fully 
      implemented yet because we need to get away from the over-use of 
      these state flags.  They are somewhat unreliable on multiple CPUs 
      unless they are set with CAS.
      
      getRecordData() is also changed from an inline to a virtual function.
      
      findRecordData() is like getRecordData but without the thaw.  It is 
      used in three places along the call stack for thawing a record to 
      check if another thread has already thawed that record.  This is 
      not likely now because of the use of syncThaw.  But syncThaw may 
      no longer be necessary.
      
      SRLUpdateRecords.h & SRLUpdateRecords.cpp;
      
      thaw() returns the data buffer thawed in addition to the number of 
      bytes reallocated.
      
      chill() is not sent the dataLength anymore since it calculates that 
      itself.
      
      SerialLog.h, SerialLog.cpp, transaction.h, transaction.cpp;
      
      Since the number of bytes chilled and thawed stored in the SerialLog 
      and Transaction were being maintained but not used, I deleted those
      variables and code used to maintain them.  The number of allocated 
      bytes associated with a Transaction is stored in totalRecorddata and 
      is still maintained.
      
      transaction.cpp
      Even though backlogging is not currently active, the choice whether 
      to backlog any particular record needed to be enhanced.  Along with 
      replacing hasRecord(false) with isChilled(), it should avoid 
      backlogging record chains that do not start at the base record.  
      That is checked with isSuperceded(). Also, backlogging should occur 
      for deleted records as well as newly chilled records.
[12 May 2009 14:36] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/73828

2702 Kevin Lewis	2009-05-12
      Bug#43344 & Bug#44721  
      Second patch (comprehensive) with review changes
      
      This series of changes closes many gaps in which a data buffer can get 
      thawed and then re-chilled almost immediatly which can cause the use 
      of a bad data buffer pointer.  For 44344, it makes the act of chilling
      and thawing records an atomic operation which uses CAS.  Each call to 
      thaw returns a valid pointer to the buffer that was thawed so that it 
      can be used during the current cycle as long as the caller has a 
      CycleLock.
      
      Record.h, Record.cpp, RecordVersion.h & RecordVersion.cpp;
      
      The most significant change is that when a data buffer is chilled, 
      it recieves the value (-1) and this happens atomically.  
      Record::state == recChilled is no longer used.  
      The record.data pointer is now always read once, checked, and 
      returned up the call stack to where it is used.
      
      hasRecord() is changed from an inline function to a virtual function 
      for both Record and RecordVersion. hasRecord() and isChilled() are 
      added so that the record can check data.record itself. 
      
      setEncodedRecord() and thaw() return the pointer to the buffer
      that they just created and put into data.record so that the caller 
      does not have to rely on that buffer being there later.
      
      getEncodedValue() uses the recordData pointer of the caller instead
      of getting it again from data.record.
      
      getAllocatedSize() is added so that the number of bytes in a buffer 
      holding the record can be consistently known.  It is used in several 
      places.  
      
      chillData() is similar to and replaces deleteData() during 
      SRLUpdateRecords::chill
      
      isDeleted() is also added so that the common (state == recDeleted) 
      check can be done by the record itself.  This is not fully 
      implemented yet because we need to get away from the over-use of 
      these state flags.  They are somewhat unreliable on multiple CPUs 
      unless they are set with CAS.
      
      getRecordData() is also changed from an inline to a virtual function.
      
      findRecordData() is like getRecordData but without the thaw.  It is 
      used in three places along the call stack for thawing a record to 
      check if another thread has already thawed that record.  This is 
      not likely now because of the use of syncThaw.  But syncThaw may 
      no longer be necessary.
      
      isAllocated() is an inline function used about 6 times to determine 
      if a pointer is real or not.
      
      SRLUpdateRecords.h & SRLUpdateRecords.cpp;
      
      thaw() returns the data buffer thawed in addition to the number of 
      bytes reallocated.
      
      chill() is not sent the dataLength anymore since it calculates that 
      itself.
      
      SerialLog.h, SerialLog.cpp, transaction.h, transaction.cpp;
      
      Since the number of bytes chilled and thawed stored in the SerialLog 
      and Transaction were being maintained but not used, I deleted those
      variables and code used to maintain them.  The number of allocated 
      bytes associated with a Transaction is stored in totalRecorddata and 
      is still maintained.
      
      transaction.cpp
      Even though backlogging is not currently active, the choice whether 
      to backlog any particular record needed to be enhanced.  Along with 
      replacing hasRecord(false) with isChilled(), it should avoid 
      backlogging record chains that do not start at the base record.  
      That is checked with isSuperceded(). Also, backlogging should occur 
      for deleted records as well as newly chilled records.
[12 May 2009 22:59] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/73882

2704 Kevin Lewis	2009-05-12
      Code Cleanup and compile warnings
      Fix the previous patch for Bug#43344 & Bug#44721 that made a call to getAllocatedSize()
      from inside a thaw.  This caused a recursive loop and a stack overflow.