MySQL Bugs: #45746: falcon_backlog test fails due to Gopher hitting "record memory is exhausted"

Bug #45746	falcon_backlog test fails due to Gopher hitting "record memory is exhausted"
Submitted:	25 Jun 2009 11:25	Modified:	26 May 2010 17:48
Reporter:	Olav Sandstå	Email Updates:
Status:	Unsupported	Impact on me:	None
Category:	MySQL Server: Falcon storage engine	Severity:	S2 (Serious)
Version:	6.0.12-alpha	OS:	Any
Assigned to:	Kevin Lewis	CPU Architecture:	Any
Tags:	F_MEMORY

Description:
When running the RQG test falcon_backlog this fails with the following written
to the log file:

  Thread::thread: thread 1180424512: record memory is exhausted
  terminate called after throwing an instance of 'SQLError'

The thread that is terminating it one of the Gopher threads.

How to repeat:
Run falcon_backlog test, repeatable every time.

Repeatable as described by running falcon_backlog.

This bug where the gopher thread hits "record memory is exhausted" can also happen when running falcon_bug_36294-big test as shown in pushbuild.  See bug#45845 for other problems with that test case.

How can a gopher thread get status 305? Gopher threads are never expected to run out of record memory. By design, they build pages from the contents of the serial log and do not muck with the record cache at all.

Back on 5/19/2009, a change committed for Bug#44946 added a little function called Transaction::thawAll(). This is mucking with the record cache!

Here was the reason explained in the patch;

"In order to prevent 'unthawable' records, make sure all records attached to a transaction are thawed by the gopher thread when the transaction is fully complete. Do this by adding a new function; Transaction::thawAll()"

This nasty little function is called by the gopher thread;
SerialLogTransaction::commit() -> Transaction::fullyCommitted() -> Transaction::thawAll()

Because it thaws every chilled record in the transaction after being completed, it ruins the effect of thawing in the first place! Woops! Huge insert transactions that are larger than the record cache can get committed, but then the gopher thread will crash with error 305 "record memory is exhausted" trying to thaw all those records.

This function is still needed, but it does not need to thaw ALL records in the transaction. Once a record has been put into the page it cannot be thawed from the serial log. And if it has already been superceded, there will soon be no way to thaw the record, once that superceding record version gets completed into the data page. So only records that get superceeded between commit-time and complete-time really need to be thawed from the serial log before its transaction is completed.

The record memory requirements of Transaction::thawAll can be greatly reduced by checking if the completed record has been superceded before thawing it. With this modification, the huge insert transactions in falcon_backlog and falcon_bug_36294-big.test may not actually have to thaw anything at complete time.

But there is a complication to implementing this. RecordVersion::superceded is not very reliable. Most records on a long chain of priorVersions will still have superceded=false. See the comment in Bug#44946 on [20 May 6:23] for an example. An analysis of the where isSuperceded() is used reveals that what we really want to know is whether the record is the current base record, that is, the record at the top of the stack, the one currently attached to the record tree. In order to make this information reliable, it needs to be set and unset by RecordLeaf::store() to the records that ore being attached or detached from that position.

So the following patch converts RecordVersion::superceded to RecordVersion::base along with their associated uses.
Then it adds if (record->isBase()) to Transaction::thawAll(). If it is not the base record a complete time, it needs to be thawed while it can still be thawed from the serial log.
Falcon uses CAS to make that switch. Setting one or two booleans after that is inherently non-atomic. So how can this boolean be considered reliable? Well, because this CAS is done within a transaction. Only one thread works on behalf of a transaction at a time, and once a transacton adds a base record to the top of a record chain, no other transaction can add another. So you cannot get two base records chained up concurrently, overlapping code paths. If the CAS succedes, then that thread can set RecordVersion::base=true immediately after that and trust that it remains that value. The thread for that transaction will be the only user of that boolean.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/78242

2753 Kevin Lewis	2009-07-08
      Bug#45746 - Gopher thread gets error 305.
      Transaction::thawAll was added for 44946 but it was too intrusive.
      It required a full thaw of all records in the transaction.  This
      patch limits thawAll to only those records that have already been
      superceded by a newer change at 'complete time' when thawAll is 
      called.  
      
      Also, RecordVersion::superceded is converted to 
      RecordVersion::base and made more reliable by updating it in
      RecordLeaf::store.
     @ storage/falcon/Record.cpp
        Bug#45746 - Convert 'superceded' to 'base'
        The value of 'base does not matter for a Record object
        since it is not assocviated with any transaction.
     @ storage/falcon/Record.h
        Bug#45746 - Convert 'superceded' to 'base'
     @ storage/falcon/RecordLeaf.cpp
        Bug#45746 - Set RecordVersion::base in RecordLeaf::store()
        so that it reliably reflects whether a record is currently 
        the base record.
     @ storage/falcon/RecordVersion.cpp
        Bug#45746 - Convert 'superceded' to 'base'.  
        Delete old places where superceded is set since 'base' 
        is set in RecordLeaf::store().
     @ storage/falcon/RecordVersion.h
        Bug#45746 - Convert 'superceded' to 'base'.
     @ storage/falcon/Table.cpp
        Bug#45746 - Convert 'superceded' to 'base'.  
        Delete old places where superceded is set since 'base' 
        is set in RecordLeaf::store().
     @ storage/falcon/Transaction.cpp
        Bug#45746 - Convert 'superceded' to 'base'.  
        Use RecordVersion::isBase() to limit the number of records
        thawed at complete time.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/78652

2755 Kevin Lewis	2009-07-14
      Bug#45746 - Gopher thread gets error 305.
      The previous patch for this bug made it highly unlikely for the
      gopher thread to get error 305-out-of-record-cache.  But it is
      still possible since thawAll() still may need to muck with the 
      record cache.
      
      So this patch makes the gopher thread retry the attempt to 
      allocate a record another 10 time.  Client threads which also 
      get this error will not retry so they will probably rollback 
      and free some memory.  In thawAll(), the gopher will also give 
      up if the transaction gets too old while this is happening.
      
      Then after 10 tries, which should take more than 22.5 seconds,
      The gopher will just give up trying to thaw that record, 
      which may never be needed anyway.  If it is needed, the client 
      thread must be able to recognize an unthawable record.
      
      If thawAll does not thaw a record, it will remain thawable
      from the serial log until the serial log toggles so that the 
      virtual offset is no longer around.  So this patch uses two 
      new indicators to identify an unthawable record.  The first is
      a new state called recBuried which indicates that the record 
      version has been replaced in the page.  This is now set in 
      thawAll(), during transaction completion.  Then later, if a 
      'buried' record cannot be thawed from the serial log, 
      RecordVersion::thaw() does not attempt to thaw it from the 
      page.  It just marks it as unthawable by setting 
      Record::data.record = recordDataIsFreed.
      
      Then the client thread in fetchVersion() must throw an error 
      indicating that a visible record cannot be thawed.
     @ storage/falcon/Record.cpp
        Bug#45746 - account for new state recBuried
     @ storage/falcon/Record.h
        Bug#45746 - new state recBuried
     @ storage/falcon/RecordVersion.cpp
        Bug#45746 - fetchVersion() should recognize an unthawable record
        It throws an internal error with an appropriate message.
        RecordVersion::thaw() now avoids thawing a record from the data 
        page that has been superceded.
     @ storage/falcon/Table.cpp
        code cleanup
     @ storage/falcon/Transaction.cpp
        Bug#45746 - thawAll() is enhanced to attempt to thaw arecord 10
        time if the error is 305-out-of-record-memory.  It will also
        quit early if the transaction becomes too old.  Then if 10 loops
        occur, it will just ignore that record, hoping that it is not 
        actually needed before the serial log toggles and it becomes 
        unthawable.
        
        thawAll also now marks prior records that have been superceded
        by these completed records as recBuried.
        the data page, or 'buried'.

Code Review in email and on the phone

Looking at current (since 2009-07-13) PB2 results for the falcon_backlog test, the test reports success, but in the log we find lots of "record memory is exhausted" error messages, like this one:

# 02:24:26 Query: UPDATE `table65536_falcon_int_autoinc` AS X SET `pk` = 'hobinrwybizyppgwvpmwaodbmgjvnlcoclwwbxpggiorbbukjnjryagyjchtudhaifiukvdibhtpffdzxkzflnfwzlhvvyomkpizeglchdonjcxqgevbzwwrcgdafrbuqrdcncucquecwtmoxozkbk' WHERE `pk` = 14694 failed: 1296 Got error 305 'record memory is exhausted' from Falcon

Is this as expected and acceptable?
(I may have missed something in the discussions relating to this bug... )

Although certain statements fail due to lack of memory it seems that the client is able to continue until the end of the test, and the forced recovery succeeds...

This bug and patch was to avoid having the gopher thread suffer from 305-out-of-record-cache-memory errors.  It seems like this is working.  The 305 errors are now being reported by the client threads.  A future patch which I am working on will re-enable backlogging.  This should also allow this test to complete without 305 errors.  See Bug#45845.