Bug #40651 Falcon: Record chill with data length == 0 corrupts serial log
Submitted: 11 Nov 2008 21:11 Modified: 26 May 2010 17:46
Reporter: Christopher Powers Email Updates:
Status: Unsupported Impact on me:
Category:MySQL Server: Falcon storage engine Severity:S1 (Critical)
Version:6.0 OS:Any
Assigned to: Christopher Powers CPU Architecture:Any
Triage: Triaged: D1 (Critical)

[11 Nov 2008 21:11] Christopher Powers
The record chill operation sometimes generates SRLUpdateRecords entries in the serial log that have zero data length. This causes a corruption of the serial log.
The serial log corruption manifests as this assertion in SerialLogControl::nextRecord():

ASSERT(*input >= (LOW_BYTE_FLAG | srlEnd) && *input < (LOW_BYTE_FLAG | srlMax))

How to repeat:
Branch or checkout the mysql-test-extra-6.0 bazaar tree and run:

$ cd mysql-test\gentest
$ perl runall.pl \
 --basedir=path\to\mysql-6.0-falcon-team \
 --reporters=Deadlock,ErrorLog,Backtrace \
 --mysqld=--loose-falcon-lock-wait-timeout=1 \
 --engine=Falcon \
 --grammar=conf/falcon_chill_thaw.yy \ 
 --queries=100000 \
 --mysqld=--falcon-index-chill-threshold=4K \ 
 --mysqld=--falcon-record-chill-threshold=4K \ 

Suggested fix:
The total data length of the chilled records is known only after all of the records have been processed in SRLUpdateRecords::append(). If none of the record versions within a given savepoint are eligible to be chilled, the total data length of the serial log record will be zero.
Even when there are zero bytes of data to commit, SRLUpdateRecords::append() issues a serial log flush and forces the creation of a new serial log window. This creates a one-byte offset error in the serial log that ultimately results in a fatal assertion during gopher thread processing.

To prevent this error, SRLUpdateRecords::append() only flushes serial log
records with a non-zero data length.

It is unclear whether there is a bug in SerialLog::flush() because SRLUpdateIndexes::append() routinely commits and flushes records with zero-length data with no apparent corruption.

This will be addressed in a separate bug, TBD.
[11 Nov 2008 21:17] Christopher Powers
(This commit was originally attributed to Bug#39696. It is included here for completeness.)

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:


2906 Christopher Powers	2008-11-08
     Bug#39696, "Assertion in Table.cpp (dup->state == recDeleted) fails during
[12 Nov 2008 16:14] Kevin Lewis
Chris, This avoids the crashed serial log when a zero length SRLUpdateRecord  occurs.  So the current known bug seems to be fixed.

But there may still be a problem.

Both SRLUpdateIndex and SRLUpdateRecords will find the virtualOffset of the start of the serialLogRecord, add a few bytes of header stuff like transactionId & savepointId, then loop through the index nodes or records adding them to the main record.  If the end of the serialLogWindow is reached, a new flush(true,... is called which creates a new window.  

It uses the warningTrack to determine when it reaches the end.  But this variable is used intrinsically in putData() to start a new window if any data does not fit onto the previous window.  Search for warningTrack and you will find that only SRLUpdateIndex and SRLUpdateRecords try to deal with it explicitly.  Maybe they should not worry about it?  If a putData forces a flush and a new window, these append functions are still dealing with the old warning track!