Bug #40651 Falcon: Record chill with data length == 0 corrupts serial log
Submitted: 11 Nov 2008 22:11 Modified: 13 Nov 2008 10:14
Reporter: Christopher Powers
Status: Patch queued
Category:Server: Falcon Severity:S1 (Critical)
Version:6.0 OS:Any
Assigned to: Christopher Powers Target Version:6.0-beta
Tags: F_CHILL THAW
Triage: Triaged: D1 (Critical)

[11 Nov 2008 22:11] Christopher Powers
Description:
The record chill operation sometimes generates SRLUpdateRecords entries in the serial log
that have zero data length. This causes a corruption of the serial log.
      
The serial log corruption manifests as this assertion in SerialLogControl::nextRecord():

ASSERT(*input >= (LOW_BYTE_FLAG | srlEnd) && *input < (LOW_BYTE_FLAG | srlMax))

How to repeat:
Branch or checkout the mysql-test-extra-6.0 bazaar tree and run:

$ cd mysql-test\gentest
$ perl runall.pl \
 --basedir=path\to\mysql-6.0-falcon-team \
 --reporters=Deadlock,ErrorLog,Backtrace \
 --mysqld=--loose-falcon-lock-wait-timeout=1 \
 --engine=Falcon \
 --grammar=conf/falcon_chill_thaw.yy \ 
 --queries=100000 \
 --mysqld=--falcon-index-chill-threshold=4K \ 
 --mysqld=--falcon-record-chill-threshold=4K \ 
 --duration=1200

Suggested fix:
The total data length of the chilled records is known only after all of the records have
been processed in SRLUpdateRecords::append(). If none of the record versions within a
given savepoint are eligible to be chilled, the total data length of the serial log record
will be zero.
      
Even when there are zero bytes of data to commit, SRLUpdateRecords::append() issues a
serial log flush and forces the creation of a new serial log window. This creates a
one-byte offset error in the serial log that ultimately results in a fatal assertion
during gopher thread processing.

To prevent this error, SRLUpdateRecords::append() only flushes serial log
records with a non-zero data length.

It is unclear whether there is a bug in SerialLog::flush() because
SRLUpdateIndexes::append() routinely commits and flushes records with zero-length data
with no apparent corruption.

This will be addressed in a separate bug, TBD.
[11 Nov 2008 22:17] Christopher Powers
(This commit was originally attributed to Bug#39696. It is included here for
completeness.)

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/58261

2906 Christopher Powers	2008-11-08
     Bug#39696, "Assertion in Table.cpp (dup->state == recDeleted) fails during
falcon_chill_thaw"
[12 Nov 2008 17:14] Kevin Lewis
Chris, This avoids the crashed serial log when a zero length SRLUpdateRecord  occurs.  So
the current known bug seems to be fixed.

But there may still be a problem.

Both SRLUpdateIndex and SRLUpdateRecords will find the virtualOffset of the start of the
serialLogRecord, add a few bytes of header stuff like transactionId & savepointId, then
loop through the index nodes or records adding them to the main record.  If the end of the
serialLogWindow is reached, a new flush(true,... is called which creates a new window.  

It uses the warningTrack to determine when it reaches the end.  But this variable is used
intrinsically in putData() to start a new window if any data does not fit onto the
previous window.  Search for warningTrack and you will find that only SRLUpdateIndex and
SRLUpdateRecords try to deal with it explicitly.  Maybe they should not worry about it? 
If a putData forces a flush and a new window, these append functions are still dealing
with the old warning track!