Bug #40801 Falcon garbage collection can erase valid key values
Submitted: 17 Nov 2008 20:52 Modified: 15 May 14:56
Reporter: Kevin Lewis
Status: Closed
Category:Server: Falcon Severity:S3 (Non-critical)
Version:6.0.8 OS:Any
Assigned to: Kevin Lewis Target Version:6.0.10
Tags: DesignDoc, F_SCAVENGER
Triage: Triaged: D2 (Serious)

[17 Nov 2008 20:52] Kevin Lewis
Description:
There is problem in Table::expungeRecordVersions().  It aught to use the base record for
what is staying so that all staying key values can be compared against the key value that
is leaving.  Instead, it identifies the staying record as the record which immediately
succeeds the leaving record.  This happens because the function that calls
Table::expungeRecordVersions(), RecordVersion::scavenge(), calls itself recursively until
the last staying/visible record is found.

We have not seen this bug before because no scavenging occurs until the scavenge
threshold is reached.  Bug#34893 addresses this problem.  Pruning of old invisible record
versions should happen fairly often.  When pruning happnens more often, this bug will
appear unless it is fixed also

DETAIL:

Falcon uses only one and only one entry in the index for each key-RecordNumber pair.  If
the same key values are used over and over again for successive record updates, there may
be multiple RecordVersions on the same record chain with the same key value.  But that key
value will only occur once in the key page.  When the next scavenge cycle occurs, old
record versions will be pruned from the list and the associated key values are
garbageCollected.  The garbageCollect function needs to prevent deleting key values from
a record leaving the chain if that key value also belongs to a record that is staying
after the scavenge.

Index::garbageCollect() is called with Record pointer to a leaving record and a staying
record.  It will compare a the key value in the leaving record and all its prior records
to the key values in the staying record and all its priors, looking for duplicates.  If
there are no duplicates, the key can be deleted.

Consider the following scenario in which record number 1 is changed by several
transactions while other repeatable read transactions are held open and keep the later
records from being scavenged.

TransID  State       RecordNum  Key     Comment
10       Committed    1          -
9        Active       repeatable-read   Sees 'C'  
8 SP1    Committed    1          C
8        Released     1          B      already scavenged by SP1
7        Active       repeatable-read   Sees 'A' 
6        Committed    1          A
5        Active       repeatable-read   Sees 'D'
4        Committed    1          D      visible to Trans 5
3        Committed    1          C
2        Committed    1          B
1        Committed    1          A
0        Committed    1          -

When the scavenger calls Table::expungeRecordVersions() for RecordNum 1, it needs to
scavenge the records for transaction 0,1,2 & 3.  4 is still visible to 5, so it must
stay, along with all the newer versions.

In order for Index::garbageCollect to do so correctly, it must be sent the record for
TransID 3 as a leaving record and the the record for transID 10 as the staying record. 
But the current code sends in the record for transID 4 as the staying record.  It is
staying, but since 3 is disconnected before calling  Index::garbageCollect(), 4 has no
priors.  Index::garbageCollect does not have a view of all the staying records.

How to repeat:
Start the server with falcon-record-memory-max=5Mb

# Client 1
DROP TABLE t1;
CREATE TABLE t1 (a CHAR(100)) ENGINE falcon;
INSERT INTO t1 VALUES (repeat('-', 100));
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;

DROP TABLE t2;
CREATE TABLE t2 (a CHAR(100)) ENGINE falcon;

# Now each one of these should cause a forced record scavenge.
INSERT INTO t2 SELECT * FROM t1;

# Client 2
CREATE TABLE t3 (f1 CHAR(10) PRIMARY KEY) ENGINE falcon;
INSERT INTO t3 VALUES ('-');
UPDATE t3 SET f1 = 'A';
UPDATE t3 SET f1 = 'B';
UPDATE t3 SET f1 = 'C';
UPDATE t3 SET f1 = 'D';

# Client 3 - Should see 'D'
BEGIN;
SELECT * FROM t3;
SELECT * FROM t3 WHERE f1 = 'D';

# Client 2
UPDATE t3 SET f1 = 'A';

# Client 4 - Should see 'A'
BEGIN;
SELECT * FROM t3;
SELECT * FROM t3 WHERE f1 = 'A';

# Client 2
BEGIN;
UPDATE t3 SET f1 = 'B';
SAVEPOINT sp1;
UPDATE t3 SET f1 = 'C';
RELEASE SAVEPOINT sp1;
COMMIT;

# Client 5 - Should see 'C'
BEGIN;
SELECT * FROM t3;
SELECT * FROM t3 WHERE f1 = 'C';

# Client 2
UPDATE t3 SET f1 = '-';
SELECT * FROM t3;
SELECT * FROM t3 WHERE f1 = '-';

# Client 1 - Force another record scavenge.
INSERT INTO t2 SELECT * FROM t1;

# Client 3 - Should see 'D'.  Works because the staying records was 'D'
SELECT * FROM t3;
SELECT * FROM t3 WHERE f1 = 'D';

# Client 4 - Should see 'A' but does not since the 'A' key was scavenged.
SELECT * FROM t3;
SELECT * FROM t3 WHERE f1 = 'A';

# Client 5 - Should see 'C', but does not since the 'C' key was scavenged.
SELECT * FROM t3;
SELECT * FROM t3 WHERE f1 = 'C';

Suggested fix:
This happens because the caller of Table::expungeRecordVersions(),
RecordVersion::expunge(), calls itself recursively until it finds the oldest staying
record.  It should instead find the leaving record and call
Table::expungeRecordVersions() for the leaving record and the base record so that all
staying records can be seen by the index garbage collector.
[18 Nov 2008 15:07] Kevin Lewis
Note that if you do this after the scavenge;

# Client 1, 2 or any other new client;
SELECT * FROM t3;
SELECT * FROM t3 WHERE f1 = '-';  # This will fail to find any records

So the index has been corrupted by this bad scavenge.
[2 Dec 2008 19:49] Sveta Smirnova
Thank you for the report.

Verified as described.
[7 Jan 9:12] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/62571

2958 Kevin Lewis	2009-01-07
      Bug#34893, Bug#36700, Bug#40342, Bug#40801
      New scavenger code, See WL#4606, WL#4477 & WL#4478 for more info.
      
      This is the second full patch of the new scavenger.  
      1) Separate the two tasks involved in scavenging into
         A. Pruning old invisible record versions.
            1. As part of this task, a survey is taken of records
               in age groups so that the next step can be done in 
               one pass
         B. Retiring least-recently-used records from the record cache
      2) Watch how much record space is added to the record cache 
         and start a new age group at regular intervals. 
      3) Put the scavenger into a separate thread that can be 
         signaled when needed.
      
      This patch contains many improvements over the last patch including;
      1) forcedRecordScavenge is now signalScavenge and the loop is 
      increased to 5 times attempting to get memory from the record cache.
      2) Unused code is deleted
      3) Integrated with new dependency manager code so that it now uses
      Transaction::commitId to determine if a record is old enough to be
      scavenged.
[9 Jan 6:15] Kevin Lewis
This still needs a test case
[21 Jan 6:16] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/63654

2969 Kevin Lewis	2009-01-20
      Bug#40801 - Testcase to show that the new scavenger does not
      garbageCollect key values that are needed for records not pruned.
[23 Jan 7:08] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/63866

2972 Kevin Lewis	2009-01-23
      Bug#40801 - Testcase to show that the new scavenger does not
      garbageCollect key values that are needed for records not pruned.
[13 Feb 8:24] Bugs System
Pushed into 6.0.10-alpha (revid:alik@sun.com-20090211182317-uagkyj01fk30p1f8) (version
source revid:vvaintroub@mysql.com-20090123204400-cz3fg2xxkopmmf7g) (merge vers:
6.0.10-alpha) (pib:6)
[15 May 14:56] MC Brown
An entry has been added to the 6.0.10 changelog: 

The Falcon storage engine has been updated to incorporate new code for the built-in
scavenger service, which handles the caching of records in memory. This fixes a number of
different issues related to the visibility of different records during certain operations
and improves the memory usage.