Bug #40801 Falcon garbage collection can erase valid key values
Submitted: 17 Nov 2008 19:52 Modified: 15 May 2009 12:56
Reporter: Kevin Lewis
Status: Closed
Category: MySQL Server: Falcon storage engine  Severity: S3 (Non-critical)
Version: 6.0.8  OS: Any
Assigned to: Kevin Lewis  CPU Architecture: Any
Tags: DesignDoc, F_SCAVENGER

[17 Nov 2008 19:52] Kevin Lewis
Description:
There is a problem in Table::expungeRecordVersions().  It ought to use the base record as the staying record so that all staying key values can be compared against the key value that is leaving.  Instead, it identifies the staying record as the record that immediately succeeds the leaving record.  This happens because the function that calls Table::expungeRecordVersions(), RecordVersion::scavenge(), calls itself recursively until the last staying/visible record is found.

We have not seen this bug before because no scavenging occurs until the scavenge threshold is reached; Bug#34893 addresses that problem.  Pruning of old invisible record versions should happen fairly often, and once pruning happens more often, this bug will appear unless it is also fixed.
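
For illustration, here is a minimal sketch of that recursion.  The types, member names and the expungeRecordVersions() argument order are assumptions made for this sketch, not the actual Falcon source.

struct RecordVersion;

struct Table {
    // Assumed signature: (staying, leaving); the real argument list may differ.
    void expungeRecordVersions(RecordVersion *staying, RecordVersion *leaving);
};

struct RecordVersion {
    Table         *table;          // owning table
    RecordVersion *priorVersion;   // next older version in the chain

    bool mustStay() const;         // still visible to some open transaction?
    void scavenge();
};

void RecordVersion::scavenge()
{
    // Recurse down the version chain until the oldest staying version is found.
    if (priorVersion && priorVersion->mustStay()) {
        priorVersion->scavenge();
        return;
    }

    // 'this' is now the oldest staying version; everything older is leaving.
    // BUG: only 'this' is passed as the staying record, so key values held
    // by newer staying versions (e.g. 'A' and 'C' in the scenario below)
    // look unreferenced and are deleted from the index.
    if (priorVersion)
        table->expungeRecordVersions(this /* staying */, priorVersion /* leaving */);
}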

DETAIL:

Falcon uses one and only one entry in the index for each key-RecordNumber pair.  If the same key values are used over and over again for successive record updates, there may be multiple RecordVersions on the same record chain with the same key value, but that key value will occur only once in the key page.  When the next scavenge cycle occurs, old record versions are pruned from the chain and the associated key values are garbage collected.  The garbageCollect function must not delete a key value for a record leaving the chain if that key value also belongs to a record that is staying after the scavenge.

Index::garbageCollect() is called with a Record pointer to a leaving record and one to a staying record.  It compares the key value in the leaving record and all its prior records against the key values in the staying record and all its priors, looking for duplicates.  If there are no duplicates, the key can be deleted.
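
As a rough sketch of that comparison (the Record, IndexKey, makeKey() and deleteKey() helpers below are assumed for illustration; this is not the actual Index::garbageCollect() implementation):

struct IndexKey { bool isEqual(const IndexKey *other) const; };

struct Record {
    int     recordNumber;
    Record *getPriorVersion();      // next older version in the chain
};

struct Index {
    void makeKey(Record *record, IndexKey *key);        // build key value from record
    void deleteKey(IndexKey *key, int recordNumber);    // remove key from index pages
    void garbageCollect(Record *leaving, Record *staying);
};

void Index::garbageCollect(Record *leaving, Record *staying)
{
    // For every key value in the leaving chain...
    for (Record *leave = leaving; leave; leave = leave->getPriorVersion()) {
        IndexKey leavingKey;
        makeKey(leave, &leavingKey);
        bool duplicate = false;

        // ...check whether any version in the staying chain carries the same key.
        for (Record *stay = staying; stay; stay = stay->getPriorVersion()) {
            IndexKey stayingKey;
            makeKey(stay, &stayingKey);

            if (stayingKey.isEqual(&leavingKey)) {
                duplicate = true;   // key still referenced by a staying record
                break;
            }
        }

        // Only a key value that no staying version shares may be deleted.
        if (!duplicate)
            deleteKey(&leavingKey, leave->recordNumber);
    }
}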

Consider the following scenario in which record number 1 is changed by several transactions while other repeatable read transactions are held open and keep the later records from being scavenged.

TransID  State      RecordNum  Key  Comment
10       Committed  1          -
9        Active                     repeatable-read, sees 'C'
8 SP1    Committed  1          C
8        Released   1          B    already scavenged by SP1
7        Active                     repeatable-read, sees 'A'
6        Committed  1          A
5        Active                     repeatable-read, sees 'D'
4        Committed  1          D    visible to Trans 5
3        Committed  1          C
2        Committed  1          B
1        Committed  1          A
0        Committed  1          -

When the scavenger calls Table::expungeRecordVersions() for RecordNum 1, it needs to scavenge the records for transactions 0, 1, 2 & 3.  The record from transaction 4 is still visible to transaction 5, so it must stay, along with all the newer versions.

In order for Index::garbageCollect() to do this correctly, it must be sent the record for TransID 3 as the leaving record and the record for TransID 10 as the staying record.  But the current code sends in the record for TransID 4 as the staying record.  That record is staying, but since 3 is disconnected before Index::garbageCollect() is called, 4 has no priors, so Index::garbageCollect() does not have a view of all the staying records.

How to repeat:
Start the server with falcon-record-memory-max=5Mb

# Client 1
DROP TABLE t1;
CREATE TABLE t1 (a CHAR(100)) ENGINE falcon;
INSERT INTO t1 VALUES (repeat('-', 100));
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;

DROP TABLE t2;
CREATE TABLE t2 (a CHAR(100)) ENGINE falcon;

# Now each one of these should cause a forced record scavenge.
INSERT INTO t2 SELECT * FROM t1;

# Client 2
CREATE TABLE t3 (f1 CHAR(10) PRIMARY KEY) ENGINE falcon;
INSERT INTO t3 VALUES ('-');
UPDATE t3 SET f1 = 'A';
UPDATE t3 SET f1 = 'B';
UPDATE t3 SET f1 = 'C';
UPDATE t3 SET f1 = 'D';

# Client 3 - Should see 'D'
BEGIN;
SELECT * FROM t3;
SELECT * FROM t3 WHERE f1 = 'D';

# Client 2
UPDATE t3 SET f1 = 'A';

# Client 4 - Should see 'A'
BEGIN;
SELECT * FROM t3;
SELECT * FROM t3 WHERE f1 = 'A';

# Client 2
BEGIN;
UPDATE t3 SET f1 = 'B';
SAVEPOINT sp1;
UPDATE t3 SET f1 = 'C';
RELEASE SAVEPOINT sp1;
COMMIT;

# Client 5 - Should see 'C'
BEGIN;
SELECT * FROM t3;
SELECT * FROM t3 WHERE f1 = 'C';

# Client 2
UPDATE t3 SET f1 = '-';
SELECT * FROM t3;
SELECT * FROM t3 WHERE f1 = '-';

# Client 1 - Force another record scavenge.
INSERT INTO t2 SELECT * FROM t1;

# Client 3 - Should see 'D'.  This works because the staying record was 'D'.
SELECT * FROM t3;
SELECT * FROM t3 WHERE f1 = 'D';

# Client 4 - Should see 'A' but does not since the 'A' key was scavenged.
SELECT * FROM t3;
SELECT * FROM t3 WHERE f1 = 'A';

# Client 5 - Should see 'C', but does not since the 'C' key was scavenged.
SELECT * FROM t3;
SELECT * FROM t3 WHERE f1 = 'C';

Suggested fix:
This happens because the caller of Table::expungeRecordVersions(), RecordVersion::expunge(), calls itself recursively until it finds the oldest staying record.  It should instead find the leaving record and call Table::expungeRecordVersions() with the leaving record and the base record, so that all staying records can be seen by the index garbage collector.
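
A minimal sketch of that approach, reusing the assumed RecordVersion and Table types from the sketch in the description above (pruneRecordChain is a hypothetical name; this is not the committed patch):

// Reuses the assumed RecordVersion/Table types from the earlier sketch.
void pruneRecordChain(Table *table, RecordVersion *base)
{
    // Walk from the base (newest) record down to the newest leaving version,
    // i.e. the first version no longer visible to any open transaction.
    RecordVersion *leaving = base;

    while (leaving && leaving->mustStay())
        leaving = leaving->priorVersion;

    // Hand the base record in as the staying record, so the index garbage
    // collector can compare the leaving keys against every staying version.
    if (leaving)
        table->expungeRecordVersions(base /* staying */, leaving /* leaving */);
}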
[18 Nov 2008 14:07] Kevin Lewis
Note that if you do this after the scavenge:

# Client 1, 2 or any other new client;
SELECT * FROM t3;
SELECT * FROM t3 WHERE f1 = '-';  # This will fail to find any records

So the index has been corrupted by this bad scavenge.
[2 Dec 2008 18:49] Sveta Smirnova
Thank you for the report.

Verified as described.
[7 Jan 2009 8:12] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/62571

2958 Kevin Lewis	2009-01-07
      Bug#34893, Bug#36700, Bug#40342, Bug#40801
      New scavenger code, See WL#4606, WL#4477 & WL#4478 for more info.
      
      This is the second full patch of the new scavenger.  
      1) Separate the two tasks involved in scavenging into
         A. Pruning old invisible record versions.
            1. As part of this task, a survey is taken of records
               in age groups so that the next step can be done in 
               one pass
         B. Retiring least-recently-used records from the record cache
      2) Watch how much record space is added to the record cache 
         and start a new age group at regular intervals. 
      3) Put the scavenger into a separate thread that can be 
         signaled when needed.
      
      This patch contains many improvements over the last patch including;
      1) forcedRecordScavenge is now signalScavenge and the loop is 
      increased to 5 times attempting to get memory from the record cache.
      2) Unused code is deleted
      3) Integrated with new dependency manager code so that it now uses
      Transaction::commitId to determine if a record is old enough to be
      scavenged.
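
As a conceptual outline only, the two tasks described in this commit might be separated along these lines; the class layout and method names below (other than signalScavenge) are guesses based on the commit comment, not the committed code.

class Scavenger {
public:
    // One scavenge cycle, run in the scavenger's own thread.
    void scavengeCycle()
    {
        // Task A: prune record versions no longer visible to any transaction,
        // surveying the surviving records into age groups in the same pass.
        pruneInvisibleRecordVersions();

        // Task B: if the record cache is still over its threshold, retire
        // least-recently-used age groups until enough memory is reclaimed.
        while (recordCacheSize() > recordScavengeThreshold() && hasAgeGroups())
            retireOldestAgeGroup();
    }

    // Other threads signal the scavenger instead of scavenging inline.
    void signalScavenge();

private:
    void pruneInvisibleRecordVersions();
    void retireOldestAgeGroup();
    bool hasAgeGroups() const;
    long recordCacheSize() const;
    long recordScavengeThreshold() const;
};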
[9 Jan 2009 5:15] Kevin Lewis
This still needs a test case
[21 Jan 2009 5:16] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/63654

2969 Kevin Lewis	2009-01-20
      Bug#40801 - Testcase to show that the new scavenger does not
      garbageCollect key values that are needed for records not pruned.
[23 Jan 2009 6:08] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/63866

2972 Kevin Lewis	2009-01-23
      Bug#40801 - Testcase to show that the new scavenger does not
      garbageCollect key values that are needed for records not pruned.
[13 Feb 2009 7:24] Bugs System
Pushed into 6.0.10-alpha (revid:alik@sun.com-20090211182317-uagkyj01fk30p1f8) (version source revid:vvaintroub@mysql.com-20090123204400-cz3fg2xxkopmmf7g) (merge vers: 6.0.10-alpha) (pib:6)
[15 May 2009 12:56] MC Brown
An entry has been added to the 6.0.10 changelog: 

The Falcon storage engine has been updated to incorporate new code for the built-in scavenger service, which handles the caching of records in memory. This fixes a number of different issues related to the visibility of different records during certain operations and improves the memory usage.