MySQL Bugs: #41870: Unbounded tablespace growth when updating BLOB record

Bug #41870	Unbounded tablespace growth when updating BLOB record
Submitted:	5 Jan 2009 15:39	Modified:	15 May 2009 15:57
Reporter:	John Embretsen	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Falcon storage engine	Severity:	S2 (Serious)
Version:	6.0.10-bzr	OS:	Any
Assigned to:	Kevin Lewis	CPU Architecture:	Any
Tags:	BLOB, F_SCAVENGER

Description:
Falcon seems to keep extending the tablespace used for storing a BLOB when updating the BLOB data (default tablespace data file is falcon_user.fts).

This happens even if there is only one table with one BLOB entry, being updated with the exact same data each time. The tablespace file size increases with approximately the size of the update BLOB data when committed.

It looks like this will lead to disk space problems for users of Falcon BLOBs unless the space used is reclaimed at some clever point in time.

How to repeat:
Start the server including the option --max-allowed-packet=4Mb (or something to that effect) to allow BLOBs larger than 1 MB.

In the following example, the Falcon data files are in the directory $srcdir/var, where $srcdir refers to a compiled MySQL 6.0 source tree including Falcon on a Unix host.
The data file mysql-test/include/UnicodeData.txt (1.1 MB) is used as BLOB data.

## connect to the 'test' database
use test;

## Create a table with a LONGBLOB column
CREATE TABLE t1 (pk INT AUTO_INCREMENT PRIMARY KEY, myblob LONGBLOB) Engine = Falcon;

## Insert a BLOB record
INSERT INTO t1 (myblob) VALUES (LOAD_FILE('../mysql-test/include/UnicodeData.txt'));

## Check the file size of var/falcon_user.fts

## Update the record with some BLOB data of significant size. In this case the new data is the same as the original data.

UPDATE t1 SET myblob = LOAD_FILE('../mysql-test/include/UnicodeData.txt') WHERE pk = 1;

## Check the file size again. You will see that it has increased. 
## Repeat the update several times to verify continued growth.

---- ---- ----

To make it easier to repeat the update several times, feel free to use the following procedure:

DELIMITER //

CREATE PROCEDURE repeatBlobUpdate(reps INT)
BEGIN
  SET @n = 1;
  REPEAT
    UPDATE t1 SET myblob = LOAD_FILE('../mysql-test/include/UnicodeData.txt') WHERE pk = 1;
    SET @n = @n + 1;
  UNTIL @n > reps
  END REPEAT;
END

DELIMITER ;

## Repeat the update e.g. 20 times
CALL repeatBlobUpdate(20);

Suggested fix:
Reuse existing space when appropriate.

Thank you for the bug report. Verified as described.

I think it is likely that this bug is due to the current limitation of the scavenger.  Old records are not scavenged until the record cache memory threshold is reached which by default is 75% of 250Mb.  Each updated blob record is on blob pages attached to a record version that stays in memory until scavenged.  At that time, associated index entries and blob records/pages are released.

The new scavenger releases this space much more often as part of the pruning process, which now activates whenever the record cache is increased by 5% or 30 seconds has elapsed.

So, once the blob records/pages are scavenged and released, will I notice that the tablespace file size decreases or at lest does not increase if I run a few more updates?

I ran the procedure described in "how to repeat" a few hundred times until the size of falcon_user.fts reached about 700 MB, did not notice any difference in behavior along the way.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/62571

2958 Kevin Lewis	2009-01-07
Bug#34893, Bug#36700, Bug#40342, Bug#40801, Bug#41870
New scavenger code, See WL#4606, WL#4477 & WL#4478 for more info.
      
This is the second full patch of the new scavenger.  
1) Separate the two tasks involved in scavenging into
   A. Pruning old invisible record versions.
      1. As part of this task, a survey is taken of records
         in age groups so that the next step can be done in 
         one pass
   B. Retiring least-recently-used records from the record cache
2) Watch how much record space is added to the record cache 
   and start a new age group at regular intervals. 
3) Put the scavenger into a separate thread that can be 
   signaled when needed.

This patch contains many improvements over the last patch
including;
1) forcedRecordScavenge is now signalScavenge and the loop is 
   increased to 5 times attempting to get memory from the 
   record cache.
2) Unused code is deleted
3) Integrated with new dependency manager code so that it now
   uses Transaction::commitId to determine if a record is old
   enough to be scavenged.

The new scavenger changes should go a long way toward making this bug not a problem.  It starts a scavenge cycle which will prune these old blob records whenever 2% of the record cache has been added in new record versions.  The problem with this test is that 1Mb of disk space in blob pages is added every time an update happens.  But only about 150 bytes of record cache is used up.  2% of the default cache is 5Mb. So that is about 33,000 updates before a scavenge will occur.  On a normal system, there will be a lot of other types of record versions being added to the record cache during the blob updates.  But this test only updates blobs.  

There is a workaround now that did not exist before these changes.  That is to lower the frequency of the scheduled scavenges.  That did not help previously because the scavenger would not prune these old invisible records until the scavenge threshold was reached, which was 75% of the record cache.  Now, it prunes every scavenge cycle.  

So a note for documentation:  If a system is doing a lot of large blob updates, they will want Falcon to scavenge more often in order to preserve the size of the tablespace from growing too disproportionately large;
Every 10 seconds;  
     falcon-scavenge-schedule=1,11,21,31,41,51 * * * * *
or every 5 seconds;  
     falcon-scavenge-schedule=1,6,11,16,21,26,31,36,41,46,51.56 * * * * *

John, please evaluate whether I need to add some kind of extra 'on-demand' pruning when blob records are updated.

Looks like the new scavenger (now in falcon-team branch) works much better with this particular scenario than the old one. Falcon tablespace file size seems to stabilize around 12 MB in this BLOB-only update scenario, running a couple of hundred updates in a row.

Kevin, do you have any tips for an effective way to "fill up" the record cache, causing eventual scavenging/pruning/retiring (for testing purposes)? I tried updating VARCHAR and CHAR records in a new table with new values, but the record cache usage seems to be stable, according to:

SELECT * FROM INFORMATION_SCHEMA.FALCON_RECORD_CACHE_SUMMARY;

E.g.:

+-------------+------------+---------------+-----------+-------------+--------------+
| TOTAL_SPACE | FREE_SPACE | FREE_SEGMENTS | BIG_HUNKS | SMALL_HUNKS | UNIQUE_SIZES |
+-------------+------------+---------------+-----------+-------------+--------------+
|     2097160 |    2039192 |             7 |         1 |           0 |            3 |
+-------------+------------+---------------+-----------+-------------+--------------+
1 row in set (0.00 sec)

FREE_SPACE changes when I update the BLOB, though, more or less matching what was mentioned in previous comments to this issue.

Wow, stabilize at 12 Mb seems like this bug is fixed.

I believe the Falcon manual needs to be updated regarding the default value of falcon_record_memory_max. The default is 250 MB, but the manual says it is 20. I logged Bug#42073 to track this doc issue.

Regression test falcon_blob_space-big implemented and available for review.
See http://lists.mysql.com/falcon/386 for details.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/65815

3012 John H. Embretsen	2009-02-10
      New test related to Bug#41870 and Bug#42202, verifying that scheduled scavenging of old record versions in Falcon works with BLOBs, i.e. that the tablespace does not grow indefinitely when updating a BLOB multiple times.
      
      This test is perhaps more of a functional test of timing (schedule) based scavenging 
      related to BLOB updates than a real regression test for bug 41870 (Unbounded 
      tablespace growth when updating BLOB record). This is because in order to be as 
      timing-independent as possible, the test requires a fixed non-default value of 
      the option falcon_scavenge_schedule (currently set to 'every 5 seconds').
      
      On a standard disk based system the test may take a long time (~2 min on a relatively
      fast server with backround load) to run, and requires 30-50 MB of space, so this is
      a --big-test. In memory (--mem) the test should normally run in about 6-7 seconds.

This test succeeds now with scavenging eevery 5 seconds, which works for this bug.  Once the follow-on bug is fixed, Bug#42202, the test can be updated to use default settings.

Test falcon_blob_space-big pushed to mysql-6.0-falcon-team branch 2009-02-11 as revision john.embretsen@sun.com-20090210220635-tz9zpjinjn3hx8f3.

Pushed into 6.0.11-alpha (revid:alik@sun.com-20090302140208-lfdejjbcyezlhhjt) (version source revid:vvaintroub@mysql.com-20090211225555-gcuo9fv97xlhydd8) (merge vers: 6.0.10-alpha) (pib:6)

A note has been added to the 6.0.11 changelog: 

The Falcon storage engine has been updated to incorporate new code for the built-in scavenger service, which handles the caching of records in memory. This fixes a number of different issues related to the visibility of different records during certain operations and improves the memory usage. The same fix also corrects the behavior where the space allocated for BLOB records would not be recovered correctly, causing the size of the Falcon table space files to increase with each BLOB INSERT or UPDATE operation