Bug #26643 Falcon: no blob space recovery
Submitted: 26 Feb 2007 18:59  Modified: 23 Oct 2007 20:24
Reporter: Peter Gulutzan  Email Updates:
Status: Not a Bug  Impact on me: None
Category: MySQL Server: Falcon storage engine  Severity: S5 (Performance)
Version: 5.2.4-falcon-alpha-debug  OS: Linux (SUSE 10.0 / 64-bit)
Assigned to: Ann Harrison  CPU Architecture: Any

[26 Feb 2007 18:59] Peter Gulutzan
Description:
I create a Falcon table with a BLOB or TEXT column.
I update the column many times.
Although there's no net increase in column length,
the size of the underlying file grows.

How to repeat:
delimiter //
create database k//
use k//
create table t (s1 mediumtext) engine=falcon//

/* Look at /usr/local/mysql/var/k.fts. Length = 434176. */

insert into t values (repeat('a',1000))//
create procedure p ()
  begin
    declare v int default 0;
    while v < 10000 do
    if v mod 1000 = 0 then select v; end if;
    update t set s1 = repeat('a',rand()*1000);
    set v = v + 1;
    end while;
  end//
call p()//
/* Look at /usr/local/mysql/var/k.fts. Length = 9428992. */
[26 Feb 2007 23:55] MySQL Verification Team
Thank you for the bug report. Verified as described.
[7 Sep 2007 18:18] Jim Starkey
Sample doesn't compile:

mysql> delimiter //
mysql> create procedure p ()
    ->   begin
    ->     declare v int default 0;
    ->     while v < 10000 do
    ->     if v mod 1000 = 0 then select v; end if;
    ->     update t set s1 = repeat('a',rand()*1000);
    ->     set v = v + 1;
    ->     end while;
    ->   end//
ERROR 1307 (HY000): Failed to CREATE PROCEDURE p
mysql>
[7 Sep 2007 18:32] Peter Gulutzan
I've seen "ERROR 1307 (HY000): Failed to CREATE" before.
It's a bug (there's nothing wrong with the procedure),
and I suspect that it only occurs with Falcon,
but I didn't report it because I couldn't repeat it.

If you can repeat it, that's great!
[7 Sep 2007 18:55] Kevin Lewis
Jim, with the Windows mysql.exe client, the delimiter statement has to be followed by the old delimiter.  That may be why it didn't work.  This worked for me, but I am not sure I saw any increase in blob space.

drop database k;
create database k;
use k;
create table t (s1 mediumtext) engine=falcon;
insert into t values (repeat('a',1000));
delimiter // ;
create procedure p ()
  begin
    declare v int default 0;
    while v < 100000 do
    if v mod 1000 = 0 then select v; end if;
    update t set s1 = repeat('a',rand()*1000);
    set v = v + 1;
    end while;
  end//
delimiter ; //
call p();
[19 Oct 2007 18:12] Ann Harrison
The procedure modifies a record with a blob field 100,000 times,
and Falcon does, in fact, write 100,000 versions of the blob to
disk.  When the transaction commits, 99,999 of those versions
are removed, and the space they use is released for reuse.
The garbage collection does not occur during the execution of
the transaction, but only at the end.
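
Ann's description suggests a direct check: run the same updates inside one
explicit transaction and watch the tablespace file.  A minimal sketch,
reusing table t from the test case (file sizes have to be inspected
externally, e.g. with du on the .fts file; the statements below are
illustrative, not part of the original report):

```sql
-- Reuses table t from the original test case.
-- Per Ann's comment, garbage collection runs at the end of the
-- transaction, not during it, so the file should grow while the
-- transaction is open and stop growing after COMMIT.
SET autocommit = 0;
START TRANSACTION;
UPDATE t SET s1 = repeat('b', 1000);
UPDATE t SET s1 = repeat('c', 1000);
-- ...many more updates; check the .fts size here...
COMMIT;
-- After the commit, superseded versions become eligible for reuse.
```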
[22 Oct 2007 18:53] Hakan Küçükyılmaz
Verified as described. The UPDATE happens on a single row with autocommit enabled. There should be no increase in the table space files.

Before:
lu0011:/data/mysql# du -h *.fts
412K    falcon_master.fts
16K     falcon_temporary.fts
16K     falcon_user.fts

After:
lu0011:/data/mysql# du -h *.fts
448K    falcon_master.fts
16K     falcon_temporary.fts
5.0M    falcon_user.fts

Procedure used:

SET @@autocommit = 1;
delimiter //
create database k//
use k//
create table t (s1 mediumtext) engine=falcon//

insert into t values (repeat('a',1000))//
create procedure p ()
  begin
    declare v int default 0;
    while v < 10000 do
    if v mod 1000 = 0 then select v; end if;
    update t set s1 = repeat('a',rand()*1000);
    set v = v + 1;
    end while;
  end//
call p()//

After calling p() several times, falcon_user.fts grows to 15M.
[22 Oct 2007 19:28] Ann Harrison
I did run the test and after the procedure completes, the
next reference to that row causes the engine to clean out
the old versions.  Let me check that the commit is actually
being seen in the autocommit transaction during a procedure.
[23 Oct 2007 19:13] Kevin Lewis
Hakan, 
Each verb within a transaction is like a little SetSavepoint/ReleaseSavepoint.  Usually, records that were changed within a transaction, and then changed again, are released at the end of the verb by Record::scavenge().  This in turn calls Table::garbageCollect(), which does garbage collection on indexes, inversions, and blobs.  If this did not work, the test case, with 10,000 updates of 1000-byte blobs, would have made falcon_user.fts much greater than 10 MB.  But it was only 5 MB after the test.

It is not necessary for Falcon to keep tablespace growth to zero during a transaction of 10,000 updates to the same record.  Falcon is not MyISAM and does not need to react the same way.

If it can be shown that garbage collection of BLOBs is NOT happening during a test like this, please re-open the bug.  Otherwise, let's close it.
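
Kevin's verb-level claim can also be probed from SQL: if superseded versions
are released at the end of each statement, then many updates to the same row
inside a single transaction should not grow the tablespace linearly with the
number of updates.  A hedged sketch, assuming table t from the test case
(Record::scavenge() and Table::garbageCollect() are the names from Kevin's
comment; the file size must be checked externally):

```sql
-- All updates run in ONE transaction.  Per Kevin, each statement
-- ("verb") acts like SetSavepoint/ReleaseSavepoint, and versions
-- superseded by a later verb are released by Record::scavenge() at
-- verb end, which calls Table::garbageCollect() for indexes,
-- inversions, and blobs.
SET autocommit = 0;
START TRANSACTION;
UPDATE t SET s1 = repeat('x', 1000);  -- verb 1
UPDATE t SET s1 = repeat('y', 1000);  -- verb 2: verb 1's version now scavengeable
UPDATE t SET s1 = repeat('z', 1000);  -- verb 3: verb 2's version now scavengeable
COMMIT;  -- the pre-transaction version is released after commit
```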
[23 Oct 2007 20:24] Ann Harrison
As Kevin said, Falcon is not MyISAM or InnoDB.  What you're seeing here is
the effect of group commits.  All the blobs in a group of transactions 
being committed at the same time must be written before the group can be
committed.  Once the group is committed, then the old versions can be
scavenged.  That's why there is growth, but the growth is less than would
have occurred if blob space were not being reclaimed at all.

When Jim pushes his changes causing small blobs to be logged instead of 
being written directly from the page cache to the database, the growth 
shown by this test will be less.  Larger blobs (> 1M?) will still cause
some growth when modified in a hard loop like this.  The wasted space
is recovered, but not immediately.
[24 Oct 2007 22:43] Ann Harrison
I ran the test twelve times in one sequence and six times in another.
falcon_user.fts started at 40,960.  After one run of the procedure it
grew to 1,097,728.  The next run of the procedure caused it to grow 
to 5,255,168.  On the next run it increased to 5,271,552.  It stayed that
size for the next six runs, then increased slightly to 5,308,416 where
it stayed for three more runs.  On the second sequence, using Jim's 
tree rather than the team tree, the tablespace started at 16,384 then
grew to 1,101,824, then 5,230,592 where it stayed steady for three 
runs.  The next run increased the size of the tablespace slightly to
5,246,976 where it stayed for two more runs.

The slight increases are consistent with the fact that blobs are
stored at their actual length and the length is generated randomly.

What's happening is that the blobs are being written before the
transaction commit is written.  Each update is its own transaction,
and the commits are batched, so nothing can be released until the
transactions are actually committed and written to the database.

The performance is lousy because releasing a long chain of blobs is
an n log n operation.