Bug #36708 extreme fragmentation of a table with blob records
Submitted: 14 May 2008 6:45    Modified: 20 May 2008 17:55
Reporter: Shane Bester (Platinum Quality Contributor)
Status: Not a Bug
Category: MySQL Server: MyISAM storage engine    Severity: S3 (Non-critical)
Version: 5.0.60, 5.1.24    OS: Any
Assigned to: Ingo Strüwing    CPU Architecture: Any

[14 May 2008 6:45] Shane Bester
Description:
The attached PHP program will eventually cause t1.MYD to consume all available disk space. The testcase simply does this:

1) insert two records
2) update a record to 8M
3) update same record to 16M, then 24M, then 32M
4) goto 2)
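
For reference, a minimal Python re-creation of that loop might look like the sketch below. This is hypothetical: the real testcase is the attached bug36708.php, and the table definition, connection settings, and use of REPEAT() here are assumptions, not the attachment's contents.

  import sys
  import mysql.connector  # pip install mysql-connector-python

  row_id = int(sys.argv[1])  # 1 or 2; run one copy of the script per record
  conn = mysql.connector.connect(user="root", database="test")
  cur = conn.cursor()

  cur.execute("CREATE TABLE IF NOT EXISTS t1"
              " (id INT PRIMARY KEY, b LONGBLOB) ENGINE=MyISAM")
  cur.execute("INSERT IGNORE INTO t1 VALUES (%s, '')", (row_id,))  # step 1

  MB = 1024 * 1024
  for iteration in range(1000):                         # step 4: goto 2)
      for size in (8 * MB, 16 * MB, 24 * MB, 32 * MB):  # steps 2) and 3)
          # REPEAT() builds the blob server-side; max_allowed_packet must
          # still be large enough for the 32MB column value.
          cur.execute("UPDATE t1 SET b = REPEAT('a', %s) WHERE id = %s",
                      (size, row_id))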

The fragmentation becomes so bad that the file size grows without bound.
OPTIMIZE TABLE fixes the problem temporarily, but it soon recurs.

How to repeat:
Run the attached PHP script in two threads:

1) php bug.php 1

2) php bug.php 2

Now monitor the size of the MYD file, or SHOW TABLE STATUS. Wait until the number of iterations becomes large, then check the sizes.
[14 May 2008 6:49] MySQL Verification Team
Testcase. Run it in two threads for a few hundred iterations.

Attachment: bug36708.php (application/octet-stream, text), 1.84 KiB.

[14 May 2008 7:28] MySQL Verification Team
After 200 iterations of the testcase in two threads, the data files are bloated with free (deleted) space:

           8,602 t1.frm
     750,936,368 t1.MYD
           2,048 t1.MYI

mysql> show table status like 't1'\G
           Name: t1
         Engine: MyISAM
           Rows: 2
 Avg_row_length: 37748808
    Data_length: 767715128
      Data_free: 692217512
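
(That is, Data_free / Data_length = 692,217,512 / 767,715,128, roughly 90%: nine tenths of the ~732 MiB data file is reclaimable free space, for a table holding just two rows.)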
[14 May 2008 9:50] MySQL Verification Team
Disabling concurrent inserts has no visible effect on the testcase.
[14 May 2008 12:59] MySQL Verification Team
I can repeat this on 5.1.24 as well; it just takes many more iterations of the testcase to see the same effect.
[20 May 2008 17:55] Ingo Strüwing
This is normal MyISAM behavior.

Records longer than 16MB-8 bytes (reclen, the maximum data length of one record part) are split into more than one part.
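
As a quick worked example of that split rule (toy arithmetic only; real parts also carry per-block headers on disk), one of the testcase's 32MB values needs three parts:

  RECLEN = 16 * 1024 * 1024 - 8   # max record data per part (reclen)
  record = 32 * 1024 * 1024       # one 32MB value from the testcase
  print(-(-record // RECLEN))     # ceiling division -> 3 parts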

Updates try to reuse every part of the old record. If one of the old
parts is at the end of the file, it is extended up to its maximum of
16MB-4 (framelen), unless this would exceed the maximum table size.
This can make (some of) the remaining record parts obsolete; these are
then deleted (put on the delete link, the "dellink").

The delete link behaves like a stack. The last entered block pops out
first.

If an update enlarges the record, and none of its parts is at end of
file, or the extension was insufficient, blocks are taken from the
delete link.

When freshly allocated in a gap-free file, the record parts are at
increasing positions in the file.

An enlarging record in a gap-free file allocates new blocks in
increasing file positions.

A record that shrinks a lot deletes its unused parts in the order they
appeared in the record. After a gap-free start this is from lower file
positions to higher file positions. Due to the stack-like behavior of
the delete link, the highest-positioned block ends up first in the
delete link.
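
A toy sketch of those two rules together (the positions and sizes here are made up for illustration):

  dellink = []                                # the delete link as a LIFO stack
  for block in [(0, 8), (8, 16), (24, 16)]:   # freed (pos, size), record order
      dellink.append(block)                   # lowest position pushed first
  print(dellink.pop())                        # -> (24, 16): highest position out first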

The usage pattern of the test case extends and shrinks records a lot.
When one thread has shrunk its record and left blocks in the delete
link, and immediately afterwards the other thread enlarges its record,
it reuses one or more of the deleted blocks (unless one of its own
parts is at EOF and can be extended enough). The first reused block is
most probably the one nearest EOF, because the shrunken record's parts
were deleted from lowest to highest position, leaving the
highest-positioned block on top of the stack. If the other thread then
enlarges its record again, and that reused block is indeed (still) at
EOF, it will most probably be extended. So the file grows.
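
To make the interaction concrete, here is a deliberately simplified toy model, not MyISAM's real allocator: block headers, partial shrinking of a part, and the recombination optimization described further down are all omitted, and PART_CAP = 10 toy units stands in for reclen/framelen. It implements just three of the rules above: parts are reused in record order and a part at EOF may be extended, freed parts are pushed on a LIFO delete link, and new space is taken from that link before the file is extended. Even this stripped-down model shows the toy file end creeping forward over the rounds while freed blocks are constantly being reused:

  PART_CAP = 10   # toy stand-in for the real 16MB-8 / 16MB-4 part limits

  class DataFile:
      def __init__(self):
          self.eof = 0       # current end of the data file
          self.dellink = []  # LIFO stack of freed (pos, size) blocks

      def alloc(self, need):
          """Cover `need` units: pop the delete link first, grow the file last."""
          parts = []
          while need > 0 and self.dellink:
              pos, size = self.dellink.pop()      # last freed block, first reused
              take = min(size, need)
              parts.append([pos, take])
              if take < size:                     # push the remainder back
                  self.dellink.append((pos + take, size - take))
              need -= take
          while need > 0:                         # nothing reusable: extend the file
              size = min(need, PART_CAP)
              parts.append([self.eof, size])
              self.eof += size
              need -= size
          return parts

  class Record:
      def __init__(self, f):
          self.f, self.parts = f, []  # parts: [pos, size] blocks in record order

      def resize(self, target):
          size = sum(s for _, s in self.parts)
          if target > size:
              # if one of the old parts is at EOF, extend it in place
              # (this is the step that moves the end of the file forward)
              for part in self.parts:
                  if part[0] + part[1] == self.f.eof:
                      ext = min(target - size, PART_CAP - part[1])
                      part[1] += ext
                      self.f.eof += ext
                      size += ext
                      break
              if target > size:
                  self.parts += self.f.alloc(target - size)
          else:
              # shrink: free surplus parts in record order, so the
              # highest-positioned block ends up on top of the stack
              kept, acc = [], 0
              for pos, s in self.parts:
                  if acc < target:
                      kept.append([pos, s])
                      acc += s
                  else:
                      self.f.dellink.append((pos, s))
              self.parts = kept

  f = DataFile()
  a, b = Record(f), Record(f)           # one record per "thread"
  for rnd in range(6):
      for rec in (a, b):                # the two threads take turns
          for size in (8, 16, 24, 32):  # the testcase loop: 8 shrinks, then grow
              rec.resize(size)
      free = sum(s for _, s in f.dellink)
      print(f"round {rnd}: eof={f.eof} free={free}")

Two caveats on the sketch: the "threads" alternate whole grow/shrink cycles, whereas the real testcase interleaves individual UPDATE statements; and because recombination is omitted, this toy would also grow with a single record, which real MyISAM avoids, as explained below.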

This does not have anything to do with concurrency. The reason why the
test case doesn't show the problem with a single thread is that each
thread updates only one record. The existing record parts are always
reused in their old order. Only the last part can be at EOF. When the
record shrinks and deletes most of its parts, and no other record takes
them, it can reuse them itself.

Due to another optimization, the enlarging record does not reuse its
previous blocks in the reverse order that the stack characteristic of
the delete link would suggest. When reusing each of its parts, it
checks whether the block behind it is a deleted block and, if so,
recombines it with the existing block. That way it reuses its previous
blocks in the old order.
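
A toy sketch of that recombination check (made-up positions again; the real code inspects block headers in the MYD file):

  part = [0, 8]                        # [position, size] of a reused part
  deleted = [(8, 16), (40, 16)]        # deleted blocks elsewhere in the file
  for pos, size in list(deleted):
      if part[0] + part[1] == pos:     # deleted block starts right behind the part
          part[1] += size              # recombine: the part grows back in place
          deleted.remove((pos, size))
  print(part)                          # -> [0, 24]: old space back, in old order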

This beautiful optimal behavior breaks if another record (not
necessarily from another thread!) takes one or more of the old blocks of
this record. When enlarging again, it can no longer recombine every
block and finally needs to allocate a new block (or take one from the
delete chain). After that, the record parts may no longer be ordered in
increasing file position. Hence the chance increases that a part of an
updating record appears at EOF and will be extended.

I reproduced the growing MYD file with one thread. I changed the PHP
script so that each update works randomly on one of the two records.

For details about the data file layout and definition of the terms
"part", "block", and "frame", see
http://forge.mysql.com/wiki/MySQL_Internals_MyISAM#MyISAM_Dynamic_Data_File_Layout

There are a couple of ways to change the behavior of MyISAM. Since every
change may have a downside (e.g. performance degradation), I'll leave
it to the architects to decide upon further actions.
[4 Jun 2008 10:19] Ingo Strüwing
Commented log file of a test run, showing how record parts are treated by each SQL statement.

Attachment: bug36708-1.log (application/octet-stream, text), 323.77 KiB.

[4 Jun 2008 10:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/47409

ChangeSet@1.2568, 2008-06-04 12:26:58+02:00, istruewing@stella.local +1 -0
  Bug#36708 - extreme fragmentation of a table with blob records
  
  Not to be pushed.
  
  Temporarily added DBUG statements to show how record parts are
  treated by each SQL statement.
[4 Jun 2008 10:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/47410

ChangeSet@1.2568, 2008-06-04 12:31:27+02:00, istruewing@stella.local +1 -0
  Bug#36708 - extreme fragmentation of a table with blob records
  
  Not to be pushed.
  
  Temporarily added DBUG statements to show how record parts are
  treated by each SQL statement.
[4 Jun 2008 13:58] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/47438

ChangeSet@1.2568, 2008-06-04 15:57:58+02:00, istruewing@stella.local +1 -0
  Bug#36708 - extreme fragmentation of a table with blob records
    
  Not to be pushed.
    
  Temporarily added DBUG statements to show how record parts are
  treated by each SQL statement.