Bug #36708 | extreme fragmentation of a table with blob records | ||
---|---|---|---|
Submitted: | 14 May 2008 6:45 | Modified: | 20 May 2008 17:55 |
Reporter: | Shane Bester (Platinum Quality Contributor) | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: MyISAM storage engine | Severity: | S3 (Non-critical) |
Version: | 5.0.60,5.1.24 | OS: | Any |
Assigned to: | Ingo Strüwing | CPU Architecture: | Any |
[14 May 2008 6:45]
Shane Bester
[14 May 2008 6:49]
MySQL Verification Team
testcase. run in two threads for a few hundred iterations
Attachment: bug36708.php (application/octet-stream, text), 1.84 KiB.
[14 May 2008 7:28]
MySQL Verification Team
after 200 iterations of 2 threads for the testcase, the data files are bloated due to the free data: 8,602 t1.frm 750,936,368 t1.MYD 2,048 t1.MYI mysql> show table status like 't1'\G Name: t1 Engine: MyISAM Rows: 2 Avg_row_length: 37748808 Data_length: 767715128 Data_free: 692217512
[14 May 2008 9:50]
MySQL Verification Team
disabling concurrent inserts has no visible affect on the testcase.
[14 May 2008 12:59]
MySQL Verification Team
I can repeat this on 5.1.24 also. it just takes many more iterations of the testcase to see same effect.
[20 May 2008 17:55]
Ingo Strüwing
This is normal MyISAM behavior. Records longer than 16MB-8 (reclen) are split in more than one part. Updates try to reuse every part of the old record. If one of the old parts is at end of file, it is extended up to its maximum of 16MB-4 (framelen), unless this would exceed the maximum table size. This can make the use of (some of) the rest of the record parts obsolete. These are then deleted (put in the dellink). The delete link behaves like a stack. The last entered block pops out first. If an update enlarges the record, and none of its parts is at end of file, or the extension was insufficient, blocks are taken from the delete link. When freshly allocated in a gap-free file, the record parts are at increasing positions in the file. An enlarging record in a gap-free file allocates new blocks in increasing file positions. A record, that shrinks a lot, deletes its unused parts in the order they appeared in the record. After a gap-free start this is from lower file positions to higher file positions. Due to the stack like behavior of the delete link the highest positioned record becomes first in the delete link. The usage pattern of the test case extends and shrinks records a lot. When one thread shrunk its record and left blocks in the delete link, and immediately after that the other thread enlarges its record, it reuses one or more of the deleted blocks (unless one of its parts is at EOF and can be extended enough). The first reused block is most probably nearest EOF because the first parts of the shrunken record are deleted first. If it happens that the other thread enlarges its record again, and that reused block is indeed (still) at EOF, it will most probably be extended. So the file grows. This does not have anything to do with concurrency. The reason why the test case doesn't show the problem with one thread only is that each thread updates one record only. The existing record parts are always reused in their old order. Only the last part can be at EOF. When the record shrinks and deletes most of its parts, and no other record takes them, it can reuse them itself. Due to another optimization the enlarging record does not reuse its previous blocks in reverse order (due to the stack characteristic of the delete link). When reusing each of its parts, it checks if the block behind it is a deleted block and recombines it with the existing block. that way it reuses the previous block in the old order. This beautiful optimal behavior breaks if another record (not necessarily from another thread!) takes one or more of the old blocks of this record. When enlarging again, it can no longer recombine every block and finally needs to allocate a new block (or take one from the delete chain). After that, the record parts may no longer be ordered in increasing file position. Hence the chance increases that a part of an updating record appears at EOF and will be extended. I reproduced the growing MYD file with one thread. I changed the perl script so that each update works randomly on one of the two records. For details about the data file layout and definition of the terms "part", "block", and "frame", see http://forge.mysql.com/wiki/MySQL_Internals_MyISAM#MyISAM_Dynamic_Data_File_Layout There are a couple of ways to change the behavior of MyISAM. Since every change may have a downside (e.g. performance degression) I'll leave it to the architects to decide upon further actions.
[4 Jun 2008 10:19]
Ingo Strüwing
Commented log file of a test run, showing how record parts are treated by each SQL statement.
Attachment: bug36708-1.log (application/octet-stream, text), 323.77 KiB.
[4 Jun 2008 10:27]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/47409 ChangeSet@1.2568, 2008-06-04 12:26:58+02:00, istruewing@stella.local +1 -0 Bug#36708 - extreme fragmentation of a table with blob records Not to be pushed. Temporarily added DBUG statements to show how record parts are treated by each SQL statement.
[4 Jun 2008 10:31]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/47410 ChangeSet@1.2568, 2008-06-04 12:31:27+02:00, istruewing@stella.local +1 -0 Bug#36708 - extreme fragmentation of a table with blob records Not to be pushed. Temporarily added DBUG statements to show how record parts are treated by each SQL statement.
[4 Jun 2008 13:58]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/47438 ChangeSet@1.2568, 2008-06-04 15:57:58+02:00, istruewing@stella.local +1 -0 Bug#36708 - extreme fragmentation of a table with blob records Not to be pushed. Temporarily added DBUG statements to show how record parts are treated by each SQL statement.