Bug #59763 when does innodb compact blocks on delete
Submitted: 27 Jan 2011 0:16 Modified: 4 May 2011 19:12
Reporter: Mark Callaghan Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: InnoDB Plugin storage engine Severity:S3 (Non-critical)
Version:5.1,5.5 OS:Any
Assigned to: CPU Architecture:Any
Tags: compact, delete, innodb

[27 Jan 2011 0:16] Mark Callaghan
Description:
I don't think that InnoDB merges blocks with small fill factors during delete contrary to what the docs state:
http://dev.mysql.com/doc/refman/5.5/en/innodb-physical-structure.html
http://dev.mysql.com/doc/refman/5.5/en/innodb-file-defragmenting.html

Domas noticed that the PK index for table that gets inserts and delete on a frequent basis uses 7x as much space compared to a reloaded version. Can you explain why block merges are not done by purge to avoid using so much space? Block merges can be done when btr_cur_compress_if_useful is called. That is called by btr_cur_pessimistic_delete but not by btr_cur_optimistic_delete.
In row_purge_remove_clust_if_poss_low, btr_cur_optimistic_delete is called for mode==BTR_MODIFY_TREE but not for mode==BTR_MODIFY_LEAF. Does this mean leaf blocks are not merged despite being mostly empty?

         if (mode == BTR_MODIFY_LEAF) {
                 success = btr_cur_optimistic_delete(btr_cur, &mtr);
         } else {
                 ut_ad(mode == BTR_MODIFY_TREE);
                 btr_cur_f (mode == BTR_MODIFY_LEAF) {
                 success = btr_cur_optimistic_delete(btr_cur, &mtr);
         } else {
                 ut_ad(mode == BTR_MODIFY_TREE);
                 btr_cur_pessimistic_delete(&err, FALSE, btr_cur,
                                            RB_NONE, &mtr);
                 if (err == DB_SUCCESS) {
                         success = TRUE;
                 } else if (err == DB_OUT_OF_FILE_SPACE) {
                         success = FALSE;
                 } else {
                         ut_error;
                 }
         }

Behavior on update might be the same. In at least one case,
btr_cur_pessimistic_update can merge blocks but that is only called for
BTR_MODIFY_TREE and not for BTR_MODIFY_LEAF.
Can you explain when and where block merges can be done? For update? For
delete? For PK versus secondary indexes?

one more note from Domas:

Anyway, I looked a bit more - it is quite interesting, -
UPDATE code path gets shortcuted by update-in-place and the DB_UNDERFLOW error is not raised by optimistic update:
if (!row_upd_changes_field_size_or_external(index, offsets, update)) {
/* The simplest and the most common case: the update does not
change the size of any field and none of the updated fields is
externally stored in rec or update, and there is enough space
on the compressed page to log the update. */
mem_heap_free(heap);
return(btr_cur_update_in_place(flags, cursor, update,
       cmpl_info, thr, mtr));
}
and DELETE's optimistic-update doesn't even have DB_UNDERFLOW-raising branch.
So most of changes that affect PFF don't get to btr_compress() :(

How to repeat:
the testcase is simple:

1) create a table
2) write million rows to it, 'show table status'
3) delete rows where id%10>0 (that will leave 10% of rows), 'show table status'
4) update all remaining rows to test the update path, 'show table status'

the occupied space shouldn't change at all, row count should decrease 10x, average row size should increase 10x. Expected - pages should be merged as they're under 50% fill factor.

Suggested fix:
?
[8 Feb 2011 22:49] Mark Callaghan
Maybe Domas can supply a better reproduction case. I cannot reproduce the problem. I used the test listed below and measured both the size of the ibd file after inserting 1M rows, after deleting the largest ~90% of them, and then after inserting ~500,000 rows. The ibd file did not grow after the delete so the inserts that followed appeared to use space made available after the deletes. The i.ibd file was ~32M after the initial insert and ~32M at test end.

I also ran enhanced innochecksum on the ibd file after the initial insert, delete and then re-insert. After the initial insert, innochecksum reported block allocation in i.ibd as:
1555	FIL_PAGE_INDEX
490	FIL_PAGE_TYPE_ALLOCATED

After the delete:
1555	FIL_PAGE_INDEX
490	FIL_PAGE_TYPE_ALLOCATED

And after the re-insert:
1555	FIL_PAGE_INDEX
490	FIL_PAGE_TYPE_ALLOCATED

I created f-master.opt with:
--innodb-file-per-table

and used f.test with:

-- source include/have_innodb_plugin.inc

create table i(ic int primary key auto_increment) engine=innodb;
create table i2(ic int primary key auto_increment) engine=innodb;

insert into i2 values (1), (2), (3), (4);

let $x=12;
while ($x)
{
 eval insert into i2 select null from i2;
 dec $x;
}

let $x=64;
while ($x)
{
  eval insert into i select null from i2;
  dec $x;
}

select count(*) from i;
show table status;

echo Sleep before;
sleep 10;
delete from i where ic > 900000;
delete from i where ic > 800000;
delete from i where ic > 700000;
delete from i where ic > 600000;
delete from i where ic > 500000;
delete from i where ic > 400000;
delete from i where ic > 300000;
delete from i where ic > 200000;
delete from i where ic > 100000;

select count(*) from i;
echo Sleep after;
sleep 60;
show table status;

echo Insert more;

let $x=32;
while ($x)
{
  eval insert into i select null from i2;
  dec $x;
}

select count(*) from i;
echo Sleep after insert;
sleep 60;
show table status;
[8 Feb 2011 23:33] Mark Callaghan
I repeated the test without using file-per-table and it appears that the pages that should be free after the big delete remain allocated for table i. But they are eventually moved to other uses when needed. I changed to test to avoid allocating so much undo. ibdata1 did not grow after the initial insert so I think InnoDB did the right thing here.

After the initial insert of 1M rows:
1713	FIL_PAGE_INDEX
947	FIL_PAGE_TYPE_ALLOCATED

and pages by ID:
1676	15
26	16
1	17

After deleting 90% of the rows:
1446	FIL_PAGE_INDEX
363	FIL_PAGE_UNDO_LOG
869	FIL_PAGE_TYPE_ALLOCATED

and pages by ID show that most are still counted for the table with the inserts:
1409	15
26	16
1	17

After inserting ~1M into another table:
1815	FIL_PAGE_INDEX
82	FIL_PAGE_UNDO_LOG
784	FIL_PAGE_TYPE_ALLOCATED

and pages by ID finally shows pages taken away from the first table:
404	15
26	16
1375	17
[4 May 2011 18:01] Sveta Smirnova
Thank you for the report.

> ibdata1 did not grow after the initial insert so I think InnoDB did the right thing here.

Does this mean bug can be closed as "Not a Bug"/"Can't repeat"?
[4 May 2011 18:53] Mark Callaghan
yes, please close as "not a bug"
[4 May 2011 19:12] Sveta Smirnova
Closed, thanks!