Bug #57980 | InnoDB may leak off-page columns (BLOBs) when updating them | ||
---|---|---|---|
Submitted: | 4 Nov 2010 13:38 | Modified: | 8 Nov 2010 8:18 |
Reporter: | Marko Mäkelä | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S3 (Non-critical) |
Version: | all | OS: | Any |
Assigned to: | Marko Mäkelä | CPU Architecture: | Any |
Tags: | BLOB, Leak, off-page |
[4 Nov 2010 13:38]
Marko Mäkelä
[5 Nov 2010 2:55]
Roel Van de Paar
Hi Marko, Not sure if this is what you are looking for, but it seems that your assumption above is correct. However, *it does the same ("waste space") if the PK is not updated at all*, unless TRUNCATE is used (I tested and confirmed the same once more using an adapted testcase instead of the 50% PK update test which is listed below). It also seems to re-use the space after the delete (even if the PK was updated). In any case, you can likely adapt the testcase below to test what you need to test. Setting to verified ftm. This was tested on 5.5.6rc. =================== Testcase (set innodb_file_per_table) =================== DELIMITER // DROP PROCEDURE IF EXISTS fillt1// DROP TABLE IF EXISTS t1// CREATE TABLE `t1` (`16kb` varchar(16384),`pk` int,`offpage` varchar(1024), PRIMARY KEY(`pk`)) ENGINE=InnoDB ROW_FORMAT=DYNAMIC// CREATE PROCEDURE fillt1(IN numrows INT) BEGIN DECLARE count INT; SET count = 1; WHILE (count <= numrows) DO INSERT INTO t1 VALUES (repeat('A',16*1024),count,repeat('B',1*1024)); SET count=count+1; END WHILE; END; // DELIMITER ; CALL fillt1(10000); /* check filesize of t1.ibd - 402,653,184 bytes */ UPDATE `t1` SET pk=pk+20000; /* check filesize of t1.ibd - 415,236,096 bytes - file has become larger due to pk change */ DELETE FROM `t1` WHERE pk>1; SELECT COUNT(*) FROM `t1`; /* 0 rows */ /* check filesize of t1.ibd - 415,236,096 bytes - file size remains unchanged from previous check even though rows were deleted */ CALL fillt1(10000); /* check filesize of t1.ibd - 415,236,096 bytes - file size remains unchanged form previous check (i.e. space is re-used) */ UPDATE `t1` SET pk=pk+2000000; /* check filesize of t1.ibd - 415,236,096 bytes - file size remains unchanged form previous check */ CALL fillt1(10000); /* 10000 additional records */ /* check filesize of t1.ibd - 796,917,760 bytes */ DELETE FROM `t1` WHERE pk>=1; SELECT COUNT(*) FROM `t1`; /* 0 rows */ /* check filesize of t1.ibd - 796,917,760 bytes - file size remains unchanged from previous check even though rows were deleted (though only half the records were PK modified! - and it is the same btw if no PK records are modified at all) */ FLUSH TABLES; /* Restart server */ /* check filesize of t1.ibd - 796,917,760 bytes - unchanged */ TRUNCATE `t1`; /* check filesize of t1.ibd - 98,304 bytes - space released */ =================== End Testcase ===========================================
[5 Nov 2010 3:22]
Roel Van de Paar
The exact same thing happens for; CREATE TABLE `t1` (`16kb` blob,`pk` int,`offpage` varchar(1024), PRIMARY KEY(`pk`)) ENGINE=InnoDB ROW_FORMAT=DYNAMIC// And for; CREATE TABLE `t1` (`16kb` blob,`pk` int,`offpage` blob, PRIMARY KEY(`pk`)) ENGINE=InnoDB ROW_FORMAT=DYNAMIC// INSERT INTO t1 VALUES (repeat('A',16*1024),count,repeat('B',16*1024)); But then again, it also happens for a simple table with just an integer field and no pk: add records => delete them all => (space is retained) => truncate => (space is released). Reverting to open ftm. Hopefully the testcase above can be modded to suit your needs if there is a bug. Interestingly, it took a long time to stop the mysql service after that last blob/blob test above. c:\mysql556rc\bin>net stop mysql556rc The mysql556rc service is stopping............................... The mysql556rc service was stopped successfully.
[5 Nov 2010 3:42]
Roel Van de Paar
http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-other-changes-truncate.html
[5 Nov 2010 4:02]
Roel Van de Paar
5.5.6rc uses the InnoDB plugin (which releases space on TRUNCATE), and so should test on 5.1 instead. Tested on 5.1 and result is different. Using part of the testcase above with this changed: CREATE TABLE `t1` (`16kb` blob,`pk` int,`offpage` blob, PRIMARY KEY(`pk`)) ENGINE=InnoDB ROW_FORMAT=DYNAMIC// And INSERT INTO t1 VALUES (repeat('A',16*1024),count,repeat('B',1*1024)); Result: mysql> TRUNCATE `t1`; /* This truncate executed directly after the UPDATE pk+20000 and the DELETE */ Query OK, 0 rows affected (0.11 sec) So, the space is retained: 05/11/2010 02:47 PM 8,622 t1.frm 05/11/2010 02:49 PM 243,269,632 t1.ibd However, the same is again true for a simple table with only an integer column and no pk. Next test was to do a DELETE of all rows, then insert all rows (10000) again. BINGO: After the delete, but before the re-insert: 05/11/2010 02:55 PM 8,622 t1.frm 05/11/2010 02:57 PM 243,269,632 t1.ibd After the re-insert: 05/11/2010 02:55 PM 8,622 t1.frm 05/11/2010 02:58 PM 251,658,240 t1.ibd The space has increased by 8,388,608 bytes and 10000 * 1024 (the blob insert) = 10,240,000. So it looks like some space is indeed retained. Shall upload a cleaned up testcase soon.
[5 Nov 2010 5:06]
Roel Van de Paar
New testcase. There is something really odd going on here. It looks like if this testcase is run slowly or fastly, it makes a difference in the result (shows the bug or not) and the resulting filesizes are also different at different times or when run in different databases - very odd. I've seen it work and not work. Maybe there is some cleaner process going on in the background? You may have to test a few times in different ways with the testcase below (on 5.1 using the built-in InnoDB). Running in simultaneously in two sessions may help also. Am heading out for the day, but I've seen enough oddities to put this bug back to verified. ========================= New Testcase (Turn innodb_file_per_table on) DELIMITER // DROP PROCEDURE IF EXISTS fillt1// DROP TABLE IF EXISTS t1// CREATE TABLE `t1` (`16kb` blob,`pk` int,`offpage` blob, PRIMARY KEY(`pk`)) ENGINE=InnoDB ROW_FORMAT=DYNAMIC// CREATE PROCEDURE fillt1(IN numrows INT) BEGIN DECLARE count INT; SET count = 1; WHILE (count <= numrows) DO INSERT INTO t1 VALUES (repeat('A',16*1024),count,repeat('B',1*1024)); SET count=count+1; END WHILE; END; // DELIMITER ; CALL fillt1(10000); DELETE FROM `t1` WHERE pk>=1; SELECT COUNT(*) FROM `t1`; /* 0 rows */ CALL fillt1(10000); /* Check size of t1.ibd */ UPDATE `t1` SET pk=pk+20000; /* Check size of t1.ibd (increased due to pk change) */ DELETE FROM `t1` WHERE pk>=1; CALL fillt1(10000); /* Check size of t1.ibd */ =========================
[8 Nov 2010 7:34]
Marko Mäkelä
Sorry, I should have mentioned that this bug (if it exists) should only be visible after the delete-marked rows have been purged, that is, all transactions that started before the deletion have completed, and the purge queue is empty. Based on the comments, I am not sure if this bug has really been verified, or if it just normal purge lag is being observed. I will test this myself.
[8 Nov 2010 8:18]
Marko Mäkelä
Sorry for wasting your time, Roel. This is not a bug. I ran your test script, finally did DELETE FROM t1, and let the purge thread finish. Then I complied the code with UNIV_BTR_PRINT, restarted, set a breakpoint on row_search_for_mysql, and executed SELECT * FROM t1. When the breakpoint was reached, I did call btr_print_size(prebuilt->index) and got this output on stderr: INFO OF THE NON-LEAF PAGE SEGMENT SEGMENT id 1 space 1; page 2; res 1 used 1; full ext 0 fragm pages 1; free extents 0; not full extents 0: pages 0 INFO OF THE LEAF PAGE SEGMENT SEGMENT id 2 space 1; page 2; res 0 used 0; full ext 0 fragm pages 0; free extents 0; not full extents 0: pages 0 The BLOB pages would be allocated in the leaf page segment, but it shows ‘used 0’. Only one page (the root page of the clustered index) is allocated in the file segment reserved for non-leaf pages. (In this special case when the index tree consists of only one page, the root page actually is a leaf page, although it is always allocated in the same file segment.) Then I reran your test with a small modification: creating a secondary index(`16kb`(10)). It did not change the picture. Note that SHOW TABLE STATUS does not report everything in Data_free. I guess it excludes the InnoDB system pages (allocation overhead) from the count.
[8 Nov 2010 8:24]
Roel Van de Paar
Hi Marko, No worries, good to see you got to the bottom of this and hope the testcase helped to make your testing shorter!
[9 Nov 2010 14:23]
Marko Mäkelä
I had another look at this and a short discussion with Heikki. The test case does trigger row_upd_clust_rec_by_insert(), where the record is inserted with the BTR_EXTERN_INHERIT_FLAG set. The reason why no pages are leaked is that the only place where we care about the BTR_EXTERN_INHERIT_FLAG is when rolling back a transaction. That is, in rollback, btr_free_externally_stored_field() would refuse to free inherited BLOBs, because they would still be owned by an earlier version of the record. That is perfectly OK. The BLOB would be freed in the purge of a delete-marked record that no longer exists in any active transaction’s read view, provided that the BLOB is owned by the record. The ownership would be removed when the record is updated, but some BLOBs are inherited by (not updated in) the new version of the record.