| Bug #67963 | InnoDB wastes almost one extent out of every innodb_page_size pages | ||
|---|---|---|---|
| Submitted: | 22 Dec 2012 6:51 | Modified: | 2 Nov 2015 12:52 |
| Reporter: | Jeremy Cole (Basic Quality Contributor) (OCA) | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: InnoDB storage engine | Severity: | S3 (Non-critical) |
| Version: | All | OS: | Any |
| Assigned to: | Shaohua Wang | CPU Architecture: | Any |
| Tags: | extents, innodb, space, waste | ||
[22 Jan 2013 19:24]
Davi Arnaut
InnoDB wastes 62 of every 16,384 pages in XDES/IBUF_BITMAP extent (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: InnoDB-wastes-62-of-every-16-384-pages-in-XDES-IBUF.patch (application/octet-stream, text), 37.27 KiB.
[22 Jan 2013 19:44]
Matthew Lord
Thank you, Jeremy and Davi! I have marked this as verified and I will move the bug over to the InnoDB devs.
[22 Jan 2013 19:55]
Davi Arnaut
It would be nice if Bug#67718 and Bug#68023 could also be verified. They are more important and we have also contributed fixes for them.
[13 May 2014 23:20]
Jeremy Cole
(I updated the title of this bug to represent the reality.) Since MySQL 5.6 this can be effectively much worse. While with uncompressed tables and innodb_page_size=16k, this only wastes 62/16384 pages (~0.37%), neither of those numbers are fixed, in fact they are: e = 1048576 (extent size, sort of) s = 16384 (system page size, aka innodb_page_size) p = 16384 (actual page size, aka zip_size) ((e / s) - 2) / p So the original assertion of wasted space holds for the default configuration which wastes 0.37%: > e = 1048576 > s = 16384 > p = 16384 > ((e / s) - 2) / p [1] 0.00378418 With innodb_page_size=8k though, 1.5% is wasted: > e = 1048576 > s = 8192 > p = 8192 > ((e / s) - 2) / p [1] 0.01538086 And with innodb_page_size=4k, a full 6.2% is wasted: > e = 1048576 > s = 4096 > p = 4096 > ((e / s) - 2) / p [1] 0.06201172 With InnoDB compression, an "extent" (what is represented by an extend descriptor [XDES] page) is actually not 1048576 bytes as documented, but rather e/s *pages* of size p. (Normally that means 64 pages of size p, ranging from 1 MiB for 16k pages to only 64 KiB for 1k pages.) Therefore, in the worst possible case, with innodb_page_size=4k and a compressed table with key_block_size=1, an astounding 24.8% of the space is wasted by this bug: > e = 1048576 > s = 4096 > p = 1024 > ((e / s) - 2) / p [1] 0.2480469 And even with innodb_page_size=16k (the default) with compressed tables, this bug can waste up to 6% of the disk space: > e = 1048576 > s = 16384 > p = c(16384, 8192, 4096, 2048, 1024) > ((e / s) - 2) / p [1] 0.003784180 0.007568359 0.015136719 0.030273438 0.060546875
[13 May 2014 23:24]
Jeremy Cole
Illustration showing ~24.8% waste from innodb_page_size=4k with compressed table with 1k pages
Attachment: innodb_bug_67963_4k_1k.png (image/png, text), 75.74 KiB.
[2 Nov 2015 12:42]
Daniel Price
commit c20cf3b88fea910f08138e0717bb07f95c2e66aa
Author: Annamalai Gurusami <annamalai.gurusami@oracle.com>
Date: Fri Oct 30 11:29:16 2015 +0530
Bug #16204823 INNODB WASTES 62 OUT OF EVERY 16384 PAGES
Problem:
One extent descriptor page contains information about page allocations in N
number of extents. If page size is 16K (the default), then N = 256. Of these
extents 1 or more is used for fragment pages and such extents are of the state XDES_FREE_FRAG or XDES_FULL_FRAG. The remaining extents (XDES_FSEG) will
belong to any one segment.
Any segment can make use of the pages from the fragment extent. But each
segment can use only 32 of these fragment pages. Because of this most of the pages of these fragment extents are wasted by InnoDB.
Solution:
Make it possible for a segment to make use of these fragment extent. Introduce the new XDES_FSEG_FRAG state for the extent.
[2 Nov 2015 12:52]
Daniel Price
Posted by developer: Fixed as of the upcoming 5.8.0 release, and here's the changelog entry: InnoDB wasted most pages in extents used for fragment pages. Thank you for the bug report.

Description: Each 16384 pages, InnoDB must allocate two pages for bookkeeping: One XDES (extent descriptor) page, and one IBUF_BITMAP (insert buffer bitmap) page. In order to do so, an extent must be allocated starting at page 16384 (offset 256MiB), and that extent is allocated as a "FREE_FRAG" extent, meaning its pages can only be allocated as "fragment" pages individually. So each 16384 pages, InnoDB allocates an extent, allocates and initialized these two bookkeeping pages, and puts the extent in the FREE_FRAG list, with 62 "free" pages remaining. File segments (FSEGs) in tablespaces (of which there are two per index) may use individual fragment pages from extents in the FREE_FRAG list only until they reach 32 allocated pages, as they use a fixed-width array of 32 items to track which fragment pages are allocated. Once an FSEG has reached 32 pages, only whole extents are allocated to it. This means that in practice, there is very little use for large numbers of FREE_FRAG pages. Within a file-per-table tablespace, the maximum number of pages usable from FREE_FRAG (excluding the XDES and IBUF_BITMAP pages themselves) is the number of indexes times 32 times 2., and in typical tables with 1-4 indexes, this would be 64-256 pages in total, regardless of the size of the table. Nonetheless for each 256MiB, an additional 62 pages are added to FREE_FRAG which will go unused forever. This equates to slightly less than 0.37% wasted space in any table. How to repeat: 1. Create a table which will exceed 256MiB (and preferably is much larger); this Ruby script will create a table of approximately 1.1GiB: m = Mysql.new("127.0.0.1", "root", "", "test", 13000) m.query("CREATE TABLE t (i INT UNSIGNED NOT NULL, s VARCHAR(1000) NOT NULL) ENGINE=InnoDB") (1..1000000).each { |n| m.query("INSERT INTO t (i, s) VALUES (#{n}, REPEAT('A', 1000))"); puts "Inserted #{n}" if n % 5000 == 0 } 2. Examine the created tablespace file (I am using innodb_space from innodb_ruby): $ innodb_space -f t.ibd space-page-type-regions start end count type 0 0 1 FSP_HDR 1 1 1 IBUF_BITMAP 2 2 1 INODE 3 16383 16381 INDEX 16384 16384 1 XDES 16385 16385 1 IBUF_BITMAP 16386 16388 3 INDEX 16389 16447 59 ALLOCATED 16448 32101 15654 INDEX 32102 32127 26 ALLOCATED 32128 32767 640 INDEX 32768 32768 1 XDES 32769 32769 1 IBUF_BITMAP 32770 32831 62 ALLOCATED 32832 49151 16320 INDEX 49152 49152 1 XDES 49153 49153 1 IBUF_BITMAP 49154 49215 62 ALLOCATED 49216 65535 16320 INDEX 65536 65536 1 XDES 65537 65537 1 IBUF_BITMAP 65538 65599 62 ALLOCATED 65600 71781 6182 INDEX 71782 72959 1178 ALLOCATED A repeating pattern every 16384 pages, will be obvious, e.g.: start end count type 32768 32768 1 XDES 32769 32769 1 IBUF_BITMAP 32770 32831 62 ALLOCATED Those 62 ALLOCATED pages are the wasted pages discussed. It is also possible to examine the FREE_FRAG list itself: $ innodb_space -f t.ibd list-summary -L free_frag start_page bitmap 16384 #####........................................................... 32768 ##.............................................................. 49152 ##.............................................................. 65536 ##.............................................................. (Here the bitmap column shows page usage within an extent on the FREE_FRAG list, and "#" indicates a page that is used, while "." indicates a free page.)