Bug #67963 InnoDB wastes almost one extent out of every innodb_page_size pages
Submitted: 22 Dec 2012 6:51 Modified: 2 Nov 2015 12:52
Reporter: Jeremy Cole (Basic Quality Contributor) (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:All OS:Any
Assigned to: Shaohua Wang CPU Architecture:Any
Tags: extents, innodb, space, waste

[22 Dec 2012 6:51] Jeremy Cole
Description:
Each 16384 pages, InnoDB must allocate two pages for bookkeeping: One XDES (extent descriptor) page, and one IBUF_BITMAP (insert buffer bitmap) page. In order to do so, an extent must be allocated starting at page 16384 (offset 256MiB), and that extent is allocated as a "FREE_FRAG" extent, meaning its pages can only be allocated as "fragment" pages individually. So each 16384 pages, InnoDB allocates an extent, allocates and initialized these two bookkeeping pages, and puts the extent in the FREE_FRAG list, with 62 "free" pages remaining.

File segments (FSEGs) in tablespaces (of which there are two per index) may use individual fragment pages from extents in the FREE_FRAG list only until they reach 32 allocated pages, as they use a fixed-width array of 32 items to track which fragment pages are allocated. Once an FSEG has reached 32 pages, only whole extents are allocated to it. This means that in practice, there is very little use for large numbers of FREE_FRAG pages.

Within a file-per-table tablespace, the maximum number of pages usable from FREE_FRAG (excluding the XDES and IBUF_BITMAP pages themselves) is the number of indexes times 32 times 2., and in typical tables with 1-4 indexes, this would be 64-256 pages in total, regardless of the size of the table. Nonetheless for each 256MiB, an additional 62 pages are added to FREE_FRAG which will go unused forever. This equates to slightly less than 0.37% wasted space in any table.

How to repeat:
1. Create a table which will exceed 256MiB (and preferably is much larger); this Ruby script will create a table of approximately 1.1GiB:

m = Mysql.new("127.0.0.1", "root", "", "test", 13000)

m.query("CREATE TABLE t (i INT UNSIGNED NOT NULL, s VARCHAR(1000) NOT NULL) ENGINE=InnoDB")

(1..1000000).each { |n| m.query("INSERT INTO t (i, s) VALUES (#{n}, REPEAT('A', 1000))"); puts "Inserted #{n}" if n % 5000 == 0 }

2. Examine the created tablespace file (I am using innodb_space from innodb_ruby):

$ innodb_space -f t.ibd space-page-type-regions
start       end         count       type                
0           0           1           FSP_HDR             
1           1           1           IBUF_BITMAP         
2           2           1           INODE               
3           16383       16381       INDEX               
16384       16384       1           XDES                
16385       16385       1           IBUF_BITMAP         
16386       16388       3           INDEX               
16389       16447       59          ALLOCATED           
16448       32101       15654       INDEX               
32102       32127       26          ALLOCATED           
32128       32767       640         INDEX               
32768       32768       1           XDES                
32769       32769       1           IBUF_BITMAP         
32770       32831       62          ALLOCATED           
32832       49151       16320       INDEX               
49152       49152       1           XDES                
49153       49153       1           IBUF_BITMAP         
49154       49215       62          ALLOCATED           
49216       65535       16320       INDEX               
65536       65536       1           XDES                
65537       65537       1           IBUF_BITMAP         
65538       65599       62          ALLOCATED           
65600       71781       6182        INDEX               
71782       72959       1178        ALLOCATED           

A repeating pattern every 16384 pages, will be obvious, e.g.:

start       end         count       type                
32768       32768       1           XDES                
32769       32769       1           IBUF_BITMAP         
32770       32831       62          ALLOCATED           

Those 62 ALLOCATED pages are the wasted pages discussed.

It is also possible to examine the FREE_FRAG list itself:

$ innodb_space -f t.ibd list-summary -L free_frag
start_page  bitmap                                                          
16384       #####...........................................................
32768       ##..............................................................
49152       ##..............................................................
65536       ##..............................................................

(Here the bitmap column shows page usage within an extent on the FREE_FRAG list, and "#" indicates a page that is used, while "." indicates a free page.)
[22 Jan 2013 19:24] Davi Arnaut
InnoDB wastes 62 of every 16,384 pages in XDES/IBUF_BITMAP extent

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: InnoDB-wastes-62-of-every-16-384-pages-in-XDES-IBUF.patch (application/octet-stream, text), 37.27 KiB.

[22 Jan 2013 19:44] Matthew Lord
Thank you, Jeremy and Davi!

I have marked this as verified and I will move the bug over to the InnoDB devs.
[22 Jan 2013 19:55] Davi Arnaut
It would be nice if Bug#67718 and Bug#68023 could also be verified. They are more important and we have also contributed fixes for them.
[13 May 2014 23:20] Jeremy Cole
(I updated the title of this bug to represent the reality.)

Since MySQL 5.6 this can be effectively much worse. While with uncompressed tables and innodb_page_size=16k, this only wastes 62/16384 pages (~0.37%), neither of those numbers are fixed, in fact they are:

e = 1048576 (extent size, sort of)
s = 16384 (system page size, aka innodb_page_size)
p = 16384 (actual page size, aka zip_size)

((e / s) - 2) / p

So the original assertion of wasted space holds for the default configuration which wastes 0.37%:

> e = 1048576
> s = 16384
> p = 16384
> ((e / s) - 2) / p
[1] 0.00378418

With innodb_page_size=8k though, 1.5% is wasted:

> e = 1048576
> s = 8192
> p = 8192
> ((e / s) - 2) / p
[1] 0.01538086

And with innodb_page_size=4k, a full 6.2% is wasted:

> e = 1048576
> s = 4096
> p = 4096
> ((e / s) - 2) / p
[1] 0.06201172

With InnoDB compression, an "extent" (what is represented by an extend descriptor [XDES] page) is actually not 1048576 bytes as documented, but rather e/s *pages* of size p. (Normally that means 64 pages of size p, ranging from 1 MiB for 16k pages to only 64 KiB for 1k pages.)

Therefore, in the worst possible case, with innodb_page_size=4k and a compressed table with key_block_size=1, an astounding 24.8% of the space is wasted by this bug:

> e = 1048576
> s = 4096
> p = 1024
> ((e / s) - 2) / p
[1] 0.2480469

And even with innodb_page_size=16k (the default) with compressed tables, this bug can waste up to 6% of the disk space:

> e = 1048576
> s = 16384
> p = c(16384, 8192, 4096, 2048, 1024)
> ((e / s) - 2) / p
[1] 0.003784180 0.007568359 0.015136719 0.030273438 0.060546875
[13 May 2014 23:24] Jeremy Cole
Illustration showing ~24.8% waste from innodb_page_size=4k with compressed table with 1k pages

Attachment: innodb_bug_67963_4k_1k.png (image/png, text), 75.74 KiB.

[2 Nov 2015 12:42] Daniel Price
commit c20cf3b88fea910f08138e0717bb07f95c2e66aa
Author: Annamalai Gurusami <annamalai.gurusami@oracle.com>
Date:   Fri Oct 30 11:29:16 2015 +0530

Bug #16204823 INNODB WASTES 62 OUT OF EVERY 16384 PAGES

Problem:
    
One extent descriptor page contains information about page allocations in N
number of extents. If page size is 16K (the default), then N = 256. Of these
extents 1 or more is used for fragment pages and such extents are of the    state XDES_FREE_FRAG or XDES_FULL_FRAG.  The remaining extents (XDES_FSEG) will
belong to any one segment.
    
Any segment can make use of the pages from the fragment extent.  But each
segment can use only 32 of these fragment pages.  Because of this most of the pages of these fragment extents are wasted by InnoDB.
    
Solution:
    
Make it possible for a segment to make use of these fragment extent.  Introduce the new XDES_FSEG_FRAG state for the extent.
[2 Nov 2015 12:52] Daniel Price
Posted by developer:
 
Fixed as of the upcoming 5.8.0 release, and here's the changelog entry:

InnoDB wasted most pages in extents used for fragment pages.

Thank you for the bug report.