Bug #70369 | Undocummented behavior of InnoDB tables using CHAR data type | ||
---|---|---|---|
Submitted: | 17 Sep 2013 16:07 | Modified: | 3 Jan 2014 15:35 |
Reporter: | Przemyslaw Malkowski | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Documentation | Severity: | S3 (Non-critical) |
Version: | 5.5 5.6 | OS: | Any |
Assigned to: | Daniel Price | CPU Architecture: | Any |
Tags: | InnnoDB CHAR |
[17 Sep 2013 16:07]
Przemyslaw Malkowski
[17 Sep 2013 17:05]
MySQL Verification Team
Last points in this page gives some clues as to what happens: http://dev.mysql.com/doc/refman/5.5/en/innodb-physical-record.html
[17 Sep 2013 17:18]
Jeremy Cole
This is more subtle than it would perhaps seem. The following two cases occur: * CHAR(100) CHARSET latin1: Stored as fixed-width field (or CHAR(100)), with length 100, and end-space padded. * CHAR(100) CHARSET utf8: Stored as variable-width field (or VARCHAR(100)). End-space padded up to at least 100 bytes, but may store up to 300 bytes in total. For the two cases you used (value "abc" and value "ऊऊऊ") the storage required for CHAR(100) CHARSET utf8 is exactly the same. However, the value "abc" (3 bytes) is padded by 97 bytes, and "ऊऊऊ" (9 bytes) is padded by 91 bytes. InnoDB is internally making an optimization here to store a CHAR(100) in the utf8 character set as a "pseudo fixed width" field. This could probably be documented a bit better.
[17 Sep 2013 17:27]
Jeremy Cole
By the way, using your table sizes it is easy to verify that this case is closer to storing the minimum allocation for either string than the maximum: 146800640 bytes / 1001000 rows => 146.654 bytes/row This comes from the 100 bytes for val1, plus 26 bytes of per-row overhead (including a 6-byte implicit clustered primary key), plus ~20 bytes of tablespace overhead/reservation/waste.
[17 Sep 2013 17:42]
Valeriy Kravchuk
So, space usage is actually efficient, but it should be clearly documented, IMHO.
[18 Sep 2013 14:11]
Przemyslaw Malkowski
Thank you Jeremy for excellent explanation. The fact that InnoDB uses optimization to store utf-8 CHAR similar way as it's storing VARCHAR explains a lot. By stating the initial title as InnoDB being inefficient, I actually based it on my earlier tests on smaller tables, but I end up showing an example of larger table where it is showing actual efficiency for this specific case as Jeremy noticed. My bad. So, below another example, with smaller table and smaller CHAR column width. Let's say I have CHAR(10) column, and I fill it with all 10 characters, 1 byte/char for both latin1 and uft8 plus 3 bytes/char for second utf8 table. In MyISAM table both .MYD file size and Data_length makes exactly 31 bytes per row in utf8 table regardless of characters stored. In InnoDB, after I fill these tables with just 100k rows: * when table is utf-8 and I fill CHAR(10) column with single-bye characters - Data_length shows ~47 bytes/row .ibd size makes ~125 bytes/row * when table is latin1 and I fill CHAR(10) column with characters - Data_length shows again ~47 bytes/row .ibd size makes also ~125 bytes/row * when table is utf-8 and I fill CHAR(10) column with three-bytes characters - Data_length shows ~68 bytes/row .ibd size makes ~146 bytes/row So the conclusion is that given only the Data_length statistics, we have about ~37 bytes overhead per row. But given the result file size, it's already ~115 bytes overhead per single CHAR(10) row. This makes an example utf-8 InnoDB table, filled with 100k rows of single-byte characters size at 12582912 bytes vs the same table as MyISAM is 3100000 bytes. 4 times bigger despite the fact InnoDB uses optimization for multi-byte charset here. For 1M rows, it's 30MB MyISAM vs 48MB InnoDB. Even for 10M rows it's still 296MB vs 388MB. The clear advantage of InnoDB over MyISAM is when 3 characters are stored per CHAR(100) column - 288MB MyISAM table vs 140MB InnoDB. The different storage usage behaviour in InnoDB should be docummented, also here I think: http://dev.mysql.com/doc/refman/5.5/en/char.html.
[18 Sep 2013 14:23]
Przemyslaw Malkowski
Changing title to more appropriate.
[19 Sep 2013 8:49]
MySQL Verification Team
Hello Przemyslaw, Thank you for the report. Thanks, Umesh
[3 Jan 2014 15:35]
Daniel Price
The Data Storage Requirements section (http://dev.mysql.com/doc/refman/5.5/en/storage-requirements.html) was missing references to InnoDB-specific information about CHAR data type storage requirements for InnoDB tables, found here: http://dev.mysql.com/doc/refman/5.5/en/innodb-table-and-index.html. References have been added. This section also provides information about InnoDB physical row structure (and "overhead" storage costs). Also added a reference to the data type storage requirements section from http://dev.mysql.com/doc/refman/5.5/en/char.html. Thank you for the bug report.