Bug #71965 Storage requirements for utf8mb4 is not clear
Submitted: 7 Mar 2014 9:12 Modified: 3 Apr 2014 19:15
Reporter: Daniël van Eeden (OCA)
Status: Closed
Category: MySQL Server: Documentation Severity: S3 (Non-critical)
Version: 5.6.16 OS: Any
Assigned to: Daniel Price CPU Architecture: Any

[7 Mar 2014 9:12] Daniël van Eeden

The utf8mb4 page gives this tip:

"To save space with UTF-8, use VARCHAR instead of CHAR. Otherwise, MySQL must reserve three (or four) bytes for each character in a CHAR CHARACTER SET utf8 (or utf8mb4) column because that is the maximum possible length. For example, MySQL must reserve 40 bytes for a CHAR(10) CHARACTER SET utf8mb4 column."
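A rough Python sketch of the arithmetic behind the tip: a CHAR(N) column reserves N times the maximum bytes per character, while a VARCHAR value uses only its actual encoded bytes plus a length prefix. The helper names (MAX_BYTES_PER_CHAR, char_reserved_bytes, varchar_bytes) are hypothetical, not MySQL APIs. The prefix rule assumed here is the one on the storage requirements page: 1 byte if the column's values can need at most 255 bytes, otherwise 2 bytes.

```python
# Maximum bytes per character for each MySQL character set (mbmaxlen).
MAX_BYTES_PER_CHAR = {"utf8": 3, "utf8mb4": 4}


def char_reserved_bytes(n_chars: int, charset: str) -> int:
    """Worst-case bytes reserved for CHAR(n_chars) in the given charset."""
    return n_chars * MAX_BYTES_PER_CHAR[charset]


def varchar_bytes(value: str, max_chars: int, charset: str) -> int:
    """Bytes used by a VARCHAR(max_chars) value: encoded data plus length prefix."""
    if len(value) > max_chars:
        raise ValueError("value is longer than the column")
    # MySQL's 3-byte utf8 cannot hold characters outside the BMP.
    if charset == "utf8" and any(ord(c) > 0xFFFF for c in value):
        raise ValueError("character not representable in 3-byte utf8")
    data = len(value.encode("utf-8"))
    # Prefix size depends on the column's maximum possible byte length.
    max_bytes = max_chars * MAX_BYTES_PER_CHAR[charset]
    prefix = 1 if max_bytes <= 255 else 2
    return data + prefix


print(char_reserved_bytes(10, "utf8mb4"))     # 40
print(char_reserved_bytes(10, "utf8"))        # 30
print(varchar_bytes("hello", 10, "utf8mb4"))  # 6
```

The same 10-character column costs 40 bytes as CHAR but 6 bytes as VARCHAR for a short ASCII value, which is the saving the tip describes for both utf8 and utf8mb4.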

This tip clearly also applies to utf8, but it is not mentioned on the utf8 page. The InnoDB documentation also says:

"Internally, InnoDB attempts to store UTF-8 CHAR(N) columns in N bytes by trimming trailing spaces. (With REDUNDANT row format, such columns occupy 3 × N bytes.) Reserving the minimum space N in many cases enables column updates to be done in place without causing fragmentation of the index page."
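A simplified Python model of the InnoDB behaviour quoted above, for illustration only. It assumes that outside REDUNDANT format a CHAR(N) value is stored as its encoded bytes without trailing spaces, but never in fewer than N bytes, and that REDUNDANT always uses the worst case of N times the maximum bytes per character (3 × N for utf8). The function name innodb_char_bytes is hypothetical, and the model ignores all other record overhead.

```python
# Maximum bytes per character for each MySQL character set (mbmaxlen).
MAX_BYTES_PER_CHAR = {"utf8": 3, "utf8mb4": 4}


def innodb_char_bytes(value: str, n_chars: int, charset: str = "utf8",
                      row_format: str = "COMPACT") -> int:
    """Approximate bytes InnoDB uses for a UTF-8 CHAR(n_chars) value."""
    if len(value) > n_chars:
        raise ValueError("value is longer than the column")
    if row_format.upper() == "REDUNDANT":
        # REDUNDANT reserves the full worst case for every value.
        return n_chars * MAX_BYTES_PER_CHAR[charset]
    # Other formats trim trailing spaces, keeping a floor of n_chars bytes
    # so that most updates can still be done in place.
    trimmed = len(value.rstrip(" ").encode("utf-8"))
    return max(n_chars, trimmed)


print(innodb_char_bytes("abc", 10))                         # 10
print(innodb_char_bytes("abc", 10, row_format="REDUNDANT"))  # 30
```

Under this model a short value in CHAR(10) CHARACTER SET utf8 takes 10 bytes instead of 30, and only values whose non-space content exceeds N bytes grow beyond N.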

How to repeat:
See description.

Suggested fix:
On https://dev.mysql.com/doc/refman/5.6/en/charset-unicode-utf8.html:

Add the tip given for utf8mb4.
Link to https://dev.mysql.com/doc/refman/5.6/en/storage-requirements.html
Maybe also link to https://dev.mysql.com/doc/refman/5.6/en/innodb-table-and-index.html#innodb-physical-record
[7 Mar 2014 12:36] MySQL Verification Team
Thank you for the bug report.
[3 Apr 2014 19:15] Daniel Price
The tip and related references have been added to the UTF-8 documentation as recommended. The revised content will appear soon, with the next published documentation build.

Thank you for the bug report.