Bug #33649 Is multi-byte encoding applied twice, leading to squared size ?
Submitted: 3 Jan 2008 12:35 Modified: 13 Nov 2008 3:19
Reporter: Joerg Bruehe
Status: Closed
Category: MySQL Server: Charsets Severity: S2 (Serious)
Version: 6.0.4-alpha OS: Any
Assigned to: Sergei Glukhov CPU Architecture:Any
Triage: Triaged: D3 (Medium)

[3 Jan 2008 12:35] Joerg Bruehe
Description:
In the "*__datadict" tests in the 6.0.4 build
(innodb__datadict, memory__datadict, myisam__datadict),
I found a change (relative to the "result" file) which I attribute to the switch of "utf8" to a 4-byte-per-char encoding (it was 3 bytes per char previously).

However, it seems this change (3 -> 4) may have been applied twice, and I propose this be checked.

In "memory__datadict", the diff starts like this:
***************
*** 15395,15413
    AND table_name   = 'parameters'
  ORDER BY ordinal_position;
  TABLE_CATALOG TABLE_SCHEMA    TABLE_NAME      COLUMN_NAME     ORDINAL_POSITION        COLUMN_DEFAULT  IS_NULLABLE     DATA_TYPE       CHARACTER_MAXIMUM_LENGTH        CHARACTER_OCTET_LENGTH  NUMERIC_PRECISION       NUMERIC_SCALE   CHARACTER_SET_NAME      COLLATION_NAME  COLUMN_TYPE     COLUMN_KEY      EXTRA   PRIVILEGES      COLUMN_COMMENT  STORAGE FORMAT

For better readability, I have eliminated columns which seem to be irrelevant to this question, so the condensed diff is:
(Columns "TABLE_CATALOG", "TABLE_SCHEMA", "TABLE_NAME"
 were displayed as "NULL", "information_schema", "parameters";
 columns "NUMERIC_PRECISION" and "NUMERIC_SCALE" have been removed,
 also "COLUMN_KEY", "EXTRA", "PRIVILEGES", "COLUMN_COMMENT", "STORAGE", "FORMAT"):

***************
*** 15395,15413
    AND table_name   = 'parameters'
  ORDER BY ordinal_position;
  COLUMN_NAME              ORDINAL_POSITION      CHARACTER_MAXIMUM_LENGTH          COLUMN_TYPE
                              COLUMN_DEFAULT           CHARACTER_OCTET_LENGTH
                                    IS_NULLABLE              CHARACTER_SET_NAME
                                         DATA_TYPE                 COLLATION_NAME
! SPECIFIC_CATALOG         1  NULL  YES  varchar 4096  12288 utf8  utf8_general_ci varchar(4096)
! SPECIFIC_SCHEMA          2        NO   varchar 192   576   utf8  utf8_general_ci varchar(192)
! SPECIFIC_NAME            3        NO   varchar 192   576   utf8  utf8_general_ci varchar(192)
  ORDINAL_POSITION         4  0     NO   int     NULL  NULL  NULL  NULL            int(21)
! PARAMETER_MODE           5  NULL  YES  varchar 5     15    utf8  utf8_general_ci varchar(5)
! PARAMETER_NAME           6  NULL  YES  varchar 192   576   utf8  utf8_general_ci varchar(192)
! DATA_TYPE                7        NO   varchar 192   576   utf8  utf8_general_ci varchar(192)
  CHARACTER_MAXIMUM_LENGTH 8  NULL  YES  int     NULL  NULL  NULL  NULL            int(21)
  CHARACTER_OCTET_LENGTH   9  NULL  YES  int     NULL  NULL  NULL  NULL            int(21)
  NUMERIC_PRECISION        10 NULL  YES  int     NULL  NULL  NULL  NULL            int(21)
--- 15668,15686
    AND table_name   = 'parameters'
  ORDER BY ordinal_position;
  COLUMN_NAME              ORDINAL_POSITION      CHARACTER_MAXIMUM_LENGTH          COLUMN_TYPE
                              COLUMN_DEFAULT           CHARACTER_OCTET_LENGTH
                                    IS_NULLABLE              CHARACTER_SET_NAME
                                         DATA_TYPE                 COLLATION_NAME
! SPECIFIC_CATALOG         1  NULL  YES  varchar 4096  2048  utf8  utf8_general_ci varchar(4096)
! SPECIFIC_SCHEMA          2        NO   varchar 256   1024  utf8  utf8_general_ci varchar(256)
! SPECIFIC_NAME            3        NO   varchar 256   1024  utf8  utf8_general_ci varchar(256)
  ORDINAL_POSITION         4  0     NO   int     NULL  NULL  NULL  NULL            int(21)
! PARAMETER_MODE           5  NULL  YES  varchar 5     20    utf8  utf8_general_ci varchar(5)
! PARAMETER_NAME           6  NULL  YES  varchar 256   1024  utf8  utf8_general_ci varchar(256)
! DATA_TYPE                7        NO   varchar 256   1024  utf8  utf8_general_ci varchar(256)
  CHARACTER_MAXIMUM_LENGTH 8  NULL  YES  int     NULL  NULL  NULL  NULL            int(21)
  CHARACTER_OCTET_LENGTH   9  NULL  YES  int     NULL  NULL  NULL  NULL            int(21)
  NUMERIC_PRECISION        10 NULL  YES  int     NULL  NULL  NULL  NULL            int(21)

Note that columns "SPECIFIC_SCHEMA", "SPECIFIC_NAME", "PARAMETER_NAME", and "DATA_TYPE" have had
1) their "CHARACTER_MAXIMUM_LENGTH" increased by a factor of 4/3
   (192 = 64 * 3  ->  256 = 64 * 4)
*and* simultaneously
2) their "CHARACTER_OCTET_LENGTH" changed from 3 * "CHARACTER_MAXIMUM_LENGTH"
   to 4 * "CHARACTER_MAXIMUM_LENGTH" (576 = 192 * 3  ->  1024 = 256 * 4).

I assume item 2) is necessary for their utf8 encoding,
but item 1) seems to indicate the column length already allows for multi-byte encoding.
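
The "squared" effect can be checked directly from the numbers in the diff. The following is only a small arithmetic sketch in Python using the SPECIFIC_SCHEMA values shown above; the variable names are made up for illustration and nothing here is taken from the server source.

```python
from fractions import Fraction

# Values copied from the condensed diff for SPECIFIC_SCHEMA
# (old "result" file vs. the 6.0.4 test output).
old_chars, old_octets = 192, 576
new_chars, new_octets = 256, 1024

# Growth of CHARACTER_MAXIMUM_LENGTH and CHARACTER_OCTET_LENGTH.
char_growth = Fraction(new_chars, old_chars)
octet_growth = Fraction(new_octets, old_octets)

print(char_growth)                        # 4/3
print(octet_growth)                       # 16/9
print(octet_growth == char_growth ** 2)   # True
```

If only the bytes-per-char factor had changed, the octet length would be expected to grow by 4/3; a growth of (4/3)^2 is consistent with the factor entering twice.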

How to repeat:
Found by looking at the test failure.

Suggested fix:
Check whether really both size increases are needed.
[8 Oct 2008 11:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/55723

2859 Sergey Glukhov	2008-10-08
      Bug#33649 Is multi-byte encoding applied twice, leading to squared size ?
      Some columns are declared in a wrong
      way, which results in this double length multiplication.
      The correct character length should be 64, and the correct octet
      length should be 256.
      The fix is to use NAME_CHAR_LEN instead of NAME_LEN
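
      The effect of swapping the two constants can be modelled roughly as
      follows. This is a Python sketch, not the server's C source: the
      value 64 and the factor 4 come from the figures in this report, while
      defining NAME_LEN as their product and the helper describe_column
      are assumptions made purely for illustration.

```python
# Hypothetical Python model of the constants named in the commit message.
MBMAXLEN = 4                          # bytes per utf8 character in 6.0.4 (from the report)
NAME_CHAR_LEN = 64                    # identifier length in characters
NAME_LEN = NAME_CHAR_LEN * MBMAXLEN   # assumed: identifier length in bytes


def describe_column(declared_length):
    """Treat declared_length as a character count and derive the octet length."""
    return {
        "CHARACTER_MAXIMUM_LENGTH": declared_length,
        "CHARACTER_OCTET_LENGTH": declared_length * MBMAXLEN,
    }


# A byte count passed where a character count is expected.
before_fix = describe_column(NAME_LEN)
# The character count, as the fix describes.
after_fix = describe_column(NAME_CHAR_LEN)

print(before_fix)
print(after_fix)
```

      Under these assumptions the first call reproduces the 256 / 1024 pair
      seen in the test output, and the second gives the 64 / 256 pair the
      commit message calls correct.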
[8 Oct 2008 11:24] Alexander Barkov
The patch http://lists.mysql.com/commits/55723 is ok to push.
[9 Oct 2008 10:18] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/55903

2862 Sergey Glukhov	2008-10-09
      Bug#33649 Is multi-byte encoding applied twice, leading to squared size ?
      Some columns are declared in a wrong
      way, which results in this double length multiplication.
      The correct character length should be 64, and the correct octet
      length should be 256.
      The fix is to use NAME_CHAR_LEN instead of NAME_LEN
[10 Nov 2008 10:52] Bugs System
Pushed into 6.0.8-alpha  (revid:sergey.glukhov@sun.com-20081009101746-5cnojb55yibo2wpp) (version source revid:sergey.glukhov@sun.com-20081009101746-5cnojb55yibo2wpp) (pib:5)
[13 Nov 2008 3:19] Paul Dubois
Noted in 6.0.9 changelog.

The ROUTINES.DATA_TYPE, PARAMETERS.SPECIFIC_SCHEMA,
PARAMETERS.SPECIFIC_NAME,
PARAMETERS.PARAMETER_NAME, and
PARAMETERS.DATA_TYPE columns were declared longer than
the maximum allowed identifier length.