Bug #33600 CHARACTER_OCTET_LENGTH is now CHARACTER_MAXIMUM_LENGTH * 4
Submitted: 31 Dec 2007 17:07 Modified: 28 Mar 2008 19:23
Reporter: Joerg Bruehe
Status: Closed
Category:Tests: Server Severity:S3 (Non-critical)
Version:6.0.4-alpha OS:Any
Assigned to: Matthias Leich CPU Architecture:Any

[31 Dec 2007 17:07] Joerg Bruehe
Description:
In the "<engine>__datadict" tests
(innodb__datadict, memory__datadict, myisam__datadict),
the ratio of CHARACTER_OCTET_LENGTH to CHARACTER_MAXIMUM_LENGTH has changed from 3 to 4
for some (or all?) columns of the "utf8" character set.

Assuming this is a consequence of the changed "utf8" handling, this may "simply" be a case of the "result" files not having been updated.

The following is just a very short sample; all "*__datadict" tests are affected, and the full differences are much longer:

--- /PATH/mysql-test/suite/funcs_1/r/innodb__datadict.result
+++ /PATH/mysql-test/suite/funcs_1/r/innodb__datadict.reject
@@ -2268,87 +2268,87 @@
 NULL   test4   latin1  VIEW    NULL
 select * from columns;
 TABLE_CATALOG  TABLE_SCHEMA    TABLE_NAME      COLUMN_NAME     ORDINAL_POSITION        COLUMN_DEFAULT  IS_NULLABLE     DATA_TYPE       CHARACTER_MAXIMUM_LENGTH        CHARACTER_OCTET_LENGTH  NUMERIC_PRECISION       NUMERIC_SCALE   CHARACTER_SET_NAME      COLLATION_NAME  COLUMN_TYPE     COLUMN_KEY      EXTRA   PRIVILEGES      COLUMN_COMMENT  STORAGE FORMAT
-NULL   information_schema      CHARACTER_SETS  CHARACTER_SET_NAME      1               NO      varchar 64      192     NULL    NULL    utf8    utf8_general_ci varchar(64)                     select          Default Default
-NULL   information_schema      CHARACTER_SETS  DEFAULT_COLLATE_NAME    2               NO      varchar 64      192     NULL    NULL    utf8    utf8_general_ci varchar(64)                     select          Default Default
-NULL   information_schema      CHARACTER_SETS  DESCRIPTION     3               NO      varchar 60      180     NULL    NULL    utf8    utf8_general_ci varchar(60)                     select          Default Default
+NULL   information_schema      CHARACTER_SETS  CHARACTER_SET_NAME      1               NO      varchar 64      256     NULL    NULL    utf8    utf8_general_ci varchar(64)                     select          Default Default
+NULL   information_schema      CHARACTER_SETS  DEFAULT_COLLATE_NAME    2               NO      varchar 64      256     NULL    NULL    utf8    utf8_general_ci varchar(64)                     select          Default Default
+NULL   information_schema      CHARACTER_SETS  DESCRIPTION     3               NO      varchar 60      240     NULL    NULL    utf8    utf8_general_ci varchar(60)                     select          Default Default
 NULL   information_schema      CHARACTER_SETS  MAXLEN  4       0       NO      bigint  NULL    NULL    19      0       NULL    NULL    bigint(3)                       select          Default Default
 .....
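
As a sanity check on the sample above, the old and new values fit a simple product: character length times the maximum number of bytes per character. The following is a minimal Python sketch under that assumption. The helper name `octet_length` is hypothetical and not part of the test suite; the numbers are copied from the diff.

```python
# Factors reported in the description: 3 before the change, 4 after it.
OLD_UTF8_FACTOR = 3
NEW_UTF8_FACTOR = 4


def octet_length(char_max_length: int, bytes_per_char: int) -> int:
    """Assumed relation: CHARACTER_OCTET_LENGTH = CHARACTER_MAXIMUM_LENGTH * bytes per char."""
    return char_max_length * bytes_per_char


# (CHARACTER_MAXIMUM_LENGTH, old CHARACTER_OCTET_LENGTH, new CHARACTER_OCTET_LENGTH)
# for the three varchar columns of CHARACTER_SETS shown in the diff.
SAMPLES = [(64, 192, 256), (64, 192, 256), (60, 180, 240)]


def check_samples() -> bool:
    """Return True if every sampled row matches the assumed relation."""
    return all(
        octet_length(chars, OLD_UTF8_FACTOR) == old
        and octet_length(chars, NEW_UTF8_FACTOR) == new
        for chars, old, new in SAMPLES
    )


if __name__ == "__main__":
    print(check_samples())
```

If the relation holds, regenerating the "result" files would be expected to change only the octet-length column for the affected rows.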

How to repeat:
Run the "funcs_1" tests.

Suggested fix:
Make it mandatory for developers to run these tests,
so that they get maintained with every code change 
(and not just after release builds).
[2 Jan 2008 16:20] Joerg Bruehe
Further analysis of the test logs shows this difference, which I assume to be the reason for the change:

@@ -3179,7 +3179,7 @@
 gbk    gbk_chinese_ci  GBK Simplified Chinese  2
 latin5 latin5_turkish_ci       ISO 8859-9 Turkish      1
 armscii8       armscii8_general_ci     ARMSCII-8 Armenian      1
-utf8   utf8_general_ci UTF-8 Unicode   3
+utf8mb3        utf8mb3_general_ci      UTF-8 Unicode   3
 ucs2   ucs2_general_ci UCS-2 Unicode   2
 cp866  cp866_general_ci        DOS Russian     1
 keybcs2        keybcs2_general_ci      DOS Kamenicky Czech-Slovak      1
@@ -3187,9 +3187,12 @@
 macroman       macroman_general_ci     Mac West European       1
 cp852  cp852_general_ci        DOS Central European    1
 latin7 latin7_general_ci       ISO 8859-13 Baltic      1
+utf8   utf8_general_ci UTF-8 Unicode   4
 cp1251 cp1251_general_ci       Windows Cyrillic        1
+utf16  utf16_general_ci        UTF-16 Unicode  4
 cp1256 cp1256_general_ci       Windows Arabic  1
 cp1257 cp1257_general_ci       Windows Baltic  1
+utf32  utf32_general_ci        UTF-32 Unicode  4
 binary binary  Binary pseudo charset   1
 geostd8        geostd8_general_ci      GEOSTD8 Georgian        1
 cp932  cp932_japanese_ci       SJIS for Windows Japanese       2

So "utf8" is now a 4-byte-per-char encoding, and 4-byte encodings are new, as the maximum MAXLEN shows:

@@ -5751,7 +5754,7 @@
 863
 select max(maxlen) as the_max from character_sets;
 the_max
-3
+4
 select * from collations order by id asc  limit 0, 5;
 COLLATION_NAME CHARACTER_SET_NAME      ID      IS_DEFAULT      IS_COMPILED     SORTLEN
 big5_chinese_ci        big5    1       Yes     Yes     1
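
To illustrate why MAXLEN differs between a 3-byte and a 4-byte UTF-8 variant: characters in the Basic Multilingual Plane need at most 3 bytes in UTF-8, while supplementary characters need 4. The sketch below uses Python's own codecs purely as an illustration; it is not MySQL's implementation, and the two sample characters are arbitrary choices.

```python
def encoded_len(ch: str, codec: str) -> int:
    """Number of bytes one character occupies in the given codec."""
    return len(ch.encode(codec))


BMP_CHAR = "\u20ac"        # EURO SIGN, inside the Basic Multilingual Plane
SUPP_CHAR = "\U0001D11E"   # MUSICAL SYMBOL G CLEF, outside the BMP

if __name__ == "__main__":
    # UTF-8: 3 bytes are enough for the BMP, a supplementary character needs 4.
    print(encoded_len(BMP_CHAR, "utf-8"), encoded_len(SUPP_CHAR, "utf-8"))
    # UTF-16 needs a surrogate pair (2 x 2 bytes) for the supplementary character.
    print(encoded_len(SUPP_CHAR, "utf-16-le"))
    # UTF-32 always uses 4 bytes per character.
    print(encoded_len(BMP_CHAR, "utf-32-le"), encoded_len(SUPP_CHAR, "utf-32-le"))
```

This matches the MAXLEN values in the diff above: 3 for utf8mb3, and 4 for the new utf8, utf16 and utf32.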

Lastly, this one:

@@ -12460,12 +12733,12 @@
 2.0000 longtext        ucs2    ucs2_general_ci
 2.0000 varchar ucs2    ucs2_general_ci
 2.0079 tinytext        ucs2    ucs2_general_ci
-3.0000 char    utf8    utf8_bin
-3.0000 enum    utf8    utf8_bin
-3.0000 char    utf8    utf8_general_ci
-3.0000 enum    utf8    utf8_general_ci
-3.0000 set     utf8    utf8_general_ci
-3.0000 varchar utf8    utf8_general_ci
+4.0000 char    utf8    utf8_bin
+4.0000 enum    utf8    utf8_bin
+4.0000 char    utf8    utf8_general_ci
+4.0000 enum    utf8    utf8_general_ci
+4.0000 set     utf8    utf8_general_ci
+4.0000 varchar utf8    utf8_general_ci
 SELECT DISTINCT
 CHARACTER_OCTET_LENGTH / CHARACTER_MAXIMUM_LENGTH AS COL_CML,
 DATA_TYPE,
[11 Mar 2008 13:19] Matthias Leich
WL#4203 Reorganize and fix the data dictionary tests of
        testsuite funcs_1
was pushed to mysql-<version>-build,
where <version> is one of (5.0, 5.1, 6.0).

1. The tests "<engine>__datadict" no longer exist.
2. The checks where the problem above occurred were
   moved into the new test "is_columns_...".
3. Files with expected results were generated for
   MySQL 5.0, 5.1 and 6.0.
[27 Mar 2008 22:03] Bugs System
Pushed into 5.1.24-rc
[27 Mar 2008 22:11] Bugs System
Pushed into 5.0.60
[28 Mar 2008 11:10] Bugs System
Pushed into 6.0.5-alpha
[28 Mar 2008 19:23] Paul DuBois
Fix involves test case changes. No changelog entry needed.