Bug #34391 Character sets: crash if char(), utf32, innodb
Submitted: 7 Feb 2008 19:27 Modified: 27 Jul 2010 23:38
Reporter: Peter Gulutzan
Status: Duplicate
Category: MySQL Server: InnoDB storage engine Severity: S3 (Non-critical)
Version: 6.0.5-alpha-debug OS: Linux (SUSE 10 64-bit)
Assigned to: Assigned Account CPU Architecture: Any

[7 Feb 2008 19:27] Peter Gulutzan
Description:
I try to create a table.
The engine is InnoDB (no problem with other engines).
The data type is CHAR (no problem with VARCHAR).
The character set is UTF32 (no problem with UTF8).
Crash.

How to repeat:
mysql> create table t (s1 char(1) character set utf32) engine=innodb;
ERROR 2013 (HY000): Lost connection to MySQL server during query
[7 Feb 2008 20:26] MySQL Verification Team
Thank you for the bug report.

Program received signal SIGABRT, Aborted.
[Switching to Thread 1158191440 (LWP 19785)]
0x0000003bfca30ec5 in raise () from /lib64/libc.so.6
(gdb) bt full
#0  0x0000003bfca30ec5 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x0000003bfca32970 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x0000000000a801a3 in dtype_get_min_size_low (mtype=13, prtype=3932414, len=4, mbminlen=0, mbmaxlen=4)
    at ../../storage/innobase/include/data0type.ic:499
No locals.
#3  0x0000000000a80a73 in dict_col_get_min_size (col=0x2aaaaaff10c0) at ../../storage/innobase/include/dict0dict.ic:66
No locals.
#4  0x0000000000afc42e in dict_build_table_def_step (thr=0x2aaaaaff21f0, node=0x2aaaaaff0cb8) at dict/dict0crea.c:225
        table = (dict_table_t *) 0x2aaaaaff08b8
        row = (dtuple_t *) 0x2aaaaaff0c48
        error = 46912501653576
        path_or_name = 0x550 <Address 0x550 out of bounds>
        is_path = 608
        mtr = {state = 1158174288, memo = {heap = 0xae2ba5, used = 1158174336, data = {16 '\020', 185 '¹', 252 'ü', 170 'ª', 170 'ª', 
<CUT>
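
As a reading aid for frame #2, the prtype argument can be decoded by hand. The sketch below assumes (this is not stated in the report) that prtype keeps the MySQL type code in its low 8 bits and the charset-collation ID in bits 16..23; the helper names are hypothetical. Under that assumption, prtype=3932414 (0x3C00FE) carries type code 254 and collation ID 60, while mbminlen=0 next to mbmaxlen=4 is the inconsistent pair handed to dtype_get_min_size_low().

```c
#include <assert.h>

/* Hypothetical helpers. Assumed layout (not confirmed by this report):
   bits 0..7 = MySQL type code, bits 16..23 = charset-collation ID. */
static unsigned prtype_type_code(unsigned long prtype)
{
    return (unsigned) (prtype & 0xFFul);
}

static unsigned prtype_charset_coll(unsigned long prtype)
{
    return (unsigned) ((prtype >> 16) & 0xFFul);
}

/* Decode the prtype value printed in frame #2 of the backtrace. */
void check_frame2_prtype(void)
{
    const unsigned long prtype = 3932414ul;  /* same value as 0x3C00FE */

    assert(prtype == 0x3C00FEul);
    assert(prtype_type_code(prtype) == 254);
    assert(prtype_charset_coll(prtype) == 60);
}
```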
[29 May 2008 11:13] Alexander Barkov
The problem happens because of the dict_col_struct definition in dict0mem.h:

struct dict_col_struct{
...
        unsigned        mbminlen:2;     /* minimum length of a
                                        character, in bytes */
        unsigned        mbmaxlen:3;     /* maximum length of a
                                        character, in bytes */
....
}

mbminlen is limited to 2 bits and cannot fit the value of "4".
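
The truncation can be reproduced in isolation. This is a minimal sketch using a hypothetical struct that copies only the two bit-field widths quoted above; it is not the real dict_col_struct. Storing 4 in the 2-bit field reads back as 0, which matches the mbminlen=0 seen in the backtrace.

```c
#include <assert.h>

/* Hypothetical stand-in with the same bit-field widths as the
   excerpt above; the real structure has many more members. */
struct col_bits {
    unsigned mbminlen:2;  /* can hold 0..3 only */
    unsigned mbmaxlen:3;  /* can hold 0..7 */
};

/* Store both lengths and return what the 2-bit field reads back. */
static unsigned stored_mbminlen(unsigned mbminlen, unsigned mbmaxlen)
{
    struct col_bits c;

    /* Lengths discussed in this report never exceed 4 bytes. */
    assert(mbminlen <= 4 && mbmaxlen <= 4);

    c.mbminlen = mbminlen;  /* unsigned bit-field: value is reduced modulo 4 */
    c.mbmaxlen = mbmaxlen;
    return c.mbminlen;
}
```

With this sketch, stored_mbminlen(4, 4) returns 0 for a UTF32-like column, while stored_mbminlen(2, 2) returns 2 for a UCS2-like column.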

Note that the total number of bits used is 64.
It's not clear whether it's safe to extend mbminlen to 3 bits,
or whether the structure size is intended to stay within the 64-bit (eight-byte) limit.

Need a suggestion from the InnoDB team.
[7 Jun 2008 8:37] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/47567

2655 Alexander Barkov	2008-06-07
      Bug#34391 Character sets: crash if char(), utf32, innodb
      Problem: mbminlen in dict_col_t and dtype_t uses 2 bits,
      which cannot store the value of "4" required for UTF32.
      Fix: adding macros to pack/unpack the range 1..4 to 0..3.
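
The pack/unpack idea in the commit message could look roughly like the sketch below. The macro, struct, and function names are hypothetical; the actual macros in the patch are not shown in this report. The 2-bit field stores the length minus one, so the range 1..4 fits in 0..3.

```c
#include <assert.h>

/* Hypothetical names; not taken from the patch itself. */
#define MBMINLEN_PACK(len)    ((unsigned) (len) - 1u)   /* 1..4 -> 0..3 */
#define MBMINLEN_UNPACK(bits) ((unsigned) (bits) + 1u)  /* 0..3 -> 1..4 */

struct packed_col {
    unsigned mbminlen_packed:2;  /* still only 2 bits wide */
};

static void col_set_mbminlen(struct packed_col *col, unsigned len)
{
    assert(len >= 1 && len <= 4);  /* only 1..4 is representable */
    col->mbminlen_packed = MBMINLEN_PACK(len);
}

static unsigned col_get_mbminlen(const struct packed_col *col)
{
    return MBMINLEN_UNPACK(col->mbminlen_packed);
}
```

One consequence of this encoding is that a length of 0 can no longer be represented in the field.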
[29 Aug 2008 5:31] Lars Thalmann
Reassigning to InnoDB.

With the addition of two-byte collation IDs, Bar's old fix for
this bug became incomplete.

The old fix tried to stay within the old structure size
(64 bits) for the InnoDB structures dict_col_t and dtype_t,
but the new patch supporting two-byte collation IDs will
have to extend these structures anyway.

That means Bar's old hack is not required anymore.

An easier way is just to add more bits to "mbmaxlen" 
and "mbminlen" members of the above structures.
[3 Sep 2008 18:32] Marko Mäkelä
This is not just a matter of changing some in-memory data structures.

InnoDB reserves only 8 bits for the characterset-collation identifier in various disk-based data structures. Reserving more space for the ID would require major incompatible changes to the data and log file formats.

If the collation IDs really have to be 16-bit, you could prohibit the use of collation IDs above 255 on storage engines that only support 8-bit collation IDs.
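
The suggested restriction amounts to a range check before a collation is accepted for an engine. A minimal sketch, with a hypothetical function name and a caller-supplied bit width (8 for the on-disk formats described above):

```c
#include <assert.h>

/* Hypothetical check: can an engine that stores the charset-collation
   ID in id_bits bits represent this collation ID? */
static int collation_id_fits(unsigned collation_id, unsigned id_bits)
{
    assert(id_bits >= 1 && id_bits <= 16);
    return collation_id < (1u << id_bits);
}
```

For example, collation_id_fits(255, 8) is true and collation_id_fits(256, 8) is false, so an engine limited to 8-bit IDs would reject the latter.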
[28 Oct 2008 8:42] Marko Mäkelä
Allow mbminlen, mbmaxlen to be 0..4 bytes.

Attachment: mysql-6.0-bug34391.patch (text/x-diff), 16.96 KiB.

[28 Oct 2008 8:48] Marko Mäkelä
We do not have an up-to-date 6.0 repository yet. The attached patch is against http://bazaar.launchpad.net/%7Emysql/mysql-server/mysql-6.0/ as of this change:

revno: 2862
committer: Alexander Nozdrin <alik@mysql.com>
branch nick: 6.0.build
timestamp: Wed 2008-10-08 11:51:22 +0400
message:
  Disable main.events_bugs.test due to frequent failures.

The patch does *not* fix two known issues:

* charset-collation IDs longer than 8 bits
* space padding of UTF32. This should be done as in UCS2, or certain things, such as column prefix indexes, will malfunction.
[7 Nov 2008 13:15] Marko Mäkelä
Can MySQL ever use the pre-5.0 ("non-true") VARCHAR with the UTF-16 or UTF-32 encoding? ha_innobase::store_key_val_for_row() is padding the non-true VARCHAR with 0x20 bytes, which would obviously be incorrect for UTF-16 and UTF-32. In UTF-16 without surrogate pairs (MySQL's "ucs2"), this should not be a problem, because it is a fixed-width character set, and thus key_len==true_len should hold.
[11 Nov 2008 20:53] Marko Mäkelä
Does InnoDB need to support values of mbmaxlen > 4? For Unicode, it does not seem so. What about other character sets?

Regarding Unicode, there are a few #ifdef UNICODE_32BIT in the utf8mb3 functions in strings/ctype-utf8.c that seem to enable nonstandard 1..6-byte UTF-8, but I don't think anyone would define that. IETF RFC 3629 from 2003 defines the UTF-8 encoding as 1..4 bytes per character.
[21 Nov 2008 12:51] Alexander Barkov
Hi Marko,

We don't need to support more than 4 bytes for Unicode.

I don't think we will support character sets with characters longer than 4 bytes,
at least not in the next few years :)
[21 Nov 2008 13:05] Alexander Barkov
Regarding the 4-byte character set and padding with 0x20:

This line should be fixed:

  memset(buff, ' ', pad_len);

to:

  cs->cset->fill(cs, buff, pad_len, fill);
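
To illustrate why a byte-wise memset is wrong for a 4-byte fixed-width character set, the following sketch pads a buffer with U+0020 written as four bytes per character. It is a hypothetical stand-in, not MySQL's cs->cset->fill implementation, and it assumes big-endian UTF-32 code units.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill pad_len bytes with the space character
   encoded as one 4-byte big-endian unit (00 00 00 20) per character. */
static void pad_utf32_spaces(unsigned char *buf, size_t pad_len)
{
    static const unsigned char space[4] = { 0x00, 0x00, 0x00, 0x20 };
    size_t i;

    assert(pad_len % 4 == 0);  /* whole characters only */
    for (i = 0; i < pad_len; i += 4) {
        memcpy(buf + i, space, 4);
    }
}
```

By contrast, memset(buf, ' ', 4) produces the bytes 20 20 20 20, which under the same assumption would decode as the single code unit 0x20202020 rather than a space.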
[18 Dec 2008 11:44] Marko Mäkelä
Regarding Bar's comment of [21 Nov 2008 13:05], some of the padding in InnoDB turns out to be unnecessary. In row_sel_store_mysql_rec(), we shouldn't pad (or space-fill) columns that are NULL, but copy from table->s->default_values instead. See Bug #39648. This is just a side remark; we will still have to fix the remaining cases of padding in InnoDB.
[8 Jan 2009 12:51] Marko Mäkelä
This was committed to the InnoDB source repository and will be included in the next snapshot.
[28 Jun 2010 11:01] Marko Mäkelä
Resurfaced as Bug #52199
[27 Jul 2010 23:38] Calvin Sun
Marked as duplicate of Bug #52199.