MySQL Bugs: #34391: Character sets: crash if char(), utf32, innodb

Bug #34391	Character sets: crash if char(), utf32, innodb
Submitted:	7 Feb 2008 19:27	Modified:	27 Jul 2010 23:38
Reporter:	Peter Gulutzan
Status:	Duplicate
Category:	MySQL Server: InnoDB storage engine	Severity:	S3 (Non-critical)
Version:	6.0.5-alpha-debug	OS:	Linux (SUSE 10 64-bit)
Assigned to:	Assigned Account	CPU Architecture:	Any

Description:
I try to create a table.
The engine is InnoDB (no problem with other engines).
The data type is CHAR (no problem with VARCHAR).
The character set is UTF32 (no problem with UTF8).
Crash.

How to repeat:
mysql> create table t (s1 char(1) character set utf32) engine=innodb;
ERROR 2013 (HY000): Lost connection to MySQL server during query

Thank you for the bug report.

Program received signal SIGABRT, Aborted.
[Switching to Thread 1158191440 (LWP 19785)]
0x0000003bfca30ec5 in raise () from /lib64/libc.so.6
(gdb) bt full
#0  0x0000003bfca30ec5 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x0000003bfca32970 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x0000000000a801a3 in dtype_get_min_size_low (mtype=13, prtype=3932414, len=4, mbminlen=0, mbmaxlen=4)
    at ../../storage/innobase/include/data0type.ic:499
No locals.
#3  0x0000000000a80a73 in dict_col_get_min_size (col=0x2aaaaaff10c0) at ../../storage/innobase/include/dict0dict.ic:66
No locals.
#4  0x0000000000afc42e in dict_build_table_def_step (thr=0x2aaaaaff21f0, node=0x2aaaaaff0cb8) at dict/dict0crea.c:225
        table = (dict_table_t *) 0x2aaaaaff08b8
        row = (dtuple_t *) 0x2aaaaaff0c48
        error = 46912501653576
        path_or_name = 0x550 <Address 0x550 out of bounds>
        is_path = 608
        mtr = {state = 1158174288, memo = {heap = 0xae2ba5, used = 1158174336, data = {16 '\020', 185 '¹', 252 'ü', 170 'ª', 170 'ª', 
<CUT>

The problem is caused by the definition of dict_col_struct in dict0mem.h:

struct dict_col_struct{
...
        unsigned        mbminlen:2;     /* minimum length of a
                                        character, in bytes */
        unsigned        mbmaxlen:3;     /* maximum length of a
                                        character, in bytes */
....
}

mbminlen is limited to 2 bits and cannot fit the value of "4".
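
The truncation can be reproduced outside the server. The following is a minimal sketch, assuming a stand-in structure (col_sketch and stored_mbminlen are hypothetical names, not InnoDB code) that copies only the two bit-field widths quoted above. Storing 4 in the 2-bit field reads back as 0, which matches the mbminlen=0, mbmaxlen=4 arguments in frame #2 of the backtrace.

```c
#include <assert.h>

/* Hypothetical stand-in for the two bit-fields quoted above;
   this is not the real dict_col_struct. */
struct col_sketch {
        unsigned        mbminlen:2;     /* can hold 0..3 only */
        unsigned        mbmaxlen:3;     /* can hold 0..7 */
};

/* Store both lengths and return what mbminlen reads back as. */
static unsigned stored_mbminlen(unsigned mbminlen, unsigned mbmaxlen)
{
        struct col_sketch col;

        assert(mbmaxlen <= 7);          /* fits the 3-bit field */

        /* An unsigned 2-bit field keeps the value modulo 4,
           so 4 is silently stored as 0. */
        col.mbminlen = mbminlen;
        col.mbmaxlen = mbmaxlen;

        return col.mbminlen;
}
```

Calling stored_mbminlen(4, 4) models the utf32 case; stored_mbminlen(1, 3) models a character set whose minimum length still fits in 2 bits.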

Note that the total number of bits used is 64.
It's not clear whether it is safe to extend mbminlen to 3 bits,
or whether the structure is intended to stay within the eight-byte limit.

Need a suggestion from the InnoDB team.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/47567

2655 Alexander Barkov	2008-06-07
      Bug#34391 Character sets: crash if char(), utf32, innodb
      Problem: mbminlen in dict_col_t and dtype_t uses 2 bits,
      which cannot store the value of "4" required for UTF32.
      Fix: adding macros to pack/unpack the range 1..4 to 0..3.
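
The pack/unpack idea from the commit message can be sketched as follows. The macro and function names here are hypothetical and the macros in the committed patch may differ; the sketch only shows how the range 1..4 can be kept in a 2-bit field by storing the length minus one.

```c
#include <assert.h>

/* Hypothetical names; the macros in the actual patch may differ. */
#define MBMINLEN_PACK(len)      ((unsigned) (len) - 1)  /* 1..4 -> 0..3 */
#define MBMINLEN_UNPACK(bits)   ((unsigned) (bits) + 1) /* 0..3 -> 1..4 */

/* Hypothetical stand-in for a structure with a 2-bit mbminlen. */
struct packed_col_sketch {
        unsigned        mbminlen:2;     /* holds the packed value 0..3 */
};

/* Store a minimum character length of 1..4 bytes. */
static void col_set_mbminlen(struct packed_col_sketch *col, unsigned len)
{
        assert(len >= 1 && len <= 4);
        col->mbminlen = MBMINLEN_PACK(len);
}

/* Read the minimum character length back as 1..4 bytes. */
static unsigned col_get_mbminlen(const struct packed_col_sketch *col)
{
        return MBMINLEN_UNPACK(col->mbminlen);
}
```

With this mapping the structure size stays unchanged, but a length of 0 can no longer be represented in the field.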

Reassigning to InnoDB.

With the addition of two-byte collation IDs, Bar's old fix for
this bug has become incomplete.

The old fix tried to stay within the old structure size (64 bits)
for the InnoDB structures dict_col_t and dtype_t, but the new patch
supporting two-byte collation IDs will have to extend the
size of these structures anyway.

That means Bar's old hack is no longer required.

An easier way is simply to add more bits to the "mbmaxlen"
and "mbminlen" members of the above structures.

This is not just a matter of changing some in-memory data structures.

InnoDB reserves only 8 bits for the characterset-collation identifier in various disk-based data structures. Reserving more space for the ID would require major incompatible changes to the data and log file formats.

If the collation IDs really have to be 16-bit, you could prohibit the use of collation IDs above 255 on storage engines that only support 8-bit collation IDs.

Allow mbminlen, mbmaxlen to be 0..4 bytes.

Attachment: mysql-6.0-bug34391.patch (text/x-diff), 16.96 KiB.

We do not have an up-to-date 6.0 repository yet. The attached patch is against http://bazaar.launchpad.net/%7Emysql/mysql-server/mysql-6.0/ as of this change:

revno: 2862
committer: Alexander Nozdrin <alik@mysql.com>
branch nick: 6.0.build
timestamp: Wed 2008-10-08 11:51:22 +0400
message:
  Disable main.events_bugs.test due to frequent failures.

The patch does *not* fix two known issues:

* charset-collation IDs longer than 8 bits
* space padding of UTF32. This should be done as in UCS2, or certain things, such as column prefix indexes, will malfunction.

Can MySQL ever use the pre-5.0 ("non-true") VARCHAR with the UTF-16 or UTF-32 encoding? ha_innobase::store_key_val_for_row() is padding the non-true VARCHAR with 0x20 bytes, which would obviously be incorrect for UTF-16 and UTF-32. In UTF-16 without surrogate pairs (MySQL's "ucs2"), this should not be a problem, because it is a fixed-width character set, and thus key_len==true_len should hold.

Does InnoDB need to support values of mbmaxlen > 4? For Unicode, it does not seem so. What about other character sets?

Regarding Unicode, there are a few #ifdef UNICODE_32BIT in the utf8mb3 functions in strings/ctype-utf8.c that seem to enable nonstandard 1..6-byte UTF-8, but I don't think anyone would define that. IETF RFC 3629 from 2003 defines the UTF-8 encoding as 1..4 bytes per character.

Hi Marko,

We don't need to support more than 4 bytes for Unicode.

I don't think we will support character sets with characters longer than 4 bytes.
At least not in the next few years :)

Regarding 4-byte character sets and padding with 0x20:

This line should be fixed:

  memset(buff, ' ', pad_len);

to:

  cs->cset->fill(cs, buff, pad_len, fill);
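
To illustrate why the byte-wise memset() is wrong for a 4-byte character set, here is a minimal sketch of character-aware padding. It is not the MySQL fill() handler: pad_utf32_spaces is a hypothetical helper, and it assumes big-endian UTF-32 and a pad_len that is a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the MySQL fill() handler.
   Pads buff with UTF-32 space characters (U+0020), assuming
   big-endian byte order, so each space is 00 00 00 20. */
static void pad_utf32_spaces(unsigned char *buff, size_t pad_len)
{
        static const unsigned char space[4] = {0x00, 0x00, 0x00, 0x20};
        size_t i;

        /* Assumption of this sketch: only whole characters are padded. */
        assert(pad_len % 4 == 0);

        for (i = 0; i < pad_len; i += 4) {
                memcpy(buff + i, space, sizeof space);
        }
}
```

By contrast, memset(buff, ' ', 4) produces the bytes 20 20 20 20, which is the code point 0x20202020 rather than a space.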

Regarding Bar's comment on [21 Nov 14:05], some of the padding in InnoDB turns out to be unnecessary. In row_sel_store_mysql_rec(), we shouldn't pad (or space-fill) columns that are NULL, but copy from table->s->default_values instead. See Bug #39648. This is just a side remark; we will have to fix the remaining cases of padding in InnoDB.

This was committed to the InnoDB source repository and will be included in the next snapshot.

Resurfaced as Bug #52199

Marked as a duplicate of Bug #52199.