Bug #34391 | Character sets: crash if char(), utf32, innodb | | |
---|---|---|---|
Submitted: | 7 Feb 2008 19:27 | Modified: | 27 Jul 2010 23:38 |
Reporter: | Peter Gulutzan | Email Updates: | |
Status: | Duplicate | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S3 (Non-critical) |
Version: | 6.0.5-alpha-debug | OS: | Linux (SUSE 10 64-bit) |
Assigned to: | Assigned Account | CPU Architecture: | Any |
[7 Feb 2008 19:27]
Peter Gulutzan
[7 Feb 2008 20:26]
MySQL Verification Team
Thank you for the bug report.

```
Program received signal SIGABRT, Aborted.
[Switching to Thread 1158191440 (LWP 19785)]
0x0000003bfca30ec5 in raise () from /lib64/libc.so.6
(gdb) bt full
#0  0x0000003bfca30ec5 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x0000003bfca32970 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x0000000000a801a3 in dtype_get_min_size_low (mtype=13, prtype=3932414, len=4, mbminlen=0, mbmaxlen=4) at ../../storage/innobase/include/data0type.ic:499
No locals.
#3  0x0000000000a80a73 in dict_col_get_min_size (col=0x2aaaaaff10c0) at ../../storage/innobase/include/dict0dict.ic:66
No locals.
#4  0x0000000000afc42e in dict_build_table_def_step (thr=0x2aaaaaff21f0, node=0x2aaaaaff0cb8) at dict/dict0crea.c:225
        table = (dict_table_t *) 0x2aaaaaff08b8
        row = (dtuple_t *) 0x2aaaaaff0c48
        error = 46912501653576
        path_or_name = 0x550 <Address 0x550 out of bounds>
        is_path = 608
        mtr = {state = 1158174288, memo = {heap = 0xae2ba5, used = 1158174336, data = {16 '\020', 185 '¹', 252 'ü', 170 'ª', 170 'ª', <CUT>
```
[29 May 2008 11:13]
Alexander Barkov
The problem is caused by the definition of dict_col_struct in dict0mem.h:

```
struct dict_col_struct{
	...
	unsigned	mbminlen:2;	/* minimum length of a character, in bytes */
	unsigned	mbmaxlen:3;	/* maximum length of a character, in bytes */
	....
}
```

mbminlen is limited to 2 bits, so it can hold only 0..3 and cannot store the value "4". Note that the structure uses 64 bits in total. It is not clear whether it is safe to extend mbminlen to 3 bits, or whether the structure is intended to stay within its current 64-bit size. We need a suggestion from the InnoDB team.
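A minimal C sketch of the truncation, assuming only the two bit-fields quoted above (the struct name mblen_2bit_demo and the function utf32_mbminlen_after_store() are hypothetical). For an unsigned bit-field, C keeps the stored value modulo 2^width, so 4 (binary 100) in a 2-bit field reads back as 0. That is consistent with frame #2 of the backtrace, where dtype_get_min_size_low() received mbminlen=0 and mbmaxlen=4.

```c
#include <assert.h>

/* Hypothetical reduced copy of the two bit-fields quoted above; the
   real dict_col_struct has more members. */
struct mblen_2bit_demo {
	unsigned	mbminlen:2;	/* 2 bits: can hold only 0..3 */
	unsigned	mbmaxlen:3;	/* 3 bits: can hold 0..7 */
};

/* Stores the UTF-32 character length (4 bytes for both minimum and
   maximum) and returns what mbminlen reads back. Storing into an
   unsigned bit-field keeps the value modulo 2^width, so 4 becomes 0. */
unsigned utf32_mbminlen_after_store(void)
{
	struct mblen_2bit_demo	col;
	unsigned		utf32_len = 4;

	col.mbminlen = utf32_len;
	col.mbmaxlen = utf32_len;
	assert(col.mbmaxlen == 4);	/* 4 fits in 3 bits */
	return col.mbminlen;		/* 0, not 4 */
}
```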
[7 Jun 2008 8:37]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

http://lists.mysql.com/commits/47567

2655 Alexander Barkov 2008-06-07
Bug#34391 Character sets: crash if char(), utf32, innodb
Problem: mbminlen in dict_col_t and dtype_t uses 2 bits, which cannot store the value of "4" required for UTF32.
Fix: adding macros to pack/unpack the range 1..4 to 0..3.
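A sketch of the pack/unpack idea described in the commit message, assuming a simple "subtract 1 / add 1" mapping. The macro names MBLEN_PACK and MBLEN_UNPACK, the struct, and the helper are hypothetical; the actual macros are in the patch at the URL above.

```c
#include <assert.h>

/* Hypothetical macros: a length n in 1..4 is stored as n - 1 (0..3),
   which fits in 2 bits, and restored by adding 1. */
#define MBLEN_PACK(n)	((unsigned) (n) - 1)
#define MBLEN_UNPACK(v)	((unsigned) (v) + 1)

struct packed_mbminlen_demo {
	unsigned	mbminlen_packed:2;	/* holds MBLEN_PACK(mbminlen) */
};

/* Round-trips a minimum character length through the 2-bit field. */
unsigned mbminlen_roundtrip(unsigned mbminlen)
{
	struct packed_mbminlen_demo	col;

	/* With this mapping a length of 0 has no representation. */
	assert(mbminlen >= 1 && mbminlen <= 4);
	col.mbminlen_packed = MBLEN_PACK(mbminlen);
	return MBLEN_UNPACK(col.mbminlen_packed);
}
```

The trade-off of this approach is that the field width stays the same, but the value 0 can no longer be stored.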
[29 Aug 2008 5:31]
Lars Thalmann
Reassigning to InnoDB. With the addition of two-byte collation IDs, Bar's old fix for this bug has become incomplete. The old fix tried to stay within the old structure size (64 bits) of the InnoDB structures dict_col_t and dtype_t, but the new patch supporting two-byte collation IDs will have to extend these structures anyway. That means Bar's old hack is no longer required. An easier way is simply to add more bits to the "mbmaxlen" and "mbminlen" members of these structures.
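A sketch of the "just add more bits" alternative, assuming both members are widened to 3 bits (the struct and helper names are hypothetical). Three bits hold 0..7, so every length from 0 to 4 is stored directly, at the cost of one extra bit compared with the original 2 + 3 layout.

```c
#include <assert.h>

/* Hypothetical layout: both lengths widened to 3 bits (0..7). */
struct mblen_3bit_demo {
	unsigned	mbminlen:3;
	unsigned	mbmaxlen:3;
};

/* Stores the given lengths and returns nonzero if both read back
   unchanged. */
int mblen_3bit_roundtrip(unsigned mbminlen, unsigned mbmaxlen)
{
	struct mblen_3bit_demo	col;

	assert(mbminlen <= 7 && mbmaxlen <= 7);
	col.mbminlen = mbminlen;
	col.mbmaxlen = mbmaxlen;
	return col.mbminlen == mbminlen && col.mbmaxlen == mbmaxlen;
}
```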
[3 Sep 2008 18:32]
Marko Mäkelä
This is not just a matter of changing some in-memory data structures. InnoDB reserves only 8 bits for the character set/collation identifier in various disk-based data structures. Reserving more space for the ID would require major incompatible changes to the data and log file formats. If the collation IDs really have to be 16-bit, you could prohibit the use of collation IDs above 255 on storage engines that only support 8-bit collation IDs.
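A sketch of the suggested restriction, assuming an engine whose on-disk format keeps the collation ID in one byte. ENGINE_MAX_COLLATION_ID, collation_id_supported(), and collation_id_on_disk() are hypothetical names, not InnoDB or MySQL functions; the point is that the check must happen before the ID is narrowed, because a cast to 8 bits would silently turn 256 into 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limit: the on-disk field is 8 bits wide. */
#define ENGINE_MAX_COLLATION_ID	255u

/* Returns nonzero if the engine can store this collation ID. */
int collation_id_supported(unsigned collation_id)
{
	return collation_id <= ENGINE_MAX_COLLATION_ID;
}

/* Narrows the ID to the 8-bit on-disk representation; callers must
   have rejected unsupported IDs first (e.g. at CREATE TABLE time). */
uint8_t collation_id_on_disk(unsigned collation_id)
{
	assert(collation_id_supported(collation_id));
	return (uint8_t) collation_id;
}
```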
[28 Oct 2008 8:42]
Marko Mäkelä
Allow mbminlen, mbmaxlen to be 0..4 bytes.
Attachment: mysql-6.0-bug34391.patch (text/x-diff), 16.96 KiB.
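One compact way to allow both lengths to range over 0..4, shown here as a hypothetical encoding that is not necessarily what the attached patch does: with five possible values each there are 5 * 5 = 25 combinations, which fit in 5 bits (0..31). That is the same 2 + 3 bits the original mbminlen:2 and mbmaxlen:3 fields used. The macro and struct names below are assumptions.

```c
#include <assert.h>

/* Hypothetical combined encoding of (mbminlen, mbmaxlen), each 0..4. */
#define MB_RANGE		5u	/* number of values in 0..4 */
#define MBMINMAX_PACK(min, max)	((max) * MB_RANGE + (min))	/* 0..24 */
#define MBMINMAX_MIN(v)		((v) % MB_RANGE)
#define MBMINMAX_MAX(v)		((v) / MB_RANGE)

struct mbminmax_demo {
	unsigned	mbminmaxlen:5;	/* MBMINMAX_PACK(mbminlen, mbmaxlen) */
};

/* Packs both lengths into one 5-bit field and unpacks them again. */
void mbminmax_roundtrip(unsigned mbminlen, unsigned mbmaxlen,
			unsigned *min_out, unsigned *max_out)
{
	struct mbminmax_demo	col;

	assert(mbminlen <= 4 && mbmaxlen <= 4);
	col.mbminmaxlen = MBMINMAX_PACK(mbminlen, mbmaxlen);
	*min_out = MBMINMAX_MIN(col.mbminmaxlen);
	*max_out = MBMINMAX_MAX(col.mbminmaxlen);
}
```

Division and modulo by 5 are slightly more expensive than reading two separate bit-fields, so this trades a little CPU for keeping the structure size unchanged.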
[28 Oct 2008 8:48]
Marko Mäkelä
We do not have an up-to-date 6.0 repository yet. The attached patch is against http://bazaar.launchpad.net/%7Emysql/mysql-server/mysql-6.0/ as of this change:

```
revno: 2862
committer: Alexander Nozdrin <alik@mysql.com>
branch nick: 6.0.build
timestamp: Wed 2008-10-08 11:51:22 +0400
message:
  Disable main.events_bugs.test due to frequent failures.
```

The patch does *not* fix two known issues:

* charset-collation IDs longer than 8 bits
* space padding of UTF32. This should be done as in UCS2, or certain things, such as column prefix indexes, will malfunction.
[7 Nov 2008 13:15]
Marko Mäkelä
Can MySQL ever use the pre-5.0 ("non-true") VARCHAR with the UTF-16 or UTF-32 encoding? ha_innobase::store_key_val_for_row() is padding the non-true VARCHAR with 0x20 bytes, which would obviously be incorrect for UTF-16 and UTF-32. In UTF-16 without surrogate pairs (MySQL's "ucs2"), this should not be a problem, because it is a fixed-width character set, and thus key_len==true_len should hold.
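A small illustration of why 0x20 byte padding is wrong for the wide encodings: the hypothetical helper padding_code_unit() fills one code unit with 0x20 bytes and reads it back (big-endian; the byte order does not matter here because all bytes are equal). For a 2-byte unit the result is 0x2020, i.e. U+2020 DAGGER rather than a space; for a 4-byte unit it is 0x20202020, which lies above U+10FFFF and is not a valid code point at all.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fills one code unit of unit_size bytes (2 for UTF-16/UCS-2, 4 for
   UTF-32) with 0x20 bytes, as memset(buff, ' ', pad_len) would, and
   returns the resulting code unit. */
uint32_t padding_code_unit(size_t unit_size)
{
	unsigned char	buff[4];
	uint32_t	unit = 0;
	size_t		i;

	assert(unit_size == 2 || unit_size == 4);
	memset(buff, 0x20, unit_size);
	for (i = 0; i < unit_size; i++) {
		unit = (unit << 8) | buff[i];
	}
	return unit;
}
```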
[11 Nov 2008 20:53]
Marko Mäkelä
Does InnoDB need to support values of mbmaxlen > 4? For Unicode, it does not seem so. What about other character sets? Regarding Unicode, there are a few #ifdef UNICODE_32BIT in the utf8mb3 functions in strings/ctype-utf8.c that seem to enable nonstandard 1..6-byte UTF-8, but I don't think anyone would define that. IETF RFC 3629 from 2003 defines the UTF-8 encoding as 1..4 bytes per character.
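For reference, the RFC 3629 length rule can be written as a short function (utf8_encoded_len() is a hypothetical helper, not a MySQL function). Because Unicode ends at U+10FFFF, the longest standard UTF-8 sequence is 4 bytes, which is why mbmaxlen = 4 covers Unicode. The function also treats UTF-16 surrogates as unencodable, as RFC 3629 requires.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes used by RFC 3629 UTF-8 for code point cp, or 0 if cp cannot
   be encoded (a surrogate or a value above U+10FFFF). */
unsigned utf8_encoded_len(uint32_t cp)
{
	if (cp < 0x80)
		return 1;
	if (cp < 0x800)
		return 2;
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return 0;	/* surrogates are not characters */
	if (cp < 0x10000)
		return 3;
	if (cp <= 0x10FFFF)
		return 4;
	return 0;		/* beyond the Unicode code space */
}

/* Spot checks covering each length class. */
void utf8_len_examples(void)
{
	assert(utf8_encoded_len(0x41) == 1);		/* 'A' */
	assert(utf8_encoded_len(0xE9) == 2);		/* U+00E9 */
	assert(utf8_encoded_len(0x20AC) == 3);		/* U+20AC EURO SIGN */
	assert(utf8_encoded_len(0x1F600) == 4);		/* supplementary plane */
	assert(utf8_encoded_len(0x110000) == 0);	/* out of range */
}
```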
[21 Nov 2008 12:51]
Alexander Barkov
Hi Marko,

We don't need to support more than 4 bytes for Unicode. I don't think we will support character sets with characters longer than 4 bytes, at least not in the next few years :)
[21 Nov 2008 13:05]
Alexander Barkov
Regarding 4-byte character sets and padding with 0x20: this line

```
memset(buff, ' ', pad_len);
```

should be changed to:

```
cs->cset->fill(cs, buff, pad_len, fill);
```
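A sketch of why dispatching through the character set fixes the padding, using heavily reduced, hypothetical stand-ins for the charset structures (demo_charset, demo_cset_handler, fill_8bit(), fill_utf32(), and pad_key() are not MySQL definitions; only the shape of the cs->cset->fill(cs, buff, pad_len, fill) call comes from the comment above). A single-byte charset can keep using memset, while the UTF-32 handler writes the 4-byte big-endian form of the fill character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_charset;

/* Hypothetical handler table with a single fill callback. */
struct demo_cset_handler {
	void (*fill)(const struct demo_charset *cs, char *to, size_t len,
		     int fill_char);
};

struct demo_charset {
	const struct demo_cset_handler *cset;
};

/* Single-byte charset (e.g. latin1): the byte value is the character. */
static void fill_8bit(const struct demo_charset *cs, char *to, size_t len,
		      int fill_char)
{
	(void) cs;
	memset(to, fill_char, len);
}

/* UTF-32 (big-endian): each character occupies 4 bytes; len is
   assumed to be a multiple of 4. */
static void fill_utf32(const struct demo_charset *cs, char *to, size_t len,
		       int fill_char)
{
	size_t	i;

	(void) cs;
	assert(len % 4 == 0);
	for (i = 0; i < len; i += 4) {
		to[i] = 0;
		to[i + 1] = (char) ((fill_char >> 16) & 0xFF);
		to[i + 2] = (char) ((fill_char >> 8) & 0xFF);
		to[i + 3] = (char) (fill_char & 0xFF);
	}
}

static const struct demo_cset_handler demo_8bit_handler = { fill_8bit };
static const struct demo_cset_handler demo_utf32_handler = { fill_utf32 };
const struct demo_charset demo_latin1 = { &demo_8bit_handler };
const struct demo_charset demo_utf32 = { &demo_utf32_handler };

/* Pads a key tail via the charset instead of memset(buff, ' ', pad_len). */
void pad_key(const struct demo_charset *cs, char *buff, size_t pad_len)
{
	cs->cset->fill(cs, buff, pad_len, ' ');
}
```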
[18 Dec 2008 11:44]
Marko Mäkelä
Regarding Bar's comment on [21 Nov 14:05], some of the padding in InnoDB turns out to be unnecessary. In row_sel_store_mysql_rec(), we shouldn't pad (or space-fill) columns that are NULL, but copy from table->s->default_values instead. See Bug #39648. This is just a side remark; we will have to fix the remaining cases of padding in InnoDB.
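A sketch of the NULL-handling idea, assuming a flat record buffer in which each column sits at a fixed offset (store_column() and its parameters are hypothetical, not the actual row_sel_store_mysql_rec() code). For a NULL column the bytes are copied from the same offset in the default-values record, which plays the role of table->s->default_values, so no charset-dependent space-filling is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies one column into mysql_rec. NULL columns take their bytes from
   the default-values record instead of being padded with spaces. */
void store_column(unsigned char *mysql_rec,
		  const unsigned char *default_values,
		  size_t offset, size_t len,
		  const unsigned char *data, int is_null)
{
	if (is_null) {
		memcpy(mysql_rec + offset, default_values + offset, len);
	} else {
		assert(data != NULL);
		memcpy(mysql_rec + offset, data, len);
	}
}
```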
[8 Jan 2009 12:51]
Marko Mäkelä
This was committed to the InnoDB source repository and will be included in the next snapshot.
[28 Jun 2010 11:01]
Marko Mäkelä
Resurfaced as Bug #52199
[27 Jul 2010 23:38]
Calvin Sun
Marked as a duplicate of Bug #52199.