Bug #32914 | Character sets: illegal characters in utf8 and utf32 columns | ||
---|---|---|---|
Submitted: | 2 Dec 2007 22:05 | Modified: | 29 Jul 2008 17:31 |
Reporter: | Peter Gulutzan | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S3 (Non-critical) |
Version: | 6.0.4-alpha-debug | OS: | Linux (SUSE 10 64-bit) |
Assigned to: | Alexander Barkov | CPU Architecture: | Any |
[2 Dec 2007 22:05]
Peter Gulutzan
[2 Dec 2007 22:46]
MySQL Verification Team
Thank you for the bug report. Verified as described.
[6 Dec 2007 9:35]
Alexander Barkov
Peter, you say:

> I can insert characters >= code point value 10ffff
> in utf8 and utf32 encodings.

But then you insert U+10FFFF, which *IS* a valid character:

> How to repeat:
> mysql> create table t (utf32 char(1) character set utf32, utf8 char(1) character set utf8);
> Query OK, 0 rows affected (0.01 sec)

Please clarify what the problem is.
[7 Dec 2007 18:42]
Peter Gulutzan
I'm sorry, the 'how to repeat' indeed showed insertion of maximum legal value, rather than insertion of minimum illegal value. Here is a new 'how to repeat':

mysql> create table t (utf32 char(1) character set utf32, utf8 char(1) character set utf8);
Query OK, 0 rows affected (0.01 sec)

mysql> insert into t values (0x110000,0xf4908080);
Query OK, 1 row affected (0.00 sec)

mysql> select hex(utf32),hex(utf8) from t;
+------------+-----------+
| hex(utf32) | hex(utf8) |
+------------+-----------+
| 00110000   | F4908080  |
+------------+-----------+
1 row in set (0.00 sec)
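For readers who want to confirm outside the server that these two byte sequences are ill-formed, here is a minimal sketch using Python's built-in codecs. The helper name `is_well_formed` is hypothetical and is not part of MySQL. It only shows that a strict decoder accepts U+10FFFF and rejects the next value up in both encodings.

```python
def is_well_formed(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes cleanly under Python's strict codec."""
    try:
        data.decode(encoding, errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# Maximum legal code point, U+10FFFF, in both encodings.
max_utf32 = bytes.fromhex("0010FFFF")
max_utf8 = bytes.fromhex("F48FBFBF")

# The values stored by the 'how to repeat' above: one past the maximum.
bad_utf32 = bytes.fromhex("00110000")
bad_utf8 = bytes.fromhex("F4908080")

for label, data, enc in [
    ("utf32 U+10FFFF", max_utf32, "utf-32-be"),
    ("utf8  U+10FFFF", max_utf8, "utf-8"),
    ("utf32 0x110000", bad_utf32, "utf-32-be"),
    ("utf8  F4908080", bad_utf8, "utf-8"),
]:
    print(label, "well-formed" if is_well_formed(data, enc) else "ILLEGAL")
```

The big-endian codec is used for utf32 because the stored value is shown as `00110000`, with the most significant byte first.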
[1 Apr 2008 15:10]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

http://lists.mysql.com/commits/44738

ChangeSet@1.2614, 2008-04-01 20:03:44+05:00, bar@mysql.com +6 -0
Bug#32914 Character sets: illegal characters in utf8 and utf32 columns

Problem: inserting of Unicode values higher than U+10FFFF was possible into utf32 and utf8 columns

Fix:
- my_mb_wc_utf8mb4() was not strict enough. Adding more strict rules.
- well_formed_copy_nchars() didn't check if left ZERO PADDING generated a wrong character. Adding extra checking for the leftmost (padded) character.
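The two checks named in the fix can be illustrated with a short sketch. This is not the MySQL implementation: the function names `decode_utf8_4byte_strict` and `pad_and_check_utf32` are hypothetical, and the rules are those of the Unicode standard for well-formed four-byte UTF-8 and for the UTF-32 code point range. The second function assumes, as the 'how to repeat' output suggests, that the 3-byte literal 0x110000 is left-padded with a zero byte to fill a 4-byte utf32 character.

```python
from typing import Optional

MAX_CODE_POINT = 0x10FFFF


def decode_utf8_4byte_strict(seq: bytes) -> Optional[int]:
    """Decode one 4-byte UTF-8 sequence, or return None if it is ill-formed."""
    if len(seq) != 4:
        return None
    b0, b1, b2, b3 = seq
    if not 0xF0 <= b0 <= 0xF4:
        return None  # not a legal 4-byte lead byte
    if not all(0x80 <= b <= 0xBF for b in (b1, b2, b3)):
        return None  # bad continuation byte
    if b0 == 0xF0 and b1 < 0x90:
        return None  # overlong encoding of a value below U+10000
    if b0 == 0xF4 and b1 > 0x8F:
        return None  # would encode a value above U+10FFFF
    return ((b0 & 0x07) << 18) | ((b1 & 0x3F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F)


def pad_and_check_utf32(raw: bytes) -> Optional[bytes]:
    """Left-pad `raw` with zero bytes to a multiple of 4, then validate the
    leftmost (padded) character. Returns the padded bytes, or None if the
    padding produced an illegal code point."""
    pad = (-len(raw)) % 4
    padded = b"\x00" * pad + raw
    first = int.from_bytes(padded[:4], "big")
    if first > MAX_CODE_POINT or 0xD800 <= first <= 0xDFFF:
        return None  # beyond the Unicode range, or a surrogate
    return padded


print(decode_utf8_4byte_strict(bytes.fromhex("F48FBFBF")))  # U+10FFFF is accepted
print(decode_utf8_4byte_strict(bytes.fromhex("F4908080")))  # rejected
print(pad_and_check_utf32(bytes.fromhex("10FFFF")))         # padded and accepted
print(pad_and_check_utf32(bytes.fromhex("110000")))         # rejected
```

Only the leftmost character is checked in `pad_and_check_utf32`, since that is the only one the padding can alter. A full validator would check every 4-byte unit.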
[4 Apr 2008 5:51]
Alexander Barkov
Pushed into 6.0.5-engines
[29 Jul 2008 3:09]
Alexander Barkov
Appeared in bzr mysql-6.0.7-alpha.
[29 Jul 2008 17:31]
Paul DuBois
Noted in 6.0.7 changelog. It was possible to insert invalid Unicode characters (with code point values greater than U+10FFFF) into utf8 and utf32 columns.
[13 Sep 2008 22:41]
Bugs System
Pushed into 6.0.6-alpha (revid:bar@mysql.com-20080715105907-h7yaof18afggvs7a) (version source revid:hakan@mysql.com-20080716105246-eg0utbybp122n2w9) (pib:3)