Bug #58321 | No warning when characters outside BMP0 is converted to UCS2 | ||
---|---|---|---|
Submitted: | 19 Nov 2010 11:56 | Modified: | 22 Dec 2010 19:27 |
Reporter: | Bernt Marius Johnsen | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S3 (Non-critical) |
Version: | 5.6, 5.5 | OS: | Any |
Assigned to: | Alexander Barkov | CPU Architecture: | Any |
[19 Nov 2010 11:56]
Bernt Marius Johnsen
[19 Nov 2010 12:30]
Susanne Ebrecht
Many thanks for writing a bug report. You missed to use SET NAMES here to set client encoding to your environment encoding.
[19 Nov 2010 12:40]
Bernt Marius Johnsen
SET NAMES is of no relevance to my example, but since you insist: mysql> set names utf8; Query OK, 0 rows affected (0.00 sec) mysql> create table t1 (v varchar(10) character set utf32); Query OK, 0 rows affected (0.06 sec) mysql> insert into t1 values (0x00012345); Query OK, 1 row affected (0.00 sec) mysql> create table t2 (v varchar(10) character set utf8); Query OK, 0 rows affected (0.07 sec) mysql> create table t3 (v varchar(10) character set ucs2); Query OK, 0 rows affected (0.07 sec) mysql> insert into t2 (select * from t1); Query OK, 1 row affected, 1 warning (0.00 sec) Records: 1 Duplicates: 0 Warnings: 1 mysql> show warnings; +---------+------+--------------------------------------------------------------------+ | Level | Code | Message | +---------+------+--------------------------------------------------------------------+ | Warning | 1366 | Incorrect string value: '\x00\x01\x23\x45' for column 'v' at row 1 | +---------+------+--------------------------------------------------------------------+ 1 row in set (0.00 sec) mysql> insert into t3 (select * from t1); Query OK, 1 row affected (0.00 sec) Records: 1 Duplicates: 0 Warnings: 0
[19 Nov 2010 12:53]
Susanne Ebrecht
0x00012345 is not a valid UTF8 sign. Take here the valid utf8 sign.
[19 Nov 2010 12:57]
Bernt Marius Johnsen
Of course 0x00012345 is not a valid UTF8 "sign", it's a valid Unicode character outside BMP0 and thus stored in a table with character set UTF32. That's the whole point! When the conversion to UTF8 is attempted, I get a warning (which is correct). When conversion to UCS2 is attempted, I DO NOT get a warning. And that's the bug.
[19 Nov 2010 13:48]
Susanne Ebrecht
You also not get warning in utf16. http://www.unicode.org/faq/basic_q.html#14 http://dev.mysql.com/doc/refman/5.0/en/charset-unicode.html http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-ucs2.html
[19 Nov 2010 15:12]
Bernt Marius Johnsen
Of course you dont get a warning for utf16, since 0x00012345 is supported by utf16. The problem is: YOU DO NOT GET A WARNING WHEN IT IS TRUNCATED TO UCS2 WHICH DO NOT SUPPORT CHARACTERS LARGER THAN 0x0000FFFF!!!!
[19 Nov 2010 19:25]
Sveta Smirnova
Thank you for the report. Verified as described although 0X00012345 can be successfully inserted too: set names utf8; create table t1 (v varchar(10) character set utf32); insert into t1 values (0x00012345); create table t2 (v varchar(10) character set utf8); create table t3 (v varchar(10) character set ucs2); insert into t2 (select * from t1); Warnings: Warning 1366 Incorrect string value: '\x00\x01\x23\x45' for column 'v' at row 1 show warnings; Level Code Message Warning 1366 Incorrect string value: '\x00\x01\x23\x45' for column 'v' at row 1 insert into t3 (select * from t1); show warnings; Level Code Message insert into t3 values (0x00012345); select hex(v) from t1; hex(v) 00012345 select hex(v) from t2; hex(v) 3F select hex(v) from t3; hex(v) 2345 00012345
[22 Nov 2010 12:22]
Bernt Marius Johnsen
Well, according to the docs ch 9.1.10: ucs2 and utf8 support BMP characters. utf8mb4, utf16, and utf32 support BMP and supplementary characters. So 0x0001234 should not be possible in UCS2, so here something is missing (truncation/error/warning ....)
[22 Nov 2010 12:38]
Bernt Marius Johnsen
Ah, the last one is correct since 0x00012345 in ucs2 context is interpreted as a string if *TWO* UCS2 characters: 0x0001 and 0x1234.
[23 Nov 2010 18:42]
Peter Gulutzan
This bug occurs in 5.5 as well as 5.6, and occurs for conversions from utf16 as well as utf32, and is a bit worse than 'lack of warning': the 12345 becomes 2345.
[15 Dec 2010 10:07]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/126898 3195 Alexander Barkov 2010-12-15 Bug#58321 No warning when characters outside BMP0 is converted to UCS2 Problem: when inserting supplementary characters to an UCS2 column, character was silently shrinked to 16-bit value. Fix: produce a warning on attempt to insert a supplementary character, and convert to question mark. @ mysql-test/r/ctype_many.result @ mysql-test/t/ctype_many.test Adding tests @ strings/ctype-ucs2.c Check if wc is greater than the highest value supported (0xFFFF), return MY_CS_ILUNI if true.
[15 Dec 2010 10:41]
Alexander Barkov
Pushed into mysql-5.5-bugteam (5.5.8) Pushed into mysql-trunk-bugfixing (5.6.1-m5)
[17 Dec 2010 12:51]
Bugs System
Pushed into mysql-5.5 5.5.9 (revid:georgi.kodinov@oracle.com-20101217124733-p1ivu6higouawv8l) (version source revid:bar@mysql.com-20101215095837-lkrt9d2srudod5b5) (merge vers: 5.5.8) (pib:24)
[17 Dec 2010 12:55]
Bugs System
Pushed into mysql-trunk 5.6.1 (revid:georgi.kodinov@oracle.com-20101217125013-y8pb3az32rtbplc9) (version source revid:bar@mysql.com-20101215103503-jge0qg33poet3ffh) (merge vers: 5.6.1) (pib:24)
[22 Dec 2010 19:27]
Paul DuBois
Noted in 5.5.9 changelog. The ucs2 character set does not support characters outside the Basic Multilingual Plane (BMP), but converting a string containing such characters did not produce a conversion-failure warning.