Bug #15375 | Unassigned multibyte codes are broken into parts when converting to Unicode | ||
---|---|---|---|
Submitted: | 1 Dec 2005 7:36 | Modified: | 11 Apr 2006 13:19 |
Reporter: | Alexander Barkov | | |
Status: | Closed | | |
Category: | MySQL Server | Severity: | S3 (Non-critical) |
Version: | OS: | ||
Assigned to: | Alexander Barkov | CPU Architecture: | Any |
[1 Dec 2005 7:36]
Alexander Barkov
[1 Dec 2005 7:38]
Alexander Barkov
An example of the same problem with GBK:

    mysql> select hex(convert(_gbk 0xA140 using ucs2));
    +--------------------------------------+
    | hex(convert(_gbk 0xA140 using ucs2)) |
    +--------------------------------------+
    | 003F0040                             |
    +--------------------------------------+
    1 row in set (0.00 sec)
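The GBK result above (003F0040) reads as a QUESTION MARK for the lead byte 0xA1 followed by the trail byte 0x40 passed through as "@". The following Python sketch illustrates the behaviour the report asks for: consume the whole two-byte sequence and emit a single 0x003F. It is not the server's code. The function names, the byte ranges used to recognise a GBK pair, and the one-entry `TOY_GBK_TO_UNICODE` table are assumptions made for illustration only.

```python
from typing import Mapping

QUESTION_MARK = 0x003F

# Hypothetical one-entry mapping table, for illustration only.
# GBK 0xA1A1 maps to U+3000 IDEOGRAPHIC SPACE; 0xA140 is deliberately absent.
TOY_GBK_TO_UNICODE = {0xA1A1: 0x3000}


def is_gbk_head(b: int) -> bool:
    """Assumed lead-byte range of a two-byte GBK sequence."""
    return 0x81 <= b <= 0xFE


def is_gbk_tail(b: int) -> bool:
    """Assumed trail-byte range of a two-byte GBK sequence."""
    return 0x40 <= b <= 0x7E or 0x80 <= b <= 0xFE


def gbk_to_ucs2_hex(data: bytes, mapping: Mapping[int, int]) -> str:
    """Convert GBK bytes to a hex string of UCS2 code units.

    An unmapped but well-formed two-byte sequence is consumed as a whole
    and yields exactly one QUESTION MARK.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Single-byte ASCII passes through unchanged.
            out.append(b)
            i += 1
        elif is_gbk_head(b) and i + 1 < len(data) and is_gbk_tail(data[i + 1]):
            # Well-formed pair: look it up, fall back to one '?' for the pair.
            code = (b << 8) | data[i + 1]
            out.append(mapping.get(code, QUESTION_MARK))
            i += 2
        else:
            # Stray byte that cannot start a valid pair.
            out.append(QUESTION_MARK)
            i += 1
    return "".join(f"{cp:04X}" for cp in out)


print(gbk_to_ucs2_hex(bytes.fromhex("A140"), TOY_GBK_TO_UNICODE))
```

With this sketch, the unassigned code 0xA140 produces one replacement character rather than two code units, because the fallback is applied after the sequence length has been determined, not byte by byte.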
[12 Dec 2005 7:28]
Alexander Barkov
See also #15376: 0x8FABF8 is a valid UJIS multibyte sequence (3 bytes) of the form [x8F][xA1-xFE][xA1-xFE], corresponding to JIS-X-0212 code 0x2B78 (i.e. remove the 0x8F introducer, then subtract 0x8080 from 0xABF8). When this character is converted to UCS2, the result is 0x0000, which is wrong. It is true that this character has no Unicode mapping; however, the expected result is 0x003F QUESTION MARK, as an impossible conversion usually returns.
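The arithmetic in this comment can be checked with a short Python sketch. It validates the three-byte form [x8F][xA1-xFE][xA1-xFE], drops the 0x8F introducer and subtracts 0x8080 to obtain the JIS-X-0212 code, then falls back to 0x003F when no mapping exists. The function names and the empty mapping table are hypothetical and used only to illustrate the expected result; this is not the server implementation.

```python
from typing import Mapping

QUESTION_MARK = 0x003F


def ujis_to_jisx0212(seq: bytes) -> int:
    """Return the JIS-X-0212 code for a 3-byte UJIS sequence [8F][A1-FE][A1-FE]."""
    if len(seq) != 3 or seq[0] != 0x8F:
        raise ValueError("expected a 3-byte sequence starting with 0x8F")
    if not all(0xA1 <= b <= 0xFE for b in seq[1:]):
        raise ValueError("second and third bytes must be in 0xA1-0xFE")
    # Drop the 0x8F introducer, then subtract 0x8080 from the remaining two bytes.
    return ((seq[1] << 8) | seq[2]) - 0x8080


def ujis3_to_ucs2(seq: bytes, mapping: Mapping[int, int]) -> int:
    """Map the sequence to UCS2, using QUESTION MARK when there is no mapping."""
    return mapping.get(ujis_to_jisx0212(seq), QUESTION_MARK)


seq = bytes.fromhex("8FABF8")
print(f"{ujis_to_jisx0212(seq):04X}")   # JIS-X-0212 code
print(f"{ujis3_to_ucs2(seq, {}):04X}")  # fallback when unmapped
```

Here the empty dictionary stands in for a table in which 0x2B78 has no Unicode mapping, so the lookup yields 0x003F rather than 0x0000.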
[12 Dec 2005 17:48]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/76
[23 Mar 2006 9:12]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/4055
[23 Mar 2006 16:11]
Alexander Barkov
Fixed in 4.1.19, 5.0.20, 5.1.8.
[11 Apr 2006 13:19]
Paul DuBois
Noted in 4.1.19, 5.0.20, 5.1.8 changelogs. During conversion from one character set to <literal>ucs2</literal>, multi-byte characters with no <literal>ucs2</literal> equivalent were converted to multiple characters, rather than to <literal>0x003F QUESTION MARK</literal>. (Bug #15375)