Bug #15376 Unassigned multibyte codes are converted to U+0000
Submitted: 1 Dec 2005 7:56 Modified: 12 Dec 2005 7:29
Reporter: Alexander Barkov
Status: Duplicate
Category: MySQL Server Severity: S3 (Non-critical)
Version: 4.1
Assigned to: Assigned Account CPU Architecture: Any

[1 Dec 2005 7:56] Alexander Barkov
Description:
0x8FABF8 is a valid UJIS multibyte sequence (3 bytes) in this format:

[x8F][xA1-xFE][xA1-xFE]

corresponding to JIS-X-0212 code 0x2B78

(i.e. remove the 0x8F introducer, then subtract 0x8080 from 0xABF8).
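
For reference, the arithmetic above can be sketched in a few lines of Python. This is only an illustration of the byte layout described in this report; the function name ujis_to_jisx0212 is made up for the example and is not part of the MySQL sources.

```python
def ujis_to_jisx0212(seq: bytes) -> int:
    """Return the JIS X 0212 code for a 3-byte UJIS sequence (illustrative helper)."""
    # The 3-byte form is [x8F][xA1-xFE][xA1-xFE].
    if len(seq) != 3 or seq[0] != 0x8F:
        raise ValueError("expected a 3-byte sequence starting with 0x8F")
    if not all(0xA1 <= b <= 0xFE for b in seq[1:]):
        raise ValueError("trail bytes must be in the range 0xA1-0xFE")
    # Drop the 0x8F introducer, then subtract 0x8080 from the remaining 16 bits.
    return ((seq[1] << 8) | seq[2]) - 0x8080


print(hex(ujis_to_jisx0212(bytes.fromhex("8FABF8"))))  # prints 0x2b78
```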

When converting this character to UCS2, the result is 0x0000,
which is wrong.

It is true that this character has no Unicode mapping;
however, the expected result is 0x003F QUESTION MARK,
which is what an impossible conversion usually returns.
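
A minimal sketch of the expected behaviour, assuming the JIS X 0212 to Unicode mapping is available as a lookup table. The names jisx0212_to_ucs2 and table are hypothetical and the table is left empty here, so this shows only the fallback rule, not the actual server code or real mapping data.

```python
QUESTION_MARK = 0x003F  # U+003F, the usual result of an impossible conversion


def jisx0212_to_ucs2(code: int, table: dict) -> int:
    """Look up a JIS X 0212 code; fall back to '?' when it is unassigned.

    `table` is a hypothetical {JIS X 0212 code: Unicode code point} mapping
    supplied by the caller.
    """
    # An unassigned code must yield 0x003F, never 0x0000.
    return table.get(code, QUESTION_MARK)


# 0x2B78 has no Unicode mapping, so an empty table models it well enough here.
print(format(jisx0212_to_ucs2(0x2B78, {}), "04X"))  # prints 003F
```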

How to repeat:
sql> select hex(convert(_ujis 0x8FABF8 using ucs2));
+-----------------------------------------+
| hex(convert(_ujis 0x8FABF8 using ucs2)) |
+-----------------------------------------+
| 0000                                    |
+-----------------------------------------+
1 row in set (0.00 sec)

Suggested fix:
Fix to return 0x003F (i.e. question mark).
[12 Dec 2005 7:29] Alexander Barkov
This bug has the same root cause as bug #15375.
Closing as a duplicate.
[23 Mar 2006 10:11] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/4056