Bug #10529 | Irreversible conversion of 0x5C between sjis and other char sets | ||
---|---|---|---|
Submitted: | 11 May 2005 7:04 | Modified: | 16 May 2005 18:14 |
Reporter: | Hartmut Holzgraefe | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server | Severity: | S3 (Non-critical) |
Version: | 4.1.11 | OS: | Any (any) |
Assigned to: | Alexander Barkov | CPU Architecture: | Any |
[11 May 2005 7:04]
Hartmut Holzgraefe
[16 May 2005 18:14]
Alexander Barkov
This is the round-trip conversion issue we discussed with Shuichi, and it is the expected behavior for sjis in MySQL. If you want Unicode 0x005C to convert back to 0x5C, you should use cp932.
[20 May 2005 5:30]
Shuichi Tamagawa
Hi Hartmut,

When the 'sjis' character set is used, MySQL converts characters based on the rules defined by the Unicode Consortium.
For ASCII characters, to which the 0x5C character belongs: http://www.unicode.org/Public/MAPPINGS/ISO8859/8859-1.TXT
For JIS0208 characters, to which the 0x815F character belongs: http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/JIS0208.TXT

The problem is that both 0x5C and 0x815F are mapped to Unicode 0x005C. So when 0x005C is converted back to sjis, MySQL has to choose one of these characters, but there is no standard rule for the conversion from Unicode to sjis. MySQL's sjis character set chooses 0x815F. This is the expected behavior. MySQL could choose 0x5C instead, but in that case 0x815F would be irreversible.

cp932, on the other hand, has a slightly different Unicode conversion rule from sjis, in addition to supporting extended characters. In the cp932 character set, characters are converted to Unicode based on the rule defined by Microsoft: http://www.microsoft.com/globaldev/reference/dbcs/932.mspx . In this rule, 0x5C is mapped to 0x005C and 0x815F is mapped to 0xFF3C, so the Unicode conversion is reversible for both characters.

Hope this helps.
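To make the round-trip argument concrete, here is a minimal sketch that models the two rule sets as lookup tables restricted to the two byte codes discussed above. The table and function names (`SJIS_TO_UNICODE`, `round_trip`, `collisions`, etc.) are hypothetical and for illustration only; this is not MySQL's actual conversion code, just the mappings as described in the reply: under sjis both codes land on U+005C and the reverse direction picks 0x815F, while under cp932 the mapping is one-to-one.

```python
# Toy model of the two rule sets, limited to the two codes in question.
# Keys are SJIS/CP932 codes, values are Unicode code points.
# (Hypothetical tables built from the mappings quoted in this thread.)
SJIS_TO_UNICODE = {
    0x5C: 0x005C,    # ASCII backslash, per 8859-1.TXT
    0x815F: 0x005C,  # JIS X 0208 REVERSE SOLIDUS, per JIS0208.TXT
}
# Both codes collide on U+005C; MySQL's sjis rule picks 0x815F on the way back.
UNICODE_TO_SJIS = {0x005C: 0x815F}

CP932_TO_UNICODE = {
    0x5C: 0x005C,    # backslash
    0x815F: 0xFF3C,  # FULLWIDTH REVERSE SOLIDUS, per Microsoft's table
}
# The cp932 mapping is one-to-one, so the reverse table is its exact inverse.
UNICODE_TO_CP932 = {u: c for c, u in CP932_TO_UNICODE.items()}


def round_trip(code, to_unicode, from_unicode):
    """Convert a code to Unicode and back with the given tables."""
    return from_unicode[to_unicode[code]]


def collisions(to_unicode):
    """Return Unicode code points reached from more than one source code."""
    sources = {}
    for code, uni in to_unicode.items():
        sources.setdefault(uni, []).append(code)
    return {uni: codes for uni, codes in sources.items() if len(codes) > 1}


if __name__ == "__main__":
    for name, fwd, back in [("sjis", SJIS_TO_UNICODE, UNICODE_TO_SJIS),
                            ("cp932", CP932_TO_UNICODE, UNICODE_TO_CP932)]:
        for code in sorted(fwd):
            result = round_trip(code, fwd, back)
            status = "ok" if result == code else "IRREVERSIBLE"
            print(f"{name}: 0x{code:X} -> U+{fwd[code]:04X} -> 0x{result:X} {status}")
```

Running it should report 0x5C as irreversible under sjis (it comes back as 0x815F) and both codes as reversible under cp932. `collisions` shows why: any Unicode code point with more than one source forces the reverse table to drop one of them.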