Bug #50934 hyphen is not mapped
Submitted: 5 Feb 2010 1:39
Modified: 5 Feb 2010 15:30
Reporter: Mikiya Okuno
Status: Not a Bug
Category: MySQL Server: Charsets
Severity: S2 (Serious)
Version: 5.1
OS: Any
Assigned to:
CPU Architecture: Any

[5 Feb 2010 1:39] Mikiya Okuno
Description:
The hyphen-like character encoded in utf8 as 0xE28892 is not mapped to eucjpms, ujis, sjis or cp932; converting it to any of these returns '?'.

How to repeat:
mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> select 0xe28892;
+----------+
| 0xe28892 |
+----------+
| −        |
+----------+
1 row in set (0.00 sec)

mysql> select convert(cast(0xe28892 as char) using eucjpms);
+-----------------------------------------------+
| convert(cast(0xe28892 as char) using eucjpms) |
+-----------------------------------------------+
| ?                                             |
+-----------------------------------------------+
1 row in set (0.00 sec)

mysql> select convert(cast(0xe28892 as char) using cp932);
+---------------------------------------------+
| convert(cast(0xe28892 as char) using cp932) |
+---------------------------------------------+
| ?                                           |
+---------------------------------------------+
1 row in set (0.00 sec)

Suggested fix:
It should be mapped to A1DD in EUC-JP and to 817C in Shift_JIS.

[5 Feb 2010 15:30] Peter Gulutzan
UTF8 E28892 is the code for UCS2 2212 (U+2212) MINUS SIGN.
It is one of several characters which have a "fullwidth"
equivalent (others are WAVE DASH, DOUBLE VERTICAL LINE,
CENT SIGN, POUND SIGN, NOT SIGN, etc.).
According to one set of Japanese rules, we should
convert both characters to the same thing.
But according to another set of Japanese rules, we
should not.
It's not because we have a bug, it's because there are
incompatible sets of Japanese rules for these characters.
So it would be a mistake to 'fix' this by changing one
rule. That would be inconsistent and would 'break' elsewhere.

Our proposal is to allow both sets of Japanese conversion
rules for all the characters which have fullwidth equivalents.
The details of this proposal are in
WL#1820 Variant SJIS and UJIS Japanese Character Sets.