Bug #36415 | server treats two different UTF8 glyphs as the same | ||
---|---|---|---|
Submitted: | 29 Apr 2008 22:46 | Modified: | 2 May 2008 15:20 |
Reporter: | Ronald Beimel | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S3 (Non-critical) |
Version: | 5.0.45 Source distribution, 5.0, 5.1, 6.0 BK | OS: | Linux |
Assigned to: | | CPU Architecture: | Any |
Tags: | Kanji, SELECT, Unicode |
[29 Apr 2008 22:46]
Ronald Beimel
[29 Apr 2008 22:56]
Ronald Beimel
Clarification: When I run the code in the "How to repeat" section of this form, EACH of the SELECT statements pulls both rows of the table, even though each should pull only one.
[30 Apr 2008 2:17]
Ronald Beimel
Recategorized the bug.
[1 May 2008 11:07]
Sveta Smirnova
Thank you for the report. Verified as described.
[2 May 2008 1:25]
Omer Barnir
A partial workaround is to use the utf8_bin collation.
[2 May 2008 7:27]
Alexander Barkov
Hi, I can't reproduce the same problem. These two glyphs are not treated as the same:

    mysql> set names utf8;
    Query OK, 0 rows affected (0.04 sec)

    mysql> CREATE TABLE test_kanji (id int(11) PRIMARY KEY,kanji varchar(20)) ENGINE=MyISAM DEFAULT
        -> CHARSET=UTF8;
    Query OK, 0 rows affected (0.06 sec)

    mysql> INSERT INTO test_kanji VALUES ('1','角'),('2','駒');
    Query OK, 2 rows affected (0.00 sec)
    Records: 2  Duplicates: 0  Warnings: 0

    mysql> SELECT * FROM test_kanji WHERE kanji='角';
    +----+-------+
    | id | kanji |
    +----+-------+
    |  1 | 角    |
    +----+-------+
    1 row in set (0.02 sec)

    mysql> SELECT * FROM test_kanji WHERE kanji='駒';
    +----+-------+
    | id | kanji |
    +----+-------+
    |  2 | 駒    |
    +----+-------+
    1 row in set (0.00 sec)

Please make sure that the character set of your console program corresponds to these three MySQL session parameters:

    mysql> show variables like 'character_set%';
    +--------------------------+----------------------------+
    | Variable_name            | Value                      |
    +--------------------------+----------------------------+
    | character_set_client     | utf8                       |
    | character_set_connection | utf8                       |
    ...
    | character_set_results    | utf8                       |
    ...
    +--------------------------+----------------------------+
    8 rows in set (0.00 sec)

Most likely they are latin1 for you.
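A rough Python sketch of why a latin1 session could make the two glyphs compare equal. It assumes the UTF-8 bytes sent by the console are read as latin1 (which MySQL implements as cp1252) and then compared under a case- and accent-insensitive collation such as utf8_general_ci. The helper names `misread_as_latin1` and `fold` are hypothetical, and `fold` only approximates such a collation; it is not MySQL's actual algorithm.

```python
import unicodedata


def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as cp1252 (MySQL's latin1)."""
    return text.encode("utf-8").decode("cp1252")


def fold(text: str) -> str:
    """Approximate a case- and accent-insensitive comparison key.

    Assumption: this only mimics the general behaviour of a collation
    like utf8_general_ci; it is not MySQL's implementation.
    """
    decomposed = unicodedata.normalize("NFD", text)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_accents.casefold()


kaku, koma = "角", "駒"

# The UTF-8 encodings differ only in the first byte.
print(kaku.encode("utf-8").hex(), koma.encode("utf-8").hex())  # e8a792 e9a792

# Read as cp1252, those first bytes become two accented forms of "e".
a, b = misread_as_latin1(kaku), misread_as_latin1(koma)
print(repr(a), repr(b))

# Correctly decoded, the glyphs stay distinct even after folding.
print(fold(kaku) == fold(koma))  # False

# Misread and folded, both collapse to the same key.
print(fold(a) == fold(b))  # True

# An exact (binary-style) comparison still tells them apart.
print(a == b)  # False
```

Under these assumptions the last comparison also suggests why utf8_bin helps: a binary collation compares the stored values exactly, so the accent folding never applies. It would not repair the misinterpreted data itself, which fits the description of the workaround as partial.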
[2 May 2008 15:20]
Sveta Smirnova
Alexander, you are right: I missed SET NAMES 'utf8' from the test file. So the report is closed as "Not a Bug".