Bug #36415 server treats two different UTF8 glyphs as the same
Submitted: 29 Apr 2008 22:46
Modified: 2 May 2008 15:20
Reporter: Ronald Beimel
Status: Not a Bug
Category: MySQL Server: Charsets
Severity: S3 (Non-critical)
Version: 5.0.45 Source distribution, 5.0, 5.1, 6.0 BK
OS: Linux
Assigned to:
CPU Architecture: Any
Tags: Kanji, SELECT, Unicode

[29 Apr 2008 22:46] Ronald Beimel
Description:
The server appears unable to tell the difference between one particular pair of UTF8 characters: 角 and 駒

I am using MySQL from the command line.

When I run the code in the "How to repeat" section of this form, the select statements pull both rows of the table, even though they should only pull one.  I have not seen this happen with other pairs of UTF8 glyphs.

I found the same problem with versions 4.1.20 and 5.0.45-Debian_1ubuntu3.3-log Debian etch.

How to repeat:
CREATE TABLE test_kanji (id int(11) PRIMARY KEY,kanji varchar(20)) ENGINE=MyISAM DEFAULT CHARSET=UTF8;

INSERT INTO test_kanji VALUES ('1','角'),('2','駒');

SELECT * FROM test_kanji WHERE kanji='角';

SELECT * FROM test_kanji WHERE kanji='駒';
[29 Apr 2008 22:56] Ronald Beimel
Clarification:
When I run the code in the "How to repeat" section of this form, EACH of the select statements pulls both rows of the table, even though each should pull only one.
[30 Apr 2008 2:17] Ronald Beimel
Recategorized the bug.
[1 May 2008 11:07] Sveta Smirnova
Thank you for the report.

Verified as described.
[2 May 2008 1:25] Omer Barnir
A partial workaround is to use the utf8_bin collation.
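
A minimal sketch of that workaround, assuming the test_kanji table from "How to repeat". Applying the collation to the column rather than to the literal is a choice made here so that the statement does not depend on the connection character set. The ALTER TABLE variant is an alternative, not something this thread prescribes.

```sql
-- Per-query: compare the column byte-for-byte instead of with the
-- default case- and accent-insensitive utf8 collation.
SELECT * FROM test_kanji WHERE kanji COLLATE utf8_bin = '角';
SELECT * FROM test_kanji WHERE kanji COLLATE utf8_bin = '駒';

-- Per-column alternative: make binary comparison the default for kanji.
ALTER TABLE test_kanji
  MODIFY kanji varchar(20) CHARACTER SET utf8 COLLATE utf8_bin;
```

This only changes how values are compared. It does not repair rows that were inserted through a connection with the wrong character set, which is presumably why the workaround is described as partial.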
[2 May 2008 7:27] Alexander Barkov
Hi, 

I can't reproduce the same problem. These two glyphs are
not treated as the same:

mysql> set names utf8;
Query OK, 0 rows affected (0.04 sec)

mysql> CREATE TABLE test_kanji (id int(11) PRIMARY KEY,kanji varchar(20)) ENGINE=MyISAM DEFAULT
    -> CHARSET=UTF8;
Query OK, 0 rows affected (0.06 sec)

mysql> INSERT INTO test_kanji VALUES ('1','角'),('2','駒');
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql>
mysql> SELECT * FROM test_kanji WHERE kanji='角';
+----+-------+
| id | kanji |
+----+-------+
|  1 | 角   |
+----+-------+
1 row in set (0.02 sec)

mysql>
mysql> SELECT * FROM test_kanji WHERE kanji='駒';
+----+-------+
| id | kanji |
+----+-------+
|  2 | 駒   |
+----+-------+
1 row in set (0.00 sec)

Please make sure that the character set of your console program
matches these three MySQL session parameters:

mysql> show variables like 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
...
| character_set_results    | utf8                       |
...
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

Most likely they are latin1 for you.
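
The check above can be combined with a look at the stored bytes. This is a sketch, assuming the test_kanji table from the report and a terminal that sends UTF-8. The column aliases are made up for illustration.

```sql
-- Tell the server that this client sends and expects UTF-8.
SET NAMES utf8;

-- Confirm the three session parameters listed above.
SHOW VARIABLES LIKE 'character_set%';

-- Inspect what is actually stored. In UTF-8, 角 (U+89D2) is the bytes
-- E8 A7 92 and 駒 (U+99D2) is E9 A7 92, one character each.
SELECT id,
       HEX(kanji)         AS stored_bytes,
       CHAR_LENGTH(kanji) AS char_count
FROM test_kanji;
```

If a row shows more than one character, it was probably inserted while the session was set to latin1, so each UTF-8 byte was stored as a separate character. That would also be a plausible explanation for the original symptom (an inference, not something stated in this thread): read as latin1, the two glyphs differ only in their first byte, è versus é, and the default utf8_general_ci collation compares those two letters as equal.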
[2 May 2008 15:20] Sveta Smirnova
Alexander, you are right: I left SET NAMES 'utf8' out of the test file. So the report is closed as "Not a Bug".