Bug #66939 g-circumflex and s-circumflex considered the same character in utf8_unicode_ci
Submitted: 24 Sep 2012 1:18 Modified: 24 Sep 2012 13:09
Reporter: Edoardo Nannotti Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.1.63 OS:Linux
Assigned to: CPU Architecture:Any

[24 Sep 2012 1:18] Edoardo Nannotti
Description:
Two utf8 characters, "ĝ" (g-circumflex) and "ŝ" (s-circumflex), are considered to be the same when the collation is set to utf8_unicode_ci.
To fix it one must use utf8_bin.

How to repeat:
mysql> CREATE DATABASE mydb DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
Query OK, 1 row affected (0.00 sec)

mysql> USE mydb;
Database changed

mysql> CREATE TABLE t (c CHAR(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci);
Query OK, 0 rows affected (0.11 sec)

mysql> INSERT INTO t VALUES ('ĝuo');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO t VALUES ('ŝuo');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM t WHERE c='ŝuo';
+------+
| c    |
+------+
| ĝuo |
| ŝuo |
+------+
2 rows in set (0.00 sec)
[24 Sep 2012 7:35] Peter Laursen
Does not happen for me on neither 5.1.63 nor 5.5.23:

SET NAMES utf8 COLLATE utf8_unicode_ci;
SELECT 'ĝ' = 'ŝ'; -- returns "0"
SET NAMES utf8 COLLATE utf8_general_ci;
SELECT 'ĝ' = 'ŝ'; -- returns "0"

CREATE DATABASE mydb DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
USE mydb;
CREATE TABLE t (c CHAR(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci);
INSERT INTO t VALUES ('ĝuo');
INSERT INTO t VALUES ('ŝuo');
SELECT * FROM t WHERE c='ŝuo'; -- 1 row returned

Now also try:
SHOW VARIABLES LIKE '%coll%';

/* where I have

Variable_name         Value            
--------------------  -----------------
collation_connection  utf8_general_ci  
collation_database    utf8_general_ci  
collation_server      utf8_general_ci  */

The culprit could be "collation_connection" setting

Peter
(not a MySQL/Oracle person)
[24 Sep 2012 7:37] Peter Laursen
I am using 64 bit Windows servers BTW.
[24 Sep 2012 13:09] MySQL Verification Team
Thank you for the bug report.

mysql>  SELECT * FROM t;
+------+
| c    |
+------+
| ĝuo |
| ŝuo |
+------+
2 rows in set (0.00 sec)

mysql>  SELECT * FROM t WHERE c='ŝuo';
+------+
| c    |
+------+
| ŝuo |
+------+
1 row in set (0.00 sec)

mysql> show variables like "%version%";
+-------------------------+---------------------+
| Variable_name           | Value               |
+-------------------------+---------------------+
| protocol_version        | 10                  |
| version                 | 5.1.67-debug-log    |
| version_comment         | Source distribution |
| version_compile_machine | x86_64              |
| version_compile_os      | unknown-linux-gnu   |
+-------------------------+---------------------+
5 rows in set (0.00 sec)