MySQL Bugs: #66939: g-circumflex and s-circumflex considered the same character in utf8_unicode

Bug #66939	g-circumflex and s-circumflex considered the same character in utf8_unicode_ci
Submitted:	24 Sep 2012 1:18	Modified:	24 Sep 2012 13:09
Reporter:	Edoardo Nannotti	Email Updates:
Status:	Can't repeat	Impact on me:	None
Category:	MySQL Server: Charsets	Severity:	S3 (Non-critical)
Version:	5.1.63	OS:	Linux
Assigned to:		CPU Architecture:	Any

Description:
Two utf8 characters, "ĝ" (g-circumflex) and "ŝ" (s-circumflex), are considered to be the same when the collation is set to utf8_unicode_ci.
To fix it one must use utf8_bin.

How to repeat:
mysql> CREATE DATABASE mydb DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
Query OK, 1 row affected (0.00 sec)

mysql> USE mydb;
Database changed

mysql> CREATE TABLE t (c CHAR(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci);
Query OK, 0 rows affected (0.11 sec)

mysql> INSERT INTO t VALUES ('ĝuo');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO t VALUES ('ŝuo');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM t WHERE c='ŝuo';
+------+
| c    |
+------+
| ĝuo |
| ŝuo |
+------+
2 rows in set (0.00 sec)

Does not happen for me on neither 5.1.63 nor 5.5.23:

SET NAMES utf8 COLLATE utf8_unicode_ci;
SELECT 'ĝ' = 'ŝ'; -- returns "0"
SET NAMES utf8 COLLATE utf8_general_ci;
SELECT 'ĝ' = 'ŝ'; -- returns "0"

CREATE DATABASE mydb DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
USE mydb;
CREATE TABLE t (c CHAR(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci);
INSERT INTO t VALUES ('ĝuo');
INSERT INTO t VALUES ('ŝuo');
SELECT * FROM t WHERE c='ŝuo'; -- 1 row returned

Now also try:
SHOW VARIABLES LIKE '%coll%';

/* where I have

Variable_name         Value            
--------------------  -----------------
collation_connection  utf8_general_ci  
collation_database    utf8_general_ci  
collation_server      utf8_general_ci  */

The culprit could be "collation_connection" setting

Peter
(not a MySQL/Oracle person)

I am using 64 bit Windows servers BTW.

Thank you for the bug report.

mysql>  SELECT * FROM t;
+------+
| c    |
+------+
| ĝuo |
| ŝuo |
+------+
2 rows in set (0.00 sec)

mysql>  SELECT * FROM t WHERE c='ŝuo';
+------+
| c    |
+------+
| ŝuo |
+------+
1 row in set (0.00 sec)

mysql> show variables like "%version%";
+-------------------------+---------------------+
| Variable_name           | Value               |
+-------------------------+---------------------+
| protocol_version        | 10                  |
| version                 | 5.1.67-debug-log    |
| version_comment         | Source distribution |
| version_compile_machine | x86_64              |
| version_compile_os      | unknown-linux-gnu   |
+-------------------------+---------------------+
5 rows in set (0.00 sec)