| Bug #66939 | g-circumflex and s-circumflex considered the same character in utf8_unicode_ci | ||
|---|---|---|---|
| Submitted: | 24 Sep 2012 1:18 | Modified: | 24 Sep 2012 13:09 |
| Reporter: | Edoardo Nannotti | Email Updates: | |
| Status: | Can't repeat | Impact on me: | |
| Category: | MySQL Server: Charsets | Severity: | S3 (Non-critical) |
| Version: | 5.1.63 | OS: | Linux |
| Assigned to: | CPU Architecture: | Any | |
[24 Sep 2012 7:35]
Peter Laursen
Does not happen for me on neither 5.1.63 nor 5.5.23:
SET NAMES utf8 COLLATE utf8_unicode_ci;
SELECT 'ĝ' = 'ŝ'; -- returns "0"
SET NAMES utf8 COLLATE utf8_general_ci;
SELECT 'ĝ' = 'ŝ'; -- returns "0"
CREATE DATABASE mydb DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
USE mydb;
CREATE TABLE t (c CHAR(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci);
INSERT INTO t VALUES ('ĝuo');
INSERT INTO t VALUES ('ŝuo');
SELECT * FROM t WHERE c='ŝuo'; -- 1 row returned
Now also try:
SHOW VARIABLES LIKE '%coll%';
/* where I have
Variable_name Value
-------------------- -----------------
collation_connection utf8_general_ci
collation_database utf8_general_ci
collation_server utf8_general_ci */
The culprit could be "collation_connection" setting
Peter
(not a MySQL/Oracle person)
[24 Sep 2012 7:37]
Peter Laursen
I am using 64 bit Windows servers BTW.
[24 Sep 2012 13:09]
MySQL Verification Team
Thank you for the bug report. mysql> SELECT * FROM t; +------+ | c | +------+ | ĝuo | | ŝuo | +------+ 2 rows in set (0.00 sec) mysql> SELECT * FROM t WHERE c='ŝuo'; +------+ | c | +------+ | ŝuo | +------+ 1 row in set (0.00 sec) mysql> show variables like "%version%"; +-------------------------+---------------------+ | Variable_name | Value | +-------------------------+---------------------+ | protocol_version | 10 | | version | 5.1.67-debug-log | | version_comment | Source distribution | | version_compile_machine | x86_64 | | version_compile_os | unknown-linux-gnu | +-------------------------+---------------------+ 5 rows in set (0.00 sec)

Description: Two utf8 characters, "ĝ" (g-circumflex) and "ŝ" (s-circumflex), are considered to be the same when the collation is set to utf8_unicode_ci. To fix it one must use utf8_bin. How to repeat: mysql> CREATE DATABASE mydb DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci; Query OK, 1 row affected (0.00 sec) mysql> USE mydb; Database changed mysql> CREATE TABLE t (c CHAR(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci); Query OK, 0 rows affected (0.11 sec) mysql> INSERT INTO t VALUES ('ĝuo'); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO t VALUES ('ŝuo'); Query OK, 1 row affected (0.00 sec) mysql> SELECT * FROM t WHERE c='ŝuo'; +------+ | c | +------+ | ĝuo | | ŝuo | +------+ 2 rows in set (0.00 sec)