Bug #34096 problem with cyrillic letter IO and letter case issue in utf8_general_ci
Submitted: 27 Jan 2008 22:26 Modified: 28 Jan 2008 22:02
Reporter: Vsevolod Parfenov Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.0.32-Debian_7etch1-log OS:Any
Assigned to: CPU Architecture:Any
Tags: cyrillic io ё utf8

[27 Jan 2008 22:26] Vsevolod Parfenov
Description:
Better to view it on a wide screen.

It's all about cyrillic characters:
Ёё comparing to Ее and
Йй comparing to Ии

characters (first code is cp1251):
е 0xE5 = U+0435 : CYRILLIC SMALL LETTER IE
Е 0xC5 = U+0415 : CYRILLIC CAPITAL LETTER IE
ё 0xB8 = U+0451 : CYRILLIC SMALL LETTER IO
Ё 0xA8 = U+0401 : CYRILLIC CAPITAL LETTER IO
и 0xE8 = U+0438 : CYRILLIC SMALL LETTER I
И 0xC8 = U+0418 : CYRILLIC CAPITAL LETTER I
й 0xE9 = U+0439 : CYRILLIC SMALL LETTER SHORT I
Й 0xC9 = U+0419 : CYRILLIC CAPITAL LETTER SHORT I 

from http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1251.TXT

I compare several collations in cp1251 and utf8 character sets.
SQL SELECT looks like (cp1251 hex codes):
select 'ё' = 'Ё', 'й' = 'Й', 'z' = 'Z', 'е' = 'ё', 'и' = 'й', 'И' = 'Й', 'е' >= 'ё', 'Е' = 'Ё', 'и' > 'й', 'Ё' > 'ё', 'Й' > 'й', 'Б' > 'б', 'L' > 'l';
        B8    A8   E9    C9   7A    5A   E5    B8   E8    E9   C8    C9   E5     B8   C5    A8   E8    E9   A8    B8   C9    E9   C1    E1   4C    6C

Compare case sensitive collations
---------------------------------

mysql> set names cp1251 collate cp1251_general_cs;
mysql> select 'ё' = 'Ё', 'й' = 'Й', 'z' = 'Z', 'е' = 'ё', 'и' = 'й', 'И' = 'Й', 'е' >= 'ё', 'Е' = 'Ё', 'и' > 'й', 'Ё' > 'ё', 'Й' > 'й', 'Б' > 'б', 'L' > 'l';
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
| 'ё' = 'Ё' | 'й' = 'Й' | 'z' = 'Z' | 'е' = 'ё' | 'и' = 'й' | 'И' = 'Й' | 'е' >= 'ё' | 'Е' = 'Ё' | 'и' > 'й' | 'Ё' > 'ё' | 'Й' > 'й' | 'Б' > 'б' | 'L' > 'l' |
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
|         0 |         0 |         0 |         0 |         0 |         0 |          0 |         0 |         0 |         0 |         0 |         0 |         0 |
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
Correct

mysql> set names utf8 collate utf8_bin;
mysql> select 'ё' = 'Ё', 'й' = 'Й', 'z' = 'Z', 'е' = 'ё', 'и' = 'й', 'И' = 'Й', 'е' >= 'ё', 'Е' = 'Ё', 'и' > 'й', 'Ё' > 'ё', 'Й' > 'й', 'Б' > 'б', 'L' > 'l';
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
| 'ё' = 'Ё' | 'й' = 'Й' | 'z' = 'Z' | 'е' = 'ё' | 'и' = 'й' | 'И' = 'Й' | 'е' >= 'ё' | 'Е' = 'Ё' | 'и' > 'й' | 'Ё' > 'ё' | 'Й' > 'й' | 'Б' > 'б' | 'L' > 'l' |
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
|         0 |         0 |         0 |         0 |         0 |         0 |          1 |         0 |         0 |         0 |         0 |         0 |         0 |
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
Wrong!

The 1 (true value) in second result is a error! It has to be 0 (false) (sorting issue)

Compare case insensitive collations
-----------------------------------

mysql> set names cp1251 collate cp1251_general_ci;
mysql> select 'ё' = 'Ё', 'й' = 'Й', 'z' = 'Z', 'е' = 'ё', 'и' = 'й', 'И' = 'Й', 'е' >= 'ё', 'Е' = 'Ё', 'и' > 'й', 'Ё' > 'ё', 'Й' > 'й', 'Б' > 'б', 'L' > 'l';
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
| 'ё' = 'Ё' | 'й' = 'Й' | 'z' = 'Z' | 'е' = 'ё' | 'и' = 'й' | 'И' = 'Й' | 'е' >= 'ё' | 'Е' = 'Ё' | 'и' > 'й' | 'Ё' > 'ё' | 'Й' > 'й' | 'Б' > 'б' | 'L' > 'l' |
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
|         1 |         1 |         1 |         0 |         0 |         0 |          0 |         0 |         0 |         0 |         0 |         0 |         0 |
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
Correct

mysql> set names utf8 collate utf8_general_ci;
mysql> select 'ё' = 'Ё', 'й' = 'Й', 'z' = 'Z', 'е' = 'ё', 'и' = 'й', 'И' = 'Й', 'е' >= 'ё', 'Е' = 'Ё', 'и' > 'й', 'Ё' > 'ё', 'Й' > 'й', 'Б' > 'б', 'L' > 'l';
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
| 'ё' = 'Ё' | 'й' = 'Й' | 'z' = 'Z' | 'е' = 'ё' | 'и' = 'й' | 'И' = 'Й' | 'е' >= 'ё' | 'Е' = 'Ё' | 'и' > 'й' | 'Ё' > 'ё' | 'Й' > 'й' | 'Б' > 'б' | 'L' > 'l' |
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
|         0 |         0 |         1 |         0 |         0 |         0 |          1 |         0 |         0 |         0 |         0 |         0 |         0 |
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
Wrong in columns (starting from 1): 1, 2, 7. First 3 columns have to be 1 and the rest - 0

mysql> set names utf8 collate utf8_unicode_ci;
mysql> select 'ё' = 'Ё', 'й' = 'Й', 'z' = 'Z', 'е' = 'ё', 'и' = 'й', 'И' = 'Й', 'е' >= 'ё', 'Е' = 'Ё', 'и' > 'й', 'Ё' > 'ё', 'Й' > 'й', 'Б' > 'б', 'L' > 'l';
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
| 'ё' = 'Ё' | 'й' = 'Й' | 'z' = 'Z' | 'е' = 'ё' | 'и' = 'й' | 'И' = 'Й' | 'е' >= 'ё' | 'Е' = 'Ё' | 'и' > 'й' | 'Ё' > 'ё' | 'Й' > 'й' | 'Б' > 'б' | 'L' > 'l' |
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
|         1 |         1 |         1 |         1 |         1 |         1 |          1 |         1 |         0 |         0 |         0 |         0 |         0 |
+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+
Wrong in columns (starting from 1): 4, 5, 6, 7, 8. First 3 columns have to be 1 and the rest - 0

How to repeat:
set names cp1251 collate cp1251_general_cs;
select 'ё' = 'Ё', 'й' = 'Й', 'z' = 'Z', 'е' = 'ё', 'и' = 'й', 'И' = 'Й', 'е' >= 'ё', 'Е' = 'Ё', 'и' > 'й', 'Ё' > 'ё', 'Й' > 'й', 'Б' > 'б', 'L' > 'l';
set names utf8 collate utf8_bin;
select 'ё' = 'Ё', 'й' = 'Й', 'z' = 'Z', 'е' = 'ё', 'и' = 'й', 'И' = 'Й', 'е' >= 'ё', 'Е' = 'Ё', 'и' > 'й', 'Ё' > 'ё', 'Й' > 'й', 'Б' > 'б', 'L' > 'l';
set names cp1251 collate cp1251_general_ci;
select 'ё' = 'Ё', 'й' = 'Й', 'z' = 'Z', 'е' = 'ё', 'и' = 'й', 'И' = 'Й', 'е' >= 'ё', 'Е' = 'Ё', 'и' > 'й', 'Ё' > 'ё', 'Й' > 'й', 'Б' > 'б', 'L' > 'l';
set names utf8 collate utf8_general_ci;
select 'ё' = 'Ё', 'й' = 'Й', 'z' = 'Z', 'е' = 'ё', 'и' = 'й', 'И' = 'Й', 'е' >= 'ё', 'Е' = 'Ё', 'и' > 'й', 'Ё' > 'ё', 'Й' > 'й', 'Б' > 'б', 'L' > 'l';
set names utf8 collate utf8_unicode_ci;
select 'ё' = 'Ё', 'й' = 'Й', 'z' = 'Z', 'е' = 'ё', 'и' = 'й', 'И' = 'Й', 'е' >= 'ё', 'Е' = 'Ё', 'и' > 'й', 'Ё' > 'ё', 'Й' > 'й', 'Б' > 'б', 'L' > 'l';

Suggested fix:
Fix collations utf8_bin, utf8_general_ci, utf8_unicode_ci
[27 Jan 2008 22:30] Vsevolod Parfenov
Bug description as a file for convenient view

Attachment: bug_description.txt (text/plain), 6.37 KiB.

[28 Jan 2008 7:57] Sveta Smirnova
Thank you for the report.

I can not repeat described behavior with current version. Please upgrade.
[28 Jan 2008 22:02] Vsevolod Parfenov
I've updated to last available version: "5.0.32-Debian_7etch5-log Debian etch distribution"exitexit

problem still not solved!
[28 Jan 2008 22:21] Sveta Smirnova
Please update to last version.

This is 5.0.51 located at http://dev.mysql.com/downloads/. I believe Debian should provide 5.0.51 packages too.
[10 Jun 2010 10:27] Pavel Zheltouhov
script in cp1251

Attachment: select_CI_cp1251.sql (application/octet-stream, text), 200 bytes.

[10 Jun 2010 10:27] Pavel Zheltouhov
script in utf8

Attachment: select_CI_utf8.sql (application/octet-stream, text), 221 bytes.

[10 Jun 2010 10:31] Pavel Zheltouhov
I attach two files for easy reproduce this bug.
I have  5.1.43-community-log MySQL Community Server (GPL) and i repeat this bug.

Don't know is this bug or feature : 
If server change any behavior existing appications can be affected.
[10 Jun 2010 21:01] Sveta Smirnova
Pavel,

in your case this is not a bug. See http://www.collation-charts.org/mysql60/mysql604.utf8_unicode_ci.european.html for details.
[4 Jun 2019 7:49] Hurelhuyag M
I have same problem here. Why no one understanding this problem? 
Good news is I found workaround. create custom charset with reset rule.
Like this:
https://gist.github.com/hurelhuyag/5cb4af7547e95e06ea88eac817524f51