| Bug #30462 | Character sets: search failures with case sensitive collations | ||
|---|---|---|---|
| Submitted: | 16 Aug 2007 16:57 | Modified: | 29 Jul 2008 2:21 |
| Reporter: | Peter Gulutzan | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Charsets | Severity: | S3 (Non-critical) |
| Version: | 6.0 | OS: | Linux (SUSE 10 64-bit) |
| Assigned to: | Alexander Barkov | CPU Architecture: | Any |
[16 Aug 2007 17:15]
MySQL Verification Team
Thank you for the bug report. Verified as described.
[4 Sep 2007 17:57]
Alexander Barkov
Checking latin2_czech_cs:
mysql> drop table if exists t1;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> create table t1 (s1 varchar(5) character set latin1 collate latin1_general_cs);
Query OK, 0 rows affected (0.00 sec)
mysql> insert into t1 values ('x'),('y'),('z'),('X'),('Y'),('Z');
Query OK, 6 rows affected (0.00 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql> select * from t1 order by s1;
+------+
| s1 |
+------+
| X |
| x |
| Y |
| y |
| Z |
| z |
+------+
6 rows in set (0.00 sec)
mysql> select * from t1 where s1 between 'x' and 'z';
+------+
| s1 |
+------+
| x |
| y |
| z |
| Y |
| Z |
+------+
5 rows in set (0.00 sec)
Works as expected. This collation is with "upper case preference",
so the value "X" is not returned, because it is smaller than 'x'.
I can't see any problems here.
[4 Sep 2007 17:58]
Alexander Barkov
Sorry, it should be "Checking latin1_general_cs" in the previous comment.
[4 Sep 2007 17:59]
Alexander Barkov
Step2. Checking latin2_czech_cs.
mysql> drop table t1;
Query OK, 0 rows affected (0.00 sec)
mysql> create table t1 (s1 varchar(5) character set latin2 collate latin2_czech_cs);
Query OK, 0 rows affected (0.00 sec)
mysql> insert into t1 values ('x'),('y'),('z'),('X'),('Y'),('Z');
Query OK, 6 rows affected (0.00 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql> select * from t1 order by s1;
+------+
| s1 |
+------+
| x |
| X |
| y |
| Y |
| z |
| Z |
+------+
6 rows in set (0.00 sec)
mysql> select * from t1 where s1 > 'z';
+------+
| s1 |
+------+
| Z |
+------+
1 row in set (0.00 sec)
This collation is of "lower case preference".
Only letter 'Z' is greater than 'z'.
There are no problems here.
[4 Sep 2007 18:01]
Alexander Barkov
Checking latin7_estonian_cs:
mysql> drop table t1;
Query OK, 0 rows affected (0.00 sec)
mysql> create table t1 (s1 varchar(5) character set latin7 collate latin7_estonian_cs);
Query OK, 0 rows affected (0.01 sec)
mysql> insert into t1 values ('x'),('y'),('z'),('X'),('Y'),('Z');
Query OK, 6 rows affected (0.00 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql> select * from t1 order by s1;
+------+
| s1 |
+------+
| Z |
| z |
| X |
| x |
| Y |
| y |
+------+
6 rows in set (0.00 sec)
mysql> select * from t1 where s1 between 'x' and 'z';
Empty set (0.00 sec)
In Estonian, letter Z is sorted after S and before T.
http://www.omniglot.com/writing/estonian.htm
The ORDER BY result is correct, and thus nothing is between 'x' and 'z',
and the second SELECT correctly returns empty set.
No problems here.
[4 Sep 2007 18:07]
Alexander Barkov
Checking utf_general_cs:
mysql> drop table t1;
Query OK, 0 rows affected (0.00 sec)
mysql> create table t1 (s1 varchar(5) character set utf8 collate utf8_general_cs);
Query OK, 0 rows affected (0.00 sec)
mysql> insert into t1 values ('x'),('y'),('z'),('X'),('Y'),('Z');
Query OK, 6 rows affected (0.00 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql> select * from t1 order by s1;
+------+
| s1 |
+------+
| x |
| X |
| y |
| Y |
| z |
| Z |
+------+
6 rows in set (0.00 sec)
mysql> select * from t1 where s1 > 'Z';
+------+
| s1 |
+------+
| z |
+------+
1 row in set (0.00 sec)
According to the ORDER BY, the collation is of "lower case preference".
However the second SELECT returns a wrong result.
Another SELECT demonstrating the same problem:
mysql> select *, s1>'Z' from t1 order by s1;
+------+--------+
| s1 | s1>'Z' |
+------+--------+
| x | 0 |
| X | 0 |
| y | 0 |
| Y | 0 |
| z | 1 |
| Z | 0 |
+------+--------+
6 rows in set (0.00 sec)
utf8_general_cs definitely is broken.
All other collations work as expected.
[4 Sep 2007 18:13]
Alexander Barkov
Additional test with utf8_general_cs: mysql> alter table t1 add key(s1); Query OK, 6 rows affected (0.00 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql> select *, s1>'Z' from t1 order by s1; +------+--------+ | s1 | s1>'Z' | +------+--------+ | X | 0 | | x | 0 | | Y | 0 | | y | 0 | | Z | 0 | | z | 1 | +------+--------+ 6 rows in set (0.00 sec) Now ORDER BY is of "upper case preference", and the order is consistent with s1>'Z' results.
[4 Sep 2007 23:48]
Peter Gulutzan
Thanks to the analysis and explanations of Alexander Barkov,
I now think that this summary is true:
The order of the letters XYZxyz differs depending on collation,
on indexing, and on operation. Using variations of this script:
drop table if exists t7;
create table t7 (s1 varchar(5) character set latin7
collate latin7_estonian_cs);
insert into t7 values ('X'),('x'),('Y'),('y'),('Z'),('z');
select * from t7 group by s1;
select * from t7 order by s1;
create index i7 on t7 (s1);
select * from t7 group by s1;
select * from t7 order by s1;
I see this:
latin1_general_cs, always: XxYyZz
latin2_czech_cs, if indexed: xXyYzZ
latin2_czech_cs, if unindexed: XxYyZz
latin7_estonian_cs, always: ZzXxYy
utf8_general_cs, if GROUP BY: XxYyZz
utf8_general_cs, if ORDER BY: xXyYzZ
The inconsistency of latin2_czech_cs doesn't seem to contradict the manual.
The inconsistency of utf8_general_cs doesn't matter, it's not officially supported.
If Mr Barkov confirms the above statements, then I believe he can
declare this "Not a Bug".
[5 Sep 2007 5:05]
Alexander Barkov
Hello Peter,
I could not verify the additional inconsistency with latin2_czech_cs.
It always returned the letters in this order: xXyXzZ
No matter, whether an index on the column s1 exists.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1 to server version: 5.1.21-beta-debug
...
mysql> create table t1 (s1 varchar(5) character set latin2 collate latin2_czech_cs);
Query OK, 0 rows affected (0.01 sec)
mysql> insert into t1 values ('X'),('x'),('Y'),('y'),('Z'),('z');
Query OK, 6 rows affected (0.00 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql> select * from t1 group by s1;
+------+
| s1 |
+------+
| x |
| X |
| y |
| Y |
| z |
| Z |
+------+
6 rows in set (0.00 sec)
mysql> select * from t1 order by s1;
+------+
| s1 |
+------+
| x |
| X |
| y |
| Y |
| z |
| Z |
+------+
6 rows in set (0.00 sec)
mysql> create index i1 on t1 (s1);
Query OK, 6 rows affected (0.01 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql> select * from t1 group by s1;
+------+
| s1 |
+------+
| x |
| X |
| y |
| Y |
| z |
| Z |
+------+
6 rows in set (0.00 sec)
mysql> select * from t1 order by s1;
+------+
| s1 |
+------+
| x |
| X |
| y |
| Y |
| z |
| Z |
+------+
6 rows in set (0.00 sec)
Can you please confirm that you get another behavior.
If so, then this report needs additional attention.
If that was a mistake, and you get consistent order
both with and without index, feel free to close
this report as "not a bug".
Thanks!
[5 Sep 2007 13:39]
Peter Gulutzan
No mistake. But I was using MySQL Version 6.0.
Here is the script:
select version();
drop table if exists t7;
create table t7 (s1 varchar(5) character set latin2 collate latin2_czech_cs) engine=myisam;
insert into t7 values ('X'),('Y'),('Z'),('x'),('y'),('z');
select * from t7 group by s1;
select * from t7 order by s1;
create index i on t7 (s1);
select * from t7 group by s1;
select * from t7 order by s1;
drop table if exists t7;
create table t7 (s1 varchar(5) character set latin2 collate latin2_czech_cs) engine=innodb;
insert into t7 values ('X'),('Y'),('Z'),('x'),('y'),('z');
select * from t7 group by s1;
select * from t7 order by s1;
create index i on t7 (s1);
select * from t7 group by s1;
select * from t7 order by s1;
drop table if exists t7;
create table t7 (s1 varchar(5) character set latin2 collate latin2_czech_cs) engine=falcon;
insert into t7 values ('X'),('Y'),('Z'),('x'),('y'),('z');
select * from t7 group by s1;
select * from t7 order by s1;
create index i on t7 (s1);
select * from t7 group by s1;
select * from t7 order by s1;
Summary results:
5.1 6.0
MyISAM, unindexed: xXyYzZ XxYyZz
MyISAM, indexed: xXyYzZ xXyYzZ
InnoDB, unindexed: xXyYzZ XxYyZz
InnoDB, indexed: xXyYzZ xXyYzZ
Falcon, unindexed: XxYyZz
Falcon, indexed: XxYyZz
5.1: MyISAM are always right. Falcon is not applicable.
6.0: MyISAM and InnoDB are wrong if indexed. Falcon is always wrong.
Detailed resuls:
mysql> select version();
+-------------------+
| version() |
+-------------------+
| 6.0.3-alpha-debug |
+-------------------+
1 row in set (0.00 sec)
mysql> drop table if exists t7;
Query OK, 0 rows affected (0.00 sec)
mysql> create table t7 (s1 varchar(5) character set latin2 collate latin2_czech_cs) engine=myisam;
Query OK, 0 rows affected (0.01 sec)
mysql> insert into t7 values ('X'),('Y'),('Z'),('x'),('y'),('z');
Query OK, 6 rows affected (0.01 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql> select * from t7 group by s1;
+------+
| s1 |
+------+
| X |
| x |
| Y |
| y |
| Z |
| z |
+------+
6 rows in set (0.00 sec)
mysql> select * from t7 order by s1;
+------+
| s1 |
+------+
| X |
| x |
| Y |
| y |
| Z |
| z |
+------+
6 rows in set (0.01 sec)
mysql> create index i on t7 (s1);
Query OK, 6 rows affected (0.01 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql> select * from t7 group by s1;
+------+
| s1 |
+------+
| x |
| X |
| y |
| Y |
| z |
| Z |
+------+
6 rows in set (0.01 sec)
mysql> select * from t7 order by s1;
+------+
| s1 |
+------+
| x |
| X |
| y |
| Y |
| z |
| Z |
+------+
6 rows in set (0.00 sec)
mysql> drop table if exists t7;
Query OK, 0 rows affected (0.00 sec)
mysql> create table t7 (s1 varchar(5) character set latin2 collate latin2_czech_cs) engine=innodb;
Query OK, 0 rows affected (0.01 sec)
mysql> insert into t7 values ('X'),('Y'),('Z'),('x'),('y'),('z');
Query OK, 6 rows affected (0.02 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql> select * from t7 group by s1;
+------+
| s1 |
+------+
| X |
| x |
| Y |
| y |
| Z |
| z |
+------+
6 rows in set (0.00 sec)
mysql> select * from t7 order by s1;
+------+
| s1 |
+------+
| X |
| x |
| Y |
| y |
| Z |
| z |
+------+
6 rows in set (0.01 sec)
mysql> create index i on t7 (s1);
Query OK, 6 rows affected (0.03 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql> select * from t7 group by s1;
+------+
| s1 |
+------+
| x |
| X |
| y |
| Y |
| z |
| Z |
+------+
6 rows in set (0.00 sec)
mysql> select * from t7 order by s1;
+------+
| s1 |
+------+
| x |
| X |
| y |
| Y |
| z |
| Z |
+------+
6 rows in set (0.00 sec)
mysql> drop table if exists t7;
Query OK, 0 rows affected (0.00 sec)
mysql> create table t7 (s1 varchar(5) character set latin2 collate latin2_czech_cs) engine=falcon;
Query OK, 0 rows affected (0.03 sec)
mysql> insert into t7 values ('X'),('Y'),('Z'),('x'),('y'),('z');
Query OK, 6 rows affected (0.00 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql> select * from t7 group by s1;
+------+
| s1 |
+------+
| X |
| x |
| Y |
| y |
| Z |
| z |
+------+
6 rows in set (0.01 sec)
mysql> select * from t7 order by s1;
+------+
| s1 |
+------+
| X |
| x |
| Y |
| y |
| Z |
| z |
+------+
6 rows in set (0.00 sec)
mysql> create index i on t7 (s1);
Query OK, 6 rows affected (0.04 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql> select * from t7 group by s1;
+------+
| s1 |
+------+
| X |
| x |
| Y |
| y |
| Z |
| z |
+------+
6 rows in set (0.00 sec)
mysql> select * from t7 order by s1;
+------+
| s1 |
+------+
| X |
| x |
| Y |
| y |
| Z |
| z |
+------+
6 rows in set (0.00 sec)
[24 Jul 2008 12:52]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/50407 2745 Alexander Barkov 2008-07-24 Bug#30462 Character sets: search failures with case sensitive collations The fix for Bug # 33791 "Wrong ORDER BY with latin2_czech_cs" fixed this problem as well. Adding only tests. No code change required.
[28 Jul 2008 5:27]
Alexander Barkov
Pushed into mysql-6.0.7-bugteam
[28 Jul 2008 14:44]
Bugs System
Pushed into 6.0.7-alpha (revid:alik@mysql.com-20080725172155-fnc73o50e4tgl23k) (version source revid:alik@mysql.com-20080725172155-fnc73o50e4tgl23k) (pib:3)
[29 Jul 2008 2:21]
Paul DuBois
Noted in 6.0.7 changelog (same as entry for Bug#33791). ORDER BY failed to take into account accents and lettercases in multi-level collations (latin2_czech_cs and cp1250_czech_cs).
[14 Sep 2008 0:15]
Bugs System
Pushed into 6.0.6-alpha (revid:bar@mysql.com-20080724124527-nd973ewfxkitjp6n) (version source revid:lars-erik.bjork@sun.com-20080717135217-z3zlwwy465hmc20e) (pib:3)

Description: I create some tables with character columns. I always use a _cs (case sensitive) collation. I insert 'x','y','z','X','Y','Z'. With latin1_general_cs: 'Y' is between 'x' and 'z'. With latin2_czech_cs: 'Z' is greater than 'z' but 'Y' is not. With latin7_estonian_cs: nothing is between 'x' and 'z'. With utf8_general_cs, 'z' is greater than 'Z' but 'y' is not. How to repeat: mysql> create table t1 (s1 varchar(5) character set latin1 collate latin1_general_cs) engine=myisam; Query OK, 0 rows affected (0.01 sec) mysql> insert into t1 values ('x'),('y'),('z'),('X'),('Y'),('Z'); Query OK, 6 rows affected (0.00 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql> select * from t1 where s1 between 'x' and 'z'; +------+ | s1 | +------+ | x | | y | | z | | Y | | Z | +------+ 5 rows in set (0.00 sec) mysql> mysql> create table t2 (s1 varchar(5) character set latin2 collate latin2_czech_cs) engine=myisam; Query OK, 0 rows affected (0.00 sec) mysql> insert into t2 values ('x'),('y'),('z'),('X'),('Y'),('Z'); Query OK, 6 rows affected (0.00 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql> select * from t2 where s1 > 'z'; +------+ | s1 | +------+ | Z | +------+ 1 row in set (0.00 sec) mysql> mysql> create table t7 (s1 varchar(5) character set latin7 collate latin7_estonian_cs) engine=myisam; Query OK, 0 rows affected (0.00 sec) mysql> insert into t7 values ('x'),('y'),('z'),('X'),('Y'),('Z'); Query OK, 6 rows affected (0.00 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql> select * from t7 where s1 between 'x' and 'z'; Empty set (0.00 sec) mysql> mysql> create table tu (s1 varchar(5) character set utf8 collate utf8_general_cs) engine=myisam; Query OK, 0 rows affected (0.01 sec) mysql> insert into tu values ('x'),('y'),('z'),('X'),('Y'),('Z'); Query OK, 6 rows affected (0.00 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql> select * from tu where s1 > 'Z'; +------+ | s1 | +------+ | z | +------+ 1 row in set (0.00 sec)