Bug #30462 | Character sets: search failures with case sensitive collations | ||
---|---|---|---|
Submitted: | 16 Aug 2007 16:57 | Modified: | 29 Jul 2008 2:21 |
Reporter: | Peter Gulutzan | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S3 (Non-critical) |
Version: | 6.0 | OS: | Linux (SUSE 10 64-bit) |
Assigned to: | Alexander Barkov | CPU Architecture: | Any |
[16 Aug 2007 16:57]
Peter Gulutzan
[16 Aug 2007 17:15]
MySQL Verification Team
Thank you for the bug report. Verified as described.
[4 Sep 2007 17:57]
Alexander Barkov
Checking latin2_czech_cs: mysql> drop table if exists t1; Query OK, 0 rows affected, 1 warning (0.00 sec) mysql> create table t1 (s1 varchar(5) character set latin1 collate latin1_general_cs); Query OK, 0 rows affected (0.00 sec) mysql> insert into t1 values ('x'),('y'),('z'),('X'),('Y'),('Z'); Query OK, 6 rows affected (0.00 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql> select * from t1 order by s1; +------+ | s1 | +------+ | X | | x | | Y | | y | | Z | | z | +------+ 6 rows in set (0.00 sec) mysql> select * from t1 where s1 between 'x' and 'z'; +------+ | s1 | +------+ | x | | y | | z | | Y | | Z | +------+ 5 rows in set (0.00 sec) Works as expected. This collation is with "upper case preference", so the value "X" is not returned, because it is smaller than 'x'. I can't see any problems here.
[4 Sep 2007 17:58]
Alexander Barkov
Sorry, it should be "Checking latin1_general_cs" in the previous comment.
[4 Sep 2007 17:59]
Alexander Barkov
Step2. Checking latin2_czech_cs. mysql> drop table t1; Query OK, 0 rows affected (0.00 sec) mysql> create table t1 (s1 varchar(5) character set latin2 collate latin2_czech_cs); Query OK, 0 rows affected (0.00 sec) mysql> insert into t1 values ('x'),('y'),('z'),('X'),('Y'),('Z'); Query OK, 6 rows affected (0.00 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql> select * from t1 order by s1; +------+ | s1 | +------+ | x | | X | | y | | Y | | z | | Z | +------+ 6 rows in set (0.00 sec) mysql> select * from t1 where s1 > 'z'; +------+ | s1 | +------+ | Z | +------+ 1 row in set (0.00 sec) This collation is of "lower case preference". Only letter 'Z' is greater than 'z'. There are no problems here.
[4 Sep 2007 18:01]
Alexander Barkov
Checking latin7_estonian_cs: mysql> drop table t1; Query OK, 0 rows affected (0.00 sec) mysql> create table t1 (s1 varchar(5) character set latin7 collate latin7_estonian_cs); Query OK, 0 rows affected (0.01 sec) mysql> insert into t1 values ('x'),('y'),('z'),('X'),('Y'),('Z'); Query OK, 6 rows affected (0.00 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql> select * from t1 order by s1; +------+ | s1 | +------+ | Z | | z | | X | | x | | Y | | y | +------+ 6 rows in set (0.00 sec) mysql> select * from t1 where s1 between 'x' and 'z'; Empty set (0.00 sec) In Estonian, letter Z is sorted after S and before T. http://www.omniglot.com/writing/estonian.htm The ORDER BY result is correct, and thus nothing is between 'x' and 'z', and the second SELECT correctly returns empty set. No problems here.
[4 Sep 2007 18:07]
Alexander Barkov
Checking utf_general_cs: mysql> drop table t1; Query OK, 0 rows affected (0.00 sec) mysql> create table t1 (s1 varchar(5) character set utf8 collate utf8_general_cs); Query OK, 0 rows affected (0.00 sec) mysql> insert into t1 values ('x'),('y'),('z'),('X'),('Y'),('Z'); Query OK, 6 rows affected (0.00 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql> select * from t1 order by s1; +------+ | s1 | +------+ | x | | X | | y | | Y | | z | | Z | +------+ 6 rows in set (0.00 sec) mysql> select * from t1 where s1 > 'Z'; +------+ | s1 | +------+ | z | +------+ 1 row in set (0.00 sec) According to the ORDER BY, the collation is of "lower case preference". However the second SELECT returns a wrong result. Another SELECT demonstrating the same problem: mysql> select *, s1>'Z' from t1 order by s1; +------+--------+ | s1 | s1>'Z' | +------+--------+ | x | 0 | | X | 0 | | y | 0 | | Y | 0 | | z | 1 | | Z | 0 | +------+--------+ 6 rows in set (0.00 sec) utf8_general_cs definitely is broken. All other collations work as expected.
[4 Sep 2007 18:13]
Alexander Barkov
Additional test with utf8_general_cs: mysql> alter table t1 add key(s1); Query OK, 6 rows affected (0.00 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql> select *, s1>'Z' from t1 order by s1; +------+--------+ | s1 | s1>'Z' | +------+--------+ | X | 0 | | x | 0 | | Y | 0 | | y | 0 | | Z | 0 | | z | 1 | +------+--------+ 6 rows in set (0.00 sec) Now ORDER BY is of "upper case preference", and the order is consistent with s1>'Z' results.
[4 Sep 2007 23:48]
Peter Gulutzan
Thanks to the analysis and explanations of Alexander Barkov, I now think that this summary is true: The order of the letters XYZxyz differs depending on collation, on indexing, and on operation. Using variations of this script: drop table if exists t7; create table t7 (s1 varchar(5) character set latin7 collate latin7_estonian_cs); insert into t7 values ('X'),('x'),('Y'),('y'),('Z'),('z'); select * from t7 group by s1; select * from t7 order by s1; create index i7 on t7 (s1); select * from t7 group by s1; select * from t7 order by s1; I see this: latin1_general_cs, always: XxYyZz latin2_czech_cs, if indexed: xXyYzZ latin2_czech_cs, if unindexed: XxYyZz latin7_estonian_cs, always: ZzXxYy utf8_general_cs, if GROUP BY: XxYyZz utf8_general_cs, if ORDER BY: xXyYzZ The inconsistency of latin2_czech_cs doesn't seem to contradict the manual. The inconsistency of utf8_general_cs doesn't matter, it's not officially supported. If Mr Barkov confirms the above statements, then I believe he can declare this "Not a Bug".
[5 Sep 2007 5:05]
Alexander Barkov
Hello Peter, I could not verify the additional inconsistency with latin2_czech_cs. It always returned the letters in this order: xXyXzZ No matter, whether an index on the column s1 exists. Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 1 to server version: 5.1.21-beta-debug ... mysql> create table t1 (s1 varchar(5) character set latin2 collate latin2_czech_cs); Query OK, 0 rows affected (0.01 sec) mysql> insert into t1 values ('X'),('x'),('Y'),('y'),('Z'),('z'); Query OK, 6 rows affected (0.00 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql> select * from t1 group by s1; +------+ | s1 | +------+ | x | | X | | y | | Y | | z | | Z | +------+ 6 rows in set (0.00 sec) mysql> select * from t1 order by s1; +------+ | s1 | +------+ | x | | X | | y | | Y | | z | | Z | +------+ 6 rows in set (0.00 sec) mysql> create index i1 on t1 (s1); Query OK, 6 rows affected (0.01 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql> select * from t1 group by s1; +------+ | s1 | +------+ | x | | X | | y | | Y | | z | | Z | +------+ 6 rows in set (0.00 sec) mysql> select * from t1 order by s1; +------+ | s1 | +------+ | x | | X | | y | | Y | | z | | Z | +------+ 6 rows in set (0.00 sec) Can you please confirm that you get another behavior. If so, then this report needs additional attention. If that was a mistake, and you get consistent order both with and without index, feel free to close this report as "not a bug". Thanks!
[5 Sep 2007 13:39]
Peter Gulutzan
No mistake. But I was using MySQL Version 6.0. Here is the script: select version(); drop table if exists t7; create table t7 (s1 varchar(5) character set latin2 collate latin2_czech_cs) engine=myisam; insert into t7 values ('X'),('Y'),('Z'),('x'),('y'),('z'); select * from t7 group by s1; select * from t7 order by s1; create index i on t7 (s1); select * from t7 group by s1; select * from t7 order by s1; drop table if exists t7; create table t7 (s1 varchar(5) character set latin2 collate latin2_czech_cs) engine=innodb; insert into t7 values ('X'),('Y'),('Z'),('x'),('y'),('z'); select * from t7 group by s1; select * from t7 order by s1; create index i on t7 (s1); select * from t7 group by s1; select * from t7 order by s1; drop table if exists t7; create table t7 (s1 varchar(5) character set latin2 collate latin2_czech_cs) engine=falcon; insert into t7 values ('X'),('Y'),('Z'),('x'),('y'),('z'); select * from t7 group by s1; select * from t7 order by s1; create index i on t7 (s1); select * from t7 group by s1; select * from t7 order by s1; Summary results: 5.1 6.0 MyISAM, unindexed: xXyYzZ XxYyZz MyISAM, indexed: xXyYzZ xXyYzZ InnoDB, unindexed: xXyYzZ XxYyZz InnoDB, indexed: xXyYzZ xXyYzZ Falcon, unindexed: XxYyZz Falcon, indexed: XxYyZz 5.1: MyISAM are always right. Falcon is not applicable. 6.0: MyISAM and InnoDB are wrong if indexed. Falcon is always wrong. Detailed resuls: mysql> select version(); +-------------------+ | version() | +-------------------+ | 6.0.3-alpha-debug | +-------------------+ 1 row in set (0.00 sec) mysql> drop table if exists t7; Query OK, 0 rows affected (0.00 sec) mysql> create table t7 (s1 varchar(5) character set latin2 collate latin2_czech_cs) engine=myisam; Query OK, 0 rows affected (0.01 sec) mysql> insert into t7 values ('X'),('Y'),('Z'),('x'),('y'),('z'); Query OK, 6 rows affected (0.01 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql> select * from t7 group by s1; +------+ | s1 | +------+ | X | | x | | Y | | y | | Z | | z | +------+ 6 rows in set (0.00 sec) mysql> select * from t7 order by s1; +------+ | s1 | +------+ | X | | x | | Y | | y | | Z | | z | +------+ 6 rows in set (0.01 sec) mysql> create index i on t7 (s1); Query OK, 6 rows affected (0.01 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql> select * from t7 group by s1; +------+ | s1 | +------+ | x | | X | | y | | Y | | z | | Z | +------+ 6 rows in set (0.01 sec) mysql> select * from t7 order by s1; +------+ | s1 | +------+ | x | | X | | y | | Y | | z | | Z | +------+ 6 rows in set (0.00 sec) mysql> drop table if exists t7; Query OK, 0 rows affected (0.00 sec) mysql> create table t7 (s1 varchar(5) character set latin2 collate latin2_czech_cs) engine=innodb; Query OK, 0 rows affected (0.01 sec) mysql> insert into t7 values ('X'),('Y'),('Z'),('x'),('y'),('z'); Query OK, 6 rows affected (0.02 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql> select * from t7 group by s1; +------+ | s1 | +------+ | X | | x | | Y | | y | | Z | | z | +------+ 6 rows in set (0.00 sec) mysql> select * from t7 order by s1; +------+ | s1 | +------+ | X | | x | | Y | | y | | Z | | z | +------+ 6 rows in set (0.01 sec) mysql> create index i on t7 (s1); Query OK, 6 rows affected (0.03 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql> select * from t7 group by s1; +------+ | s1 | +------+ | x | | X | | y | | Y | | z | | Z | +------+ 6 rows in set (0.00 sec) mysql> select * from t7 order by s1; +------+ | s1 | +------+ | x | | X | | y | | Y | | z | | Z | +------+ 6 rows in set (0.00 sec) mysql> drop table if exists t7; Query OK, 0 rows affected (0.00 sec) mysql> create table t7 (s1 varchar(5) character set latin2 collate latin2_czech_cs) engine=falcon; Query OK, 0 rows affected (0.03 sec) mysql> insert into t7 values ('X'),('Y'),('Z'),('x'),('y'),('z'); Query OK, 6 rows affected (0.00 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql> select * from t7 group by s1; +------+ | s1 | +------+ | X | | x | | Y | | y | | Z | | z | +------+ 6 rows in set (0.01 sec) mysql> select * from t7 order by s1; +------+ | s1 | +------+ | X | | x | | Y | | y | | Z | | z | +------+ 6 rows in set (0.00 sec) mysql> create index i on t7 (s1); Query OK, 6 rows affected (0.04 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql> select * from t7 group by s1; +------+ | s1 | +------+ | X | | x | | Y | | y | | Z | | z | +------+ 6 rows in set (0.00 sec) mysql> select * from t7 order by s1; +------+ | s1 | +------+ | X | | x | | Y | | y | | Z | | z | +------+ 6 rows in set (0.00 sec)
[24 Jul 2008 12:52]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/50407 2745 Alexander Barkov 2008-07-24 Bug#30462 Character sets: search failures with case sensitive collations The fix for Bug # 33791 "Wrong ORDER BY with latin2_czech_cs" fixed this problem as well. Adding only tests. No code change required.
[28 Jul 2008 5:27]
Alexander Barkov
Pushed into mysql-6.0.7-bugteam
[28 Jul 2008 14:44]
Bugs System
Pushed into 6.0.7-alpha (revid:alik@mysql.com-20080725172155-fnc73o50e4tgl23k) (version source revid:alik@mysql.com-20080725172155-fnc73o50e4tgl23k) (pib:3)
[29 Jul 2008 2:21]
Paul DuBois
Noted in 6.0.7 changelog (same as entry for Bug#33791). ORDER BY failed to take into account accents and lettercases in multi-level collations (latin2_czech_cs and cp1250_czech_cs).
[14 Sep 2008 0:15]
Bugs System
Pushed into 6.0.6-alpha (revid:bar@mysql.com-20080724124527-nd973ewfxkitjp6n) (version source revid:lars-erik.bjork@sun.com-20080717135217-z3zlwwy465hmc20e) (pib:3)