| Bug #39298 | Prefix key search returns wrong result for multi-level collations | ||
|---|---|---|---|
| Submitted: | 8 Sep 2008 6:42 | Modified: | 2 Mar 2009 12:50 |
| Reporter: | Alexander Barkov | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
| Version: | 4.0, 4.1, 5.0, 5.1, 6.0 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
[8 Sep 2008 6:55]
Alexander Barkov
The same problem happens with another multi-level collation latin2_czech_cs.
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a varchar(10) character set latin2 collate latin2_czech_cs default NULL, KEY a (a(2)));
INSERT INTO t1 VALUES ('aaa'),('aab'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa');
INSERT INTO t1 VALUES ('aaa'),('aab'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa');
INSERT INTO t1 VALUES ('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa');
INSERT INTO t1 VALUES ('nnn'),('nnN'),('nna'),('nmn'),('nMl'),('nMn'),('nmL');
SELECT * FROM t1 WHERE a>'nMl'; -- returns 4 rows
SELECT * FROM t1 IGNORE KEY(a) WHERE a>'nMl'; -- returns 5 rows
Note, the value in WHERE condition to reproduce the failure is
slightly different from the value in cp1250_czech_cs test:
'nMl' vs 'nmL'.
This is because:
cp1250_czech_cs returns upper case first.
latin1_czech_cs returns lower case first.
See here for collation details:
http://www.collation-charts.org/mysql60/mysql604.latin2_czech_cs.html
http://www.collation-charts.org/mysql60/mysql604.cp1250_czech_cs.html
[8 Sep 2008 7:58]
Valeriy Kravchuk
Verified with 6.0.7 from bzr:
openxs@suse:/home2/openxs/dbs/6.0> bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 6.0.7-alpha-debug Source distribution
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> DROP TABLE IF EXISTS t1;
Query OK, 0 rows affected (0.06 sec)
mysql> CREATE TABLE t1 (a varchar(10) character set cp1250 collate cp1250_czech_cs default NULL,
-> KEY a (a(2)));
Query OK, 0 rows affected (0.02 sec)
mysql> INSERT INTO t1 VALUES
-> ('aaa'),('aab'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa');
Query OK, 10 rows affected (0.00 sec)
Records: 10 Duplicates: 0 Warnings: 0
mysql> INSERT INTO t1 VALUES
-> ('aaa'),('aab'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa');
Query OK, 10 rows affected (0.00 sec)
Records: 10 Duplicates: 0 Warnings: 0
mysql> INSERT INTO t1 VALUES
-> ('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa');
Query OK, 10 rows affected (0.00 sec)
Records: 10 Duplicates: 0 Warnings: 0
mysql> INSERT INTO t1 VALUES ('nnn'),('nnN'),('nna'),('nmn'),('nMl'),('nMn'),('nmL');
Query OK, 7 rows affected (0.00 sec)
Records: 7 Duplicates: 0 Warnings: 0
mysql> SELECT * FROM t1 IGNORE KEY(a) WHERE a>'nmL';
+------+
| a |
+------+
| nnn |
| nnN |
| nna |
| nmn |
| nMn |
+------+
5 rows in set (0.01 sec)
mysql> SELECT * FROM t1 WHERE a>'nmL';
+------+
| a |
+------+
| nmn |
| nnn |
| nnN |
| nna |
+------+
4 rows in set (0.00 sec)
[8 Sep 2008 8:04]
Valeriy Kravchuk
5.1.30 and 5.0.70 from bzr are also affected.

Description: Optimizer filters out good values when using prefix keys for WHERE optimization with multi-level collations. How to repeat: DROP TABLE IF EXISTS t1; CREATE TABLE t1 (a varchar(10) character set cp1250 collate cp1250_czech_cs default NULL, KEY a (a(2))); INSERT INTO t1 VALUES ('aaa'),('aab'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'); INSERT INTO t1 VALUES ('aaa'),('aab'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'); INSERT INTO t1 VALUES ('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'),('aaa'); INSERT INTO t1 VALUES ('nnn'),('nnN'),('nna'),('nmn'),('nMl'),('nMn'),('nmL'); SELECT * FROM t1 IGNORE KEY(a) WHERE a>'nmL'; SELECT * FROM t1 WHERE a>'nmL'; The two SELECT statements return different row count: mysql> SELECT * FROM t1 IGNORE KEY(a) WHERE a>'nmL'; // correct result +------+ | a | +------+ | nnn | | nnN | | nna | | nmn | | nMn | +------+ 5 rows in set (0.00 sec) mysql> SELECT * FROM t1 WHERE a>'nmL'; // wrong result +------+ | a | +------+ | nmn | | nnn | | nnN | | nna | +------+ 4 rows in set (0.00 sec) Suggested fix: Explanation: =========== The above happens because in multi-level case sensitive collations the kind of the third letter ('n' vs 'l') has more importance that the case of the second letter ('M' vs 'm'). When we create a prefix index a(2), the order of key values in the index is different comparing to the order of the actual values: -- Compare three letters - actual values mysql> select strcmp(_cp1250 'nMn' collate cp1250_czech_cs, 'nmL'); +------------------------------------------------------+ | strcmp(_cp1250 'nMn' collate cp1250_czech_cs, 'nmL') | +------------------------------------------------------+ | 1 | +------------------------------------------------------+ 1 row in set (0.00 sec) -- Compare two letters - prefix key values mysql> select strcmp(_cp1250 'nM' collate cp1250_czech_cs, 'nm'); +----------------------------------------------------+ | strcmp(_cp1250 'nM' collate cp1250_czech_cs, 'nm') | +----------------------------------------------------+ | -1 | +----------------------------------------------------+ 1 row in set (0.00 sec) When processing a>'nmL', optimizer cuts the searched string to the length of the prefix index (2) and so it searches all keys greater than 'nm'. Therefore it skips the key 'nM' which is smaller than 'nm', and as a result it doesn't find the value 'nMn'. Suggested fix: ============== Optimizer should search keys starting from the smallest possible key. In the above example the smallest possible key is 'NM' instead of 'nm'. The following query simulates this approach and proves the idea: SELECT * FROM t1 WHERE a>'NM' AND LEFT(a,3) >'nmL'; It does use the index (see EXPLAIN SELECT output). It correctly returns all five good values.