| Bug #70341 | Key lengths lead to incomplete result sets; unhinted scans prefer such indexes | ||
|---|---|---|---|
| Submitted: | 13 Sep 2013 19:00 | Modified: | 1 Nov 2013 9:45 |
| Reporter: | Arthur O'Dwyer | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Optimizer | Severity: | S2 (Serious) |
| Version: | 5.5.31, 5.6 | OS: | Any |
| Assigned to: | Jørgen Løland | CPU Architecture: | Any |
[13 Sep 2013 23:25]
MySQL Verification Team
Thank you for the bug report. c:\dbs>net start mysqld56 The MySQLD56 service is starting.. The MySQLD56 service was started successfully. c:\dbs>56 c:\dbs>c:\dbs\5.6\bin\mysql -uroot --port=3560 --prompt="mysql 5.6 > " Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 1 Server version: 5.6.14 Source distribution Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql 5.6 > create database yy; Query OK, 1 row affected (0.02 sec) mysql 5.6 > use yy Database changed mysql 5.6 > create table t (id int, data text, index k3(data(3)), index k10(data(10))); Query OK, 0 rows affected (0.64 sec) mysql 5.6 > insert into t values (1,"banana1"); Query OK, 1 row affected (0.03 sec) mysql 5.6 > insert into t values (2,"banana2"); Query OK, 1 row affected (0.03 sec) mysql 5.6 > insert into t values (3,"banana3"); Query OK, 1 row affected (0.39 sec) mysql 5.6 > insert into t values (4,"banana4"); Query OK, 1 row affected (0.03 sec) mysql 5.6 > insert into t values (5,"banana5"); Query OK, 1 row affected (0.03 sec) mysql 5.6 > insert into t values (6,"banana6"); Query OK, 1 row affected (0.03 sec) mysql 5.6 > insert into t values (7,"banana7"); Query OK, 1 row affected (0.02 sec) mysql 5.6 > insert into t values (8,"banana8"); Query OK, 1 row affected (0.03 sec) mysql 5.6 > insert into t values (9,"banana9"); Query OK, 1 row affected (0.05 sec) mysql 5.6 > mysql 5.6 > select * from t where "banana3" < data and data < "banana7"; -- incorrectly returns the empty set Empty set (0.00 sec) mysql 5.6 > select * from t use index() where "banana3" < data and data < "banana7"; -- table scan produces correct results +------+---------+ | id | data | +------+---------+ | 4 | banana4 | | 5 | banana5 | | 6 | banana6 | +------+---------+ 3 rows in set (0.00 sec) mysql 5.6 > select * from t use index(k10) where "banana3" < data and data < "banana7"; -- Even choosing the other index would have produced correct results, in this case! +------+---------+ | id | data | +------+---------+ | 4 | banana4 | | 5 | banana5 | | 6 | banana6 | +------+---------+ 3 rows in set (0.00 sec) mysql 5.6 >
[2 Oct 2013 17:50]
Paul DuBois
Noted in 5.6.15, 5.7.3 changelogs. In some cases, range conditions over indexes defined on column prefixes returned incomplete result sets. (For example, SELECT ... WHERE 'abcdef1' < col_name AND col_name < 'abcdef9', where the index on col_name indexed only the first 6 characters.)
[2 Oct 2013 18:03]
Arthur O'Dwyer
@Paul: Does "Noted in x.y.z changelogs" mean that the bug has actually been fixed in version x.y.z? Same question in re bug number 69841. Lastly, does the absence of "5.5.z" in this bug's case mean that you're not fixing it for 5.5 at all, or just that the fix is still in progress? (Hoped-for answers: yes, yes, and in-progress.)
[3 Oct 2013 6:20]
Jørgen Løland
Arthur: Yes, "Noted in x.y.z changelogs" means that the bug has been fixed in these versions. The absence of 5.5.z means that it is not fixed there. There are no plans to fix this in MySQL 5.5.z as far as I know.
[4 Dec 2013 11:44]
Laurynas Biveinis
5.6$ bzr log -r 5479
------------------------------------------------------------
revno: 5479
committer: Jorgen Loland <jorgen.loland@oracle.com>
branch nick: mysql-5.6
timestamp: Mon 2013-09-30 12:27:07 +0200
message:
BUG#17458273: KEY LENGTHS LEAD TO INCOMPLETE RESULT SETS;
UNHINTED SCANS PREFER SUCH INDEXES
Consider a table with an index over a prefix of column 'x' and
a query with a range condition over 'x':
CREATE TABLE (... INDEX (x(4)))
SELECT ... WHERE 'foobar2' < x and x < 'foobar5';
The index will not be able to distinguish between rows with
x-values like "foobar", "foobaz" or "foobar3" since only the
first four characters ("foob") will be stored in the index.
Since all rows with an x-value prefix "foob" look equal, the
range optimizer has to translate '<' and '>' to '<=' and '>=',
respectively, for such indexes.
In this bug, only "'foobar2' < x" is correctly translated.
Later, when key_and() is executed on the the two ranges
"'foob' <= x" and "NULL < x < 'foob'", key_and() correctly
finds that these ranges together form an impossible condition.
The fix is to also translate '<' to '<=' for prefix indexes
so that key_and() can ultimately create the range
"'foob' <= x <= 'foob'".

Description: "Key length" just seems like a terrible idea in general. If you create an index on a TEXT or BLOB (or LONGTEXT or LONGBLOB) column, you must specify how many characters of the field are going to be indexed. Table lookups using that index will do the equivalent of strncmp() instead of strcmp(). Using strncmp means that range lookups with excluded endpoints ("const1" < mycol and mycol < "const2") will often spuriously return too few results. Worse --- and this is the real "bug", not just "misfeature" --- if your SELECT query isn't explicitly hinted, MySQL will often choose such an index in preference to a full table scan, yielding wrong results. Worser, MySQL will choose such an index even in preference to another *identical* index with a longer key length! How to repeat: create table t (id int, data text, index k3(data(3)), index k10(data(10))); insert into t values (1,"banana1"); insert into t values (2,"banana2"); insert into t values (3,"banana3"); insert into t values (4,"banana4"); insert into t values (5,"banana5"); insert into t values (6,"banana6"); insert into t values (7,"banana7"); insert into t values (8,"banana8"); insert into t values (9,"banana9"); select * from t where "banana3" < data and data < "banana7"; -- incorrectly returns the empty set select * from t use index() where "banana3" < data and data < "banana7"; -- table scan produces correct results select * from t use index(k10) where "banana3" < data and data < "banana7"; -- Even choosing the other index would have produced correct results, in this case! Suggested fix: Range scans using indexes with a "key length" should correctly strcmp the whole string near the endpoints of the scan (i.e., where strncmp would return 0). I don't plan to create a patch for this; the ball's in your court.