Bug #50576 Optimizer failures when using BETWEEN x AND y on varchar column
Submitted: 25 Jan 2010 2:20 Modified: 16 Nov 2017 9:08
Reporter: Roel Van de Paar Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S1 (Critical)
Version:5.0.22, 5.1.42, 5.5.0m2, 8.0.3 OS:Any
Assigned to: CPU Architecture:Any
Tags: regression

[25 Jan 2010 2:20] Roel Van de Paar
Description:
Two (seemingly) optimizer failures:

#1 EXPLAIN indicates all rows in a table (10) will be scanned, even though an index could be used if quotes (on varchar column) are auto-added.
#2 EXPLAIN indicates all rows in a table (10) will be scanned, when two records are right next to each other (aa/bb or 22/33=10 rows, aa/cc or 22/44=3 rows), even though an index could be used.

Issues 1 and 2 can be combined when using no quotes and using adjacent records.

(Same results on MyISAM and InnoDB)

mysql> EXPLAIN SELECT * FROM simple WHERE test BETWEEN 'aa' and 'bb'; /* Highlights issue #2 */
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table  | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | simple | index | test          | test | 5       | NULL |   10 | Using where; Using index |
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM simple WHERE test BETWEEN 'aa' and 'cc';
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table  | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | simple | range | test          | test | 5       | NULL |    3 | Using where; Using index |
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM simple WHERE test BETWEEN '22' and '33'; /* Highlights issue #2 */
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table  | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | simple | index | test          | test | 5       | NULL |   10 | Using where; Using index |
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM simple WHERE test BETWEEN '22' and '44';
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table  | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | simple | range | test          | test | 5       | NULL |    3 | Using where; Using index |
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

mysql> EXPLAIN EXTENDED SELECT * FROM simple WHERE test BETWEEN '22' and '33'; /* Highlights issues #1&2 combined, notice filtered output */
+----+-------------+--------+-------+---------------+------+---------+------+------+----------+--------------------------+
| id | select_type | table  | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+--------+-------+---------------+------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | simple | index | test          | test | 5       | NULL |   10 |    20.00 | Using where; Using index |
+----+-------------+--------+-------+---------------+------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
+----+-------------+--------+-------+---------------+------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

+-------+------+----------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                              |
+-------+------+----------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | select `roelt`.`simple`.`test` AS `test` from `roelt`.`simple` where (`roelt`.`simple`.`test` between '22' and '33') |
+-------+------+----------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

How to repeat:
DROP TABLE IF EXISTS simple;
CREATE TABLE `simple` (`test` varchar(2), KEY `test` (`test`));
INSERT INTO simple VALUES ('aa'),('bb'),('cc'),('dd'),('ee'),(11),(22),(33),(44),(55);
EXPLAIN EXTENDED SELECT * FROM simple WHERE test BETWEEN '22' and '44'; SHOW WARNINGS;
EXPLAIN EXTENDED SELECT * FROM simple WHERE test BETWEEN 22 and 44; SHOW WARNINGS; /* Highlights issue #1 */
EXPLAIN SELECT * FROM simple WHERE test BETWEEN 'aa' and 'bb'; /* Highlights issue #2 */
EXPLAIN SELECT * FROM simple WHERE test BETWEEN 'aa' and 'cc'; 
EXPLAIN SELECT * FROM simple WHERE test BETWEEN '22' and '33'; /* Highlights issue #2 */
EXPLAIN SELECT * FROM simple WHERE test BETWEEN '22' and '44';
EXPLAIN EXTENDED SELECT * FROM simple WHERE test BETWEEN '22' and '33'; /* Highlights issues #1&2 combined, notice filtered output */
[25 Jan 2010 2:36] Roel Van de Paar
For issue #1, I am aware of this:

http://dev.mysql.com/doc/refman/5.1/en/type-conversion.html
'If both arguments in a comparison operation are strings, they are compared as strings.'
'In all other cases, the arguments are compared as floating-point (real) numbers.'

However, at the very least #1 could be a feature request to modify this behavior. Likely users are unknowingly doing table scans where it's not necessary. 

The same page also states (at the top):
'When an operator is used with operands of different types, type conversion occurs to make the operands compatible. Some conversions occur implicitly. For example, MySQL automatically converts numbers to strings as necessary, and vice versa.'

So, there are already some implicit number>string conversions happening. Why not have the optimizer do the same to avoid tablescans?

As for issue #2, this looks like a bug.
[25 Jan 2010 2:54] Roel Van de Paar
Small correction to last line of testcase. The 10 rows are scanned here because of adjecent records, not because of missing quotes. Still interesting to note the filtered: 20.00. When combining issues #1 and #2, the issue remains the same (tablescan).
[25 Jan 2010 3:02] Roel Van de Paar
Verifying as D2. Same on 5.5.0m2.

Also looks like a regression for issue #2: Results on 4.1.25-pro:

--------
mysql> EXPLAIN SELECT * FROM simple WHERE test BETWEEN 'aa' and 'bb';
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table  | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | simple | range | test          | test |       3 | NULL |    2 | Using where; Using index |
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.02 sec)

mysql> EXPLAIN SELECT * FROM simple WHERE test BETWEEN 'aa' and 'cc'; 
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table  | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | simple | range | test          | test |       3 | NULL |    3 | Using where; Using index |
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM simple WHERE test BETWEEN '22' and '33';
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table  | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | simple | range | test          | test |       3 | NULL |    2 | Using where; Using index |
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM simple WHERE test BETWEEN '22' and '44';
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table  | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | simple | range | test          | test |       3 | NULL |    3 | Using where; Using index |
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.00 sec)
--------

Issue #2 present in versions: 5.0.22
Issue #2 not present in versions: 4.1.25 / 4.0.30
[25 Jan 2010 3:48] Roel Van de Paar
Very interesting related bug: bug #14220
[27 Jan 2010 22:35] Roel Van de Paar
New testcase:

-------------
DROP TABLE IF EXISTS simple; DROP TABLE IF EXISTS simple2; DROP TABLE IF EXISTS simple3;
CREATE TABLE `simple` (`test` varchar(2), KEY `test` (`test`));
INSERT INTO simple VALUES ('aa'),('bb'),('cc'),('dd'),('ee'),(11),(22),(33),(44),(55);
CREATE TABLE `simple2` (`test` varchar(2), KEY `test` (`test`));
INSERT INTO simple2 SELECT * from simple;  INSERT INTO simple2 SELECT * from simple2;
INSERT INTO simple2 SELECT * from simple2; INSERT INTO simple2 SELECT * from simple2;
INSERT INTO simple2 SELECT * from simple2; INSERT INTO simple2 SELECT * from simple2;
INSERT INTO simple2 SELECT * from simple2; INSERT INTO simple2 SELECT * from simple2;
INSERT INTO simple2 SELECT * from simple2; INSERT INTO simple2 SELECT * from simple2;
INSERT INTO simple2 SELECT * from simple2; INSERT INTO simple2 SELECT * from simple2;
INSERT INTO simple2 SELECT * from simple2; INSERT INTO simple2 SELECT * from simple2;
INSERT INTO simple2 SELECT * from simple2; INSERT INTO simple2 SELECT * from simple2;
CREATE TABLE `simple3` (`test` varchar(2), KEY `test` (`test`));
INSERT INTO simple3 SELECT * from simple;
INSERT INTO simple3 SELECT * from simple;
DELETE FROM simple3 LIMIT 9;

/* At this point we have 10 rows in simple, 327680 rows in simple2 and 11 rows in simple3 */

EXPLAIN EXTENDED SELECT * FROM simple2 WHERE test BETWEEN '22' and '44'; SHOW WARNINGS;
EXPLAIN EXTENDED SELECT * FROM simple2 WHERE test BETWEEN 22 and 44; SHOW WARNINGS; /* Issue #1 happens on any number of rows */

EXPLAIN SELECT * FROM simple WHERE test BETWEEN 'aa' and 'cc';
EXPLAIN SELECT * FROM simple WHERE test BETWEEN 'aa' and 'bb'; /* Issue #2 happens on 10 rows */

EXPLAIN SELECT * FROM simple3 WHERE test BETWEEN 'aa' and 'bb'; /* Issue #2 does not happen on 11 rows */

EXPLAIN SELECT * FROM simple JOIN simple2 WHERE simple.test BETWEEN 'aa' and 'bb'; /* Issue #2 happens on large join for small first <10 rows table */

/* Summary:

   Issue #1: issue is always there, even on large tables. Optimizer should auto-add quotes to avoid large scans

   Issue #2: though this issues is not seen in simple queries with > 10 rows (it is seen with <= 10 rows), 
   it becomes much more apparent on JOIN's that involve a <=10 rows table. 

   Result for issue #2 in the above testcase: 
   EXPLAIN shows 10x327680=3276800 rows will be scanned instead of 2 (adjacent rows)x327680=655360 rows (400% increase) */
-------------
[27 Jan 2010 22:35] Roel Van de Paar
Results: 

-------------
mysql> /* At this point we have 10 rows in simple, 327680 rows in simple2 and 11 rows in simple3 */

mysql> EXPLAIN EXTENDED SELECT * FROM simple2 WHERE test BETWEEN '22' and '44'; SHOW WARNINGS;
+----+-------------+---------+-------+---------------+------+---------+------+-------+----------+--------------------------+
| id | select_type | table   | type  | possible_keys | key  | key_len | ref  | rows  | filtered | Extra                    |
+----+-------------+---------+-------+---------------+------+---------+------+-------+----------+--------------------------+
|  1 | SIMPLE      | simple2 | range | test          | test | 5       | NULL | 85671 |   100.00 | Using where; Using index | << Index is actually used
+----+-------------+---------+-------+---------------+------+---------+------+-------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

+-------+------+-------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                 |
+-------+------+-------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | select `roelt`.`simple2`.`test` AS `test` from `roelt`.`simple2` where (`roelt`.`simple2`.`test` between '22' and '44') |
+-------+------+-------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> EXPLAIN EXTENDED SELECT * FROM simple2 WHERE test BETWEEN 22 and 44; SHOW WARNINGS; /* Issue #1 happens on any number of rows */
+----+-------------+---------+-------+---------------+------+---------+------+--------+----------+--------------------------+
| id | select_type | table   | type  | possible_keys | key  | key_len | ref  | rows   | filtered | Extra                    |
+----+-------------+---------+-------+---------------+------+---------+------+--------+----------+--------------------------+
|  1 | SIMPLE      | simple2 | index | test          | test | 5       | NULL | 327680 |   100.00 | Using where; Using index | << All rows are scanned
+----+-------------+---------+-------+---------------+------+---------+------+--------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

+-------+------+---------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                             |
+-------+------+---------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | select `roelt`.`simple2`.`test` AS `test` from `roelt`.`simple2` where (`roelt`.`simple2`.`test` between 22 and 44) | << Missing quotes
+-------+------+---------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM simple WHERE test BETWEEN 'aa' and 'cc';
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table  | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | simple | range | test          | test | 5       | NULL |    3 | Using where; Using index | << What it should be, issue did not happen (records not adjacent)
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM simple WHERE test BETWEEN 'aa' and 'bb'; /* Issue #2 happens on 10 rows */
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table  | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | simple | index | test          | test | 5       | NULL |   10 | Using where; Using index | << Adjacent records (aa/bb) cause all rows to be scanned
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM simple3 WHERE test BETWEEN 'aa' and 'bb'; /* Issue #2 does not happen on 11 rows */
+----+-------------+---------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table   | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+---------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | simple3 | range | test          | test | 5       | NULL |    1 | Using where; Using index | << 1 Row is actually not possible, as the result output is two rows, but this is less important
+----+-------------+---------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM simple JOIN simple2 WHERE simple.test BETWEEN 'aa' and 'bb'; /* Issue #2 happens on large join for small first <10 rows table */
+----+-------------+---------+-------+---------------+------+---------+------+--------+--------------------------------+
| id | select_type | table   | type  | possible_keys | key  | key_len | ref  | rows   | Extra                          |
+----+-------------+---------+-------+---------------+------+---------+------+--------+--------------------------------+
|  1 | SIMPLE      | simple  | index | test          | test | 5       | NULL |     10 | Using where; Using index       | << 10 rows are being scanned while the result set is only 2 !
|  1 | SIMPLE      | simple2 | index | NULL          | test | 5       | NULL | 327680 | Using index; Using join buffer | << This multiples with the # of records in table 2
+----+-------------+---------+-------+---------------+------+---------+------+--------+--------------------------------+
2 rows in set (0.00 sec)
-------------

As such, issue #2 is a substantial regression from 4.1 (EXPLAIN shows a 400% increase in the number of rows read on joins involving a small table <=10 rows).

And, it would be good if issue #1 is fixed (possibly at the same time?) so that large scans are avoided where possible. Also see other bug #14220.

4.1 Results of testcase for issue #2 (issue #1 was already present in 4.1 as well):

-------------
mysql> EXPLAIN SELECT * FROM simple JOIN simple2 WHERE simple.test BETWEEN 'aa' and 'bb';
+----+-------------+---------+-------+---------------+------+---------+------+--------+--------------------------+
| id | select_type | table   | type  | possible_keys | key  | key_len | ref  | rows   | Extra                    |
+----+-------------+---------+-------+---------------+------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | simple  | range | test          | test |       3 | NULL |      2 | Using where; Using index | /* Correct, notice type */
|  1 | SIMPLE      | simple2 | index | NULL          | test |       3 | NULL | 327680 | Using index              |
+----+-------------+---------+-------+---------------+------+---------+------+--------+--------------------------+
2 rows in set (0.00 sec)
-------------
[29 Jan 2010 22:46] Omer Barnir
1) optimizer can't auto-quote, without changing the meaning of what is being requested
2) with case of < 10 rows, the table scan is preferable; if there is a demonstrated loss of actual query performance, then it may need extra consideration just from the explain output, there's no indication that this is incorrect
[16 Nov 2017 8:47] Roel Van de Paar
I am not sure why this was closed? It looks like there is a clear demonstrated loss of performance?

For example issue #1;

8.0.3>SELECT * FROM simple2 WHERE test BETWEEN '22' and '44';
[...]
98304 rows in set (1.57 sec)

8.0.3>SELECT * FROM simple2 WHERE test BETWEEN 22 and 44;
[...]
98304 rows in set (4.29 sec)
[16 Nov 2017 9:08] Roel Van de Paar
Missed that #1 Point by Omer makes sense; optimiser cannot change meaning.