MySQL Bugs: #23911: BETWEEN could be optimized as IN for constants

Bug #23911	BETWEEN could be optimized as IN for constants
Submitted:	3 Nov 2006 2:40	Modified:	28 Nov 2006 17:32
Reporter:	Peter Brodersen (Candidate Quality Contributor)	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S4 (Feature request)
Version:	5.0.26-Debian_0.dotdeb.1-log	OS:	Linux (Linux)
Assigned to:		CPU Architecture:	Any
Tags:	qc

Description:
Currently BETWEEN is handled slightly different than a complete IN()-list.

But when the list is compared to an integer (field or constant) it could be an optimization to internally create a complete list of possible values.

How to repeat:
Create a table with two integer fields (a and b).
Create a key on (a,b).
Insert all possible values of (1..1000),(1..1000); one million rows in total.

mysql> EXPLAIN SELECT * FROM qt WHERE a BETWEEN 15 AND 17 AND b BETWEEN 80 AND 84\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: qt
         type: range
possible_keys: a
          key: a
      key_len: 10
          ref: NULL
         rows: 2037
        Extra: Using where; Using index
1 row in set (0.01 sec)

mysql> EXPLAIN SELECT * FROM qt WHERE a IN (15,16,17) AND b BETWEEN 80 AND 84\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: qt
         type: range
possible_keys: a
          key: a
      key_len: 10
          ref: NULL
         rows: 15
        Extra: Using where; Using index
1 row in set (0.01 sec)

As a is an integer the complete list could easily be generated as an optimization.

Another example:

mysql> SELECT BENCHMARK(1000,(SELECT COUNT(RAND()) FROM qt WHERE a BETWEEN 15 AND 17 AND b BETWEEN 80 AND 84));
..
1 row in set (12.54 sec)

mysql> SELECT BENCHMARK(1000,(SELECT COUNT(RAND()) FROM qt WHERE a IN(15,16,17) AND b BETWEEN 80 AND 84));
..
1 row in set (0.30 sec)

Suggested fix:
If feasible the mysql server should internally convert BETWEEN (and "a >= x AND a <= y" which behaves the same way as BETWEEN) to complete lists when the compared value is an integer.

There might be a distance limit for when the conversion might no longer be adequate. Furthermore I have no idea whether the overhead of that rewriting would be too costly.

Thank you for a problem report. Looks like a reasonable Optimizer feature request to me.