Bug #23911 BETWEEN could be optimized as IN for constants
Submitted: 3 Nov 2006 2:40 Modified: 28 Nov 2006 17:32
Reporter: Peter Brodersen (Candidate Quality Contributor) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S4 (Feature request)
Version:5.0.26-Debian_0.dotdeb.1-log OS:Linux (Linux)
Assigned to: CPU Architecture:Any
Tags: qc

[3 Nov 2006 2:40] Peter Brodersen
Description:
Currently BETWEEN is handled slightly different than a complete IN()-list.

But when the list is compared to an integer (field or constant) it could be an optimization to internally create a complete list of possible values.

How to repeat:
Create a table with two integer fields (a and b).
Create a key on (a,b).
Insert all possible values of (1..1000),(1..1000); one million rows in total.

mysql> EXPLAIN SELECT * FROM qt WHERE a BETWEEN 15 AND 17 AND b BETWEEN 80 AND 84\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: qt
         type: range
possible_keys: a
          key: a
      key_len: 10
          ref: NULL
         rows: 2037
        Extra: Using where; Using index
1 row in set (0.01 sec)

mysql> EXPLAIN SELECT * FROM qt WHERE a IN (15,16,17) AND b BETWEEN 80 AND 84\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: qt
         type: range
possible_keys: a
          key: a
      key_len: 10
          ref: NULL
         rows: 15
        Extra: Using where; Using index
1 row in set (0.01 sec)

As a is an integer the complete list could easily be generated as an optimization.

Another example:

mysql> SELECT BENCHMARK(1000,(SELECT COUNT(RAND()) FROM qt WHERE a BETWEEN 15 AND 17 AND b BETWEEN 80 AND 84));
..
1 row in set (12.54 sec)

mysql> SELECT BENCHMARK(1000,(SELECT COUNT(RAND()) FROM qt WHERE a IN(15,16,17) AND b BETWEEN 80 AND 84));
..
1 row in set (0.30 sec)

Suggested fix:
If feasible the mysql server should internally convert BETWEEN (and "a >= x AND a <= y" which behaves the same way as BETWEEN) to complete lists when the compared value is an integer.

There might be a distance limit for when the conversion might no longer be adequate. Furthermore I have no idea whether the overhead of that rewriting would be too costly.
[28 Nov 2006 17:32] Valeriy Kravchuk
Thank you for a problem report. Looks like a reasonable Optimizer feature request to me.