Bug #991 join is not optimised well if between is used
Submitted: 4 Aug 2003 11:31 Modified: 12 Oct 2004 14:09
Reporter: [ name withheld ] Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:4.0.14 and 4.1.0 OS:Mac OS X (Mac OS X 10.2)
Assigned to: Sergei Golubchik

[4 Aug 2003 11:31] [ name withheld ]
Description:
Select doesn't use the indexes as it could if you use 'between':

explain select * from token t1, token t2 where 
	t1.id between t2.id - 2 and t2.id - 1 and 
	t2.word = 3;
+-------+------+---------------+------+---------+-------+------+-------------+
| table | type | possible_keys | key  | key_len | ref   | rows | Extra       |
+-------+------+---------------+------+---------+-------+------+-------------+
| t2    | ref  | word          | word |       5 | const |    1 | Using where |
| t1    | ALL  | PRIMARY       | NULL |    NULL | NULL  |    3 | Using where |
+-------+------+---------------+------+---------+-------+------+-------------+

If you use constants for between the primary index is used:

explain select * from token where 
	id between 1 and 2;
+-------+-------+---------------+---------+---------+------+------+-------------
| table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+-------+-------+---------------+---------+---------+------+------+-------------
| token | range | PRIMARY       | PRIMARY |       4 | NULL |    1 | Using where |
+-------+-------+---------------+---------+---------+------+------+-------------

Remarkably if you define the word coloumn of the table as unique then the index is used:

explain select * from token t1, token t2 where 
	t1.id between t2.id - 2 and t2.id - 1 and 
	t2.word = 3;
+-------+-------+---------------+---------+---------+-------+------
| table | type  | possible_keys | key     | key_len | ref   | rows | Extra       |
+-------+-------+---------------+---------+---------+-------+------
| t2    | const | word,word_2   | word    |       5 | const |    1 |             |
| t1    | range | PRIMARY       | PRIMARY |       4 | NULL  |    1 | Using where |
+-------+-------+---------------+---------+---------+-------+------

It does not matter whether you use '... t1.id between ... and ...' or 't1.id >= ... and t1.id <= ...'
I'm using MyISAM tables and didn't yet check with other table types

Thanks a lot,
Martin

How to repeat:
-- create table

create table token (
             id integer,
             word integer,
             primary key(id),
             index(word)
         );

-- and fill it with data

insert into token values (1,1), (2,2), (3,3);

explain select * from token t1, token t2 where 
	t1.id between t2.id - 2 and t2.id - 1 and 
	t2.word = 3;

explain select * from token where 
	id between 1 and 2;
[7 Aug 2003 2:54] [ name withheld ]
It is very important for me that this issue is solved as it has a very big impact on the speed of my 
application -- it makes mysql more or less unusable for me. Any tips or workarounds would be 
appreciated. 

Tanks,
Martin.
[8 Aug 2003 9:10] Sergei Golubchik
related issue:

From: Georg Richter <georg@php.net>
Subject: diffrence: BETWEEN and range operators?!
Date: Thu, 7 Aug 2003 23:25:52 +0200

Hi,

I just noticed the following behaviour:

mysql> create table t (x int, y int, index(x), index(y));
Query OK, 0 rows affected (0.00 sec)

mysql> insert into t (x) values (1),(2),(3),(4),(5),(6),(7),(8),(9);
Query OK, 9 rows affected (0.01 sec)
Records: 9  Duplicates: 0  Warnings: 0

mysql> update t set y=x;
Query OK, 9 rows affected (0.00 sec)
Rows matched: 9  Changed: 9  Warnings: 0

mysql> explain select * from t t1, t t2 where t1.y = 2 and t2.x between 0 and t1.y;
+----+-------------+-------+------+---------------+------+---------+-------+------+-------------+  
| id | select_type | table | type | possible_keys | key  | key_len | ref   | rows | Extra    |  
+----+-------------+-------+------+---------------+------+---------+-------+------+-------------+  
|  1 | SIMPLE      | t1    | ref  | y             | y    |       5 | const |  1   | Using where |  
|  1 | SIMPLE      | t2    | ALL  | x             | NULL |    NULL | NULL  |  9   | Using where |  
+----+-------------+-------+------+---------------+------+---------+-------+------+-------------+  
2 rows in set (0.00 sec)

mysql> explain select * from t t1, t t2 where t1.y = 2 and t2.x >= 0 and  t2.x <= t1.y;
+----+-------------+-------+-------+---------------+------+---------+-------+------+-------------+ 
| id | select_type | table | type  | possible_keys | key  | key_len | ref   | rows | Extra       | 
+----+-------------+-------+-------+---------------+------+---------+-------+------+-------------+ 
|  1 | SIMPLE      | t1    | ref   | y             | y    |       5 | const | 1    | Using where | 
|  1 | SIMPLE      | t2    | range | x             | x    |       5 | NULL  | 2    | Using where | 
+----+-------------+-------+-------+---------------+------+---------+-------+------+-------------+ 

In first case (with BETWEEN) there is a full table scan, in 2nd there is a
range query - and I'm a little bit confused :(
Regards

Georg
[9 Aug 2003 0:13] Sergei Golubchik
both fixed in 4.0.15
[4 Nov 2003 8:00] [ name withheld ]
Hi,

I am sorry to say that the bug does not seem to be resolved. If you enter

explain select * from token t1, token t2 where 
        t1.id between t2.id - 1 and t2.id - 1 and 
        t2.word = 3; 

(from the example above)

you see that it now uses a range check:
+-------+------+---------------+------+---------+-------+------
+----------------------------------------------+
| table | type | possible_keys | key  | key_len | ref   | rows | Extra                                        |
+-------+------+---------------+------+---------+-------+------
+----------------------------------------------+
| t2    | ref  | word          | word |       5 | const |    1 | Using where                                  |
| t1    | ALL  | PRIMARY       | NULL |    NULL | NULL  |    3 | Range checked for each record (index 
map: 1) |
+-------+------+---------------+------+---------+-------+------
+----------------------------------------------+

But the question is: Why does it not use the primary key "id"??

Regards,
Martin.
[4 Nov 2003 8:03] [ name withheld ]
An addition to the comment above: I used 4.0.16-standard on Mac OS X 10.2

Martin.
[12 Nov 2003 1:41] Sergei Golubchik
exactly.
"Range checked for each record (index map: 1)" means that it uses id key.
The thing is that you use range with non-constant limits.
Generally, several ranges can be used in the query and several keys are available to choose from. As limits are not constants, MySQL cannot choose the best index in advance. Thus, when doing a join, for each value of the t2.id, MySQL checks range limits and data distribution to choose the best index *for this particular value of t2.id*. This is what "Range checked for each record" means. "Index map: 1" means that there is only one index to choose from (it's a bitmap, 1 for the first index, 2 - for the second, 3 - for both, 4 - for 3rd index, etc) - so MySQL will, indeed choose to use t1.id key, but it may also decide for a table scan if it would be fatser than using the index - it will depend on the range limits.
[7 Oct 2004 20:39] Vassili Gorshkov
I am seeing all sort of problems with JOIN statements that use BETWEEN constraint, e.g.:

1. 

CREATE TABLE R(
   r0 INT,
   r1 INT,
   r2 INT);

INSERT INTO R VALUES (1,5,7), (2,3,9);

CREATE TABLE T(
   f1 INT,
   f2 INT);
CREATE INDEX i1 ON T(f1,f2);

INSERT INTO T VALUES
   (1,1),
   (1,2),
   (1,3),
   (1,4),
   (1,5),
   (1,6),
   (1,7),
   (1,8),
   (1,9),
   (2,1),
   (2,2),
   (2,3),
   (2,4),
   (2,5),
   (2,6),
   (2,7),
   (2,8),
   (2,9);

EXPLAIN SELECT * from R,T WHERE f1 = r0 AND f2 BETWEEN r1 AND r2;

mysql> EXPLAIN SELECT * from R,T WHERE f1 = r0 AND f2 BETWEEN r1 AND r2;
+----+-------------+-------+------+---------------+------+---------+----------+------+--------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref      | rows | Extra                    |
+----+-------------+-------+------+---------------+------+---------+----------+------+--------------------------+
|  1 | SIMPLE      | R     | ALL  | NULL          | NULL |    NULL | NULL     |    2 |                          |
|  1 | SIMPLE      | T     | ref  | i1            | i1   |       5 | vvg.R.r0 |    2 | Using where; Using index |
+----+-------------+-------+------+---------------+------+---------+----------+------+--------------------------+
2 rows in set (0.00 sec)

mysql>

As you can see only the first field 'f1' in the index was used, while f2 was ignored.
[12 Oct 2004 14:09] Sergei Golubchik
last question moved to #5982