Bug #9693 bug in trying to join a value on a range (using indexes)
Submitted: 6 Apr 2005 19:29 Modified: 20 Jun 2005 5:38
Reporter: mysql prutser Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:3.23.41, 4.0.24, 4.1 OS:Linux (linux)
Assigned to: Sergey Petrunya CPU Architecture:Any

[6 Apr 2005 19:29] mysql prutser
Description:
I have a table with user data that contains a zipcode and a table with zipcode ranges and corresponding regions.
I want to join these two tables and this works, but an explain shows that mysql checks all 164 records in the zipcode table where it should only check 1, because of the indexes i placed. The ranges are all unique.

How to repeat:
CREATE TABLE data_test (
  id bigint(20) NOT NULL auto_increment,
  postcode varchar(7) default NULL,
  range smallint(6) default NULL,
  PRIMARY KEY  (id),
  KEY range (range)
) TYPE=MyISAM;

postcode being a zipcode including letters, range being only the numerical part of that zipcode

and another table with ranges for the numerical values of the zipcode with a corresponding region:

CREATE TABLE postcodes (
    postcode_id tinyint(3) unsigned NOT NULL auto_increment,
    range_from smallint(5) unsigned default NULL,
    range_till smallint(5) unsigned default NULL,
    regio tinyint(3) unsigned default NULL,
    streek char(25) default NULL,
    PRIMARY KEY (postcode_id),
    UNIQUE KEY range (range_from,range_till),
    UNIQUE KEY postcode_id (postcode_id),
    KEY rf (range_from),
    KEY rt (range_till),
    KEY regio (regio)
) TYPE=MyISAM;

range_from the lower value of the range and range_till the higher value.

In version 3.23.41 both 
this query (1):
explain SELECT
    COUNT(dm.id)
FROM data_main AS dm, postcodes AS p
WHERE  dm.range >= p.range_from AND dm.range <= p.range_till
and this query (2):
explain SELECT
    COUNT(dm.id)
FROM data_main AS dm, postcodes AS p
WHERE  dm.id=1
AND dm.range >= p.range_from AND dm.range <= p.range_till
result in mysql viewing all 164 records of the table postcodes

in 4.0.24, the first query results in the same explain, but the second one gets a bit better. mysql now checks only 4 records.
This query (3) however:
explain SELECT
    COUNT(dm.id)
FROM data_main AS dm, postcodes AS p
WHERE  dm.id IN (1,2)
AND dm.range >= p.range_from AND dm.range <= p.range_till
result in mysql viewing all 164 records of the table postcodes
again checks all records in the table postcodes

in 4.1 query no 2 still gets better: mysql now only checks 1 row. The 3rd one still checks all the records again.

Very strange behaviour

Suggested fix:
It should only check 1 row in the postcodes table when doing a join
[20 Jun 2005 5:36] Sergey Petrunya
The behavior you`ve described is expected, and is in line with 
documentation at 
http://dev.mysql.com/doc/mysql/en/range-optimization.html
http://dev.mysql.com/doc/mysql/en/explain.html

The difference between queries (2) and (3) is caused by different clauses on table data_main.
For query (2) the clause is dm.id=1.
data_main.id is a primary key, the optimizer figures out it can consider all dm.* column values to be constants, and the second part of the WHERE condition
(dm.range >= p.range_from AND dm.range <= p.range_till)
can be used to perform 'range' access to table postcodes. 

For query (3) dm.id can have 2 values, so dm.range is not constant, and 'range' access cannot be performed. 

What does EXPLAIN show in "Extra" field for table postcodes? If it is 
"range checked for each record", that means MySQL will use 
(dm.range >= p.range_from AND dm.range <= p.range_till)
to limit number rows it read from postcodes table and it is already as good
as it can be.
If not, you might try rewriting query (3) as a two-way UNION (one part with
"dm.id=1" and another with "dm.id=2")
[20 Jun 2005 5:38] Sergey Petrunya
Setting to "Not a bug" as the described behavior is expected for current versions of MySQL.