Bug #59670 unexpected results from query optimizer
Submitted: 22 Jan 2011 1:26 Modified: 25 Mar 2011 17:36
Reporter: Don Cohen Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.0.77-log OS:Linux
Assigned to: CPU Architecture:Any

[22 Jan 2011 1:26] Don Cohen
Description:
mysql> explain select * from iploc where ipend>=@ip order by ipend asc limit 4; 
+----+-------------+-------+-------+---------------+-------+---------+------+---------+-------------+ 
| id | select_type | table | type  | possible_keys | key   | key_len | ref  | rows    | Extra       | 
+----+-------------+-------+-------+---------------+-------+---------+------+---------+-------------+ 
|  1 | SIMPLE      | iploc | range | ipend         | ipend | 8       | NULL | 1172394 | Using where |  
+----+-------------+-------+-------+---------------+-------+---------+------+---------+-------------+ 
How can the optimizer expect to read a million rows when I asked for no more than 4 and there is no condition to satisfy other than the range?
I think the right answer should be 4.  In spite of that it seems to use the right algorithm, in that it takes almost no time.

On the other hand this one:
mysql> explain select * from iploc where ipend>=1234567890 and ipstart<=1234567890 order by ipend asc limit 4; 
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-----------------------------+ 
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows    | Extra                       | 
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-----------------------------+ 
|  1 | SIMPLE      | iploc | range | ipstart,ipend | ipstart | 8       | NULL | 1293964 | Using where; Using filesort |  
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-----------------------------+ 
executes like it really is reading a million rows.
The output above doesn't really tell me enough about the algorithm selected, but given that we want the results sorted by ipend I'd expect it to use the key ipend instead of ipstart.
Is there some way I can get it to use the ipend key and generate in ascending order? 
It looks to me like this should do that
explain select * from iploc where ipend>=@ip having ipstart<=@ip order by ipend asc limit 4; 
+----+-------------+-------+-------+---------------+-------+---------+------+---------+-------------+ 
| id | select_type | table | type  | possible_keys | key   | key_len | ref  | rows    | Extra       | 
+----+-------------+-------+-------+---------------+-------+---------+------+---------+-------------+ 
|  1 | SIMPLE      | iploc | range | ipend         | ipend | 8       | NULL | 1172394 | Using where |  
+----+-------------+-------+-------+---------------+-------+---------+------+---------+-------------+ 
In this case the execution time is still multiple seconds.  Given that leaving out the having clause executes in 0 sec I have trouble imagining what the algorithm is here.  Where can I find an explanation?

How to repeat:
CREATE TABLE iploc ( 
 ipstart bigint NOT NULL, 
 ipend bigint NOT NULL, 
 COUNTRY_CODE CHAR(2), 
 COUNTRY_NAME VARCHAR(64), 
 REGION VARCHAR(128), 
 CITY VARCHAR(128), 
  index (ipstart),  index (ipend) 
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

This table is for ip address location, each line is a range of ip addresses and a location.  There are about 3million lines but no overlapping ranges.
[22 Jan 2011 9:28] Valeriy Kravchuk
First of all, please, check if you get the same results with a recent version, 5.0.91. 

In any case, if you want specific index to be used you can try to force that:

explain select * from iploc FORCE INDEX(ipend) where ipend>=1234567890 and ipstart<=1234567890 order by ipend asc limit 4;
[23 Jan 2011 7:20] Don Cohen
> First of all, please, check if you get the same results with a recent 
> version, 5.0.91.
I don't see an easy way to do that.  
In any case some of my questions seem independent of version, such as how can I figure out what algorithm is used from the output of explain select, and am I correct that the number of rows in the first explain select should have been 4.

FORCE INDEX(ipend) is useful, thanks.
Also I now think that the other queries are correctly optimized.  When I change
the limit to 1 then I get the performance I expect (fast when using ipend index)
and it's asking for more than one that forces the rest of the rows to be examined.
[23 Jan 2011 11:05] Valeriy Kravchuk
As for estimated number of rows when LIMIT is used, this looks like bug #50168. I doubt this problem will be fixed any time soon.
[25 Mar 2011 17:36] Valeriy Kravchuk
Let's say this is a duplicate of bug #50168 (something to check again when that bug is fixed at least)