Bug #30264 Two BETWEENs not using a composite index optimally - could be 3200x faster
Submitted: 7 Aug 2007 3:30 Modified: 7 Aug 2007 23:19
Reporter: Tasman Hayes Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S4 (Feature request)
Version:5.1.20 OS:Any
Assigned to: CPU Architecture:Any
Tags: between, composite index, index range scan, two betweens
Triage: Triaged: D5 (Feature request)

[7 Aug 2007 3:30] Tasman Hayes
Description:
45 million row MyISAM table in MySQL 5.1.20.

select * from M 
where B between 726642 and 733649 
and C between 8224 and 8230; 

Result set has 311 rows. 

Full table scan: 32.08 seconds.
Using index on (B): 37.90 seconds. 
Using index on (B,C): 39.75 seconds. 
If MySQL used the composite index optimally: 0.01 seconds. (3200x faster)

The long query time severely affects our production system as this 30+ second query make our webpage response times too long.

I'd hoped that MySQL would apply the composite index (B,C), with an index range scan on B, and a subindex range scan on C (for each matching B). But it seems not. The similarities between the response times for using the index on (B) and the index on (B,C), indicate MySQL is probably index range scanning on B, and then looking at every C element and applying a WHERE filter. That is, MySQL does not seem to take advantage of the fact that for each matching B, the C elements are available in the index in sorted order, and so could be subindex range scanned, rather than a full subindex scan. 

To check this, I collapsed the ranges, by making all the numbers in each range just the first number: 
UPDATE m SET b = 726642 WHERE b BETWEEN 726642 AND 733649; 
UPDATE m SET c = 8224 WHERE c BETWEEN 8224 AND 8230; 

Now I run the query as: select * from M where B = 726642 and C = 8224; 
This returns in 0.01 seconds. 

If the MySQL optimally used the composite index (B, C) to satisfy the two BETWEENs, the original query (with the BETWEENs) should return in the same time as the "equals" query - 0.01 seconds. 

How to repeat:
Create a table with three numeric columns, no unique columns.
Populate with number ranges (use three nested for loops in, say, MySQL stored procedure): for i = 1 to 360 { for j = 1 to 360 { for k = 1 to 360 ...
Create a composite index on the last two columns.
Run the query above with reasonable numbers for the test data above.

If MySQL is interested in resolving this bug, I can provide test data or a test script. (Email address in private comment section.)

Suggested fix:
Modify the MySQL optimizer to recognize this case and generate an appropriate execution plan.
[7 Aug 2007 4:29] Valeriy Kravchuk
Thank you for a problem report. It looks like a reasonable feature request for me. But it is not a bug, formally, as execution path you proposed is not described in the manual (and I do not know any RDBMS where it is implemented).
[7 Aug 2007 23:19] Tasman Hayes
Thank you for analyzing & responding to my report so quickly.
Thank you also for leaving my report in place as a feature request.
Much appreciated.

Does MySQL AB have a process in place for escalating feature requests?
Having this execution path in MySQL would dramatically increase our application performance.

Thank you again!
[7 Aug 2007 23:19] Tasman Hayes
Thank you for analyzing & responding to my report so quickly.
Thank you also for leaving my report in place as a feature request.
Much appreciated.

Does MySQL AB have a process in place for escalating feature requests?
Having this execution path in MySQL would dramatically increase our application performance.

Thank you again!