Bug #29433 composite index not used
Submitted: 29 Jun 2007 0:03 Modified: 30 Nov 2009 18:58
Reporter: Jacek Becla Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.0.27, 5.0, 5.1, 4.1, 5.2 BK, 6.0.14-bzr OS:Any
Assigned to: CPU Architecture:Any
Tags: composite index

[29 Jun 2007 0:03] Jacek Becla
Description:
The issue arises when we try to implement the cross matching algorithm from the following paper: MSR-TR-2006-52 "The Zones Algorithm for Finding Points-Near-a-Point or Cross-Matching Spatial Datasets", Gray, Jim; Nieto-Santisteban, Maria A.; Szalay, Alexander S.

The composite index is not used as expected. More details in the 'how to repeat' section.

How to repeat:
The schema files and the files with sample data can be downloaded from
http://www.slac.stanford.edu/~becla/mysql_bug/ 
(I don't want to dump 100+MB file into your ftp...)

Here are the variants of the core crossmatch query that were tried, along with explain extended output

1)
INSERT INTO LooseMatch
    SELECT p.id, s.id FROM InMemoryDIASource AS p
    INNER JOIN ZoneZone AS zz FORCE INDEX (PRIMARY) ON p.zoneId = zz.zoneId
    INNER JOIN InMemoryObject AS s FORCE INDEX (idx_zone_ra) ON zz.matchZoneId = s.zoneId
    WHERE s.ra BETWEEN p.ra - zz.deltaRa AND p.ra + zz.deltaRa
    AND s.decl BETWEEN p.decl - 0.000833 AND p.decl + 0.000833
    AND POW(p.x - s.x, 2) + POW(p.y - s.y, 2) + POW(p.z - s.z, 2) < 2.1137067679466e-10

40% done after 10 hours. EXPLAIN EXTENDED reports:

select `p`.`id` AS `id`,`s`.`id` AS `id`
  from `InMemoryDIASource` `p` join `ZoneZone` `zz` join `InMemoryObject` `s` 
  where ((`zz`.`zoneId` = `p`.`zoneId`) and 
         (`zz`.`matchZoneId` = `s`.`zoneId`) and 
         (`s`.`ra` between (`p`.`ra` - `zz`.`deltaRa`) and (`p`.`ra` + `zz`.`deltaRa`)) and
         (`s`.`decl` between (`p`.`decl` - 0.000833) and (`p`.`decl` + 0.000833)) and
         (((pow((`p`.`x` - `s`.`x`),2) + pow((`p`.`y` - `s`.`y`),2)) + pow((`p`.`z` - `s`.`z`),2)) < 2.1137067679466e-10))

-----------------------------------------------------------------------------

2)
INSERT INTO LooseMatch
    SELECT p.id, s.id FROM InMemoryDIASource AS p
    INNER JOIN ZoneZone AS zz FORCE INDEX (PRIMARY) ON p.zoneId = zz.zoneId
    INNER JOIN InMemoryObject AS s FORCE INDEX (idx_zone_ra) ON
        zz.matchZoneId = s.zoneId AND
        s.ra BETWEEN p.ra - zz.deltaRa AND p.ra + zz.deltaRa
    WHERE s.decl BETWEEN p.decl - 0.000833 AND p.decl + 0.000833
    AND POW(p.x - s.x, 2) + POW(p.y - s.y, 2) + POW(p.z - s.z, 2) < 2.1137067679466e-10

EXPLAIN EXTENDED reports:
select `p`.`id` AS `id`,`s`.`id` AS `id`
  from `InMemoryDIASource` `p` join `ZoneZone` `zz` join `InMemoryObject` `s`
  where ((`zz`.`zoneId` = `p`.`zoneId`) and 
         (`zz`.`matchZoneId` = `s`.`zoneId`) and
         (`s`.`decl` between (`p`.`decl` - 0.000833) and (`p`.`decl` + 0.000833)) and
         (((pow((`p`.`x` - `s`.`x`),2) + pow((`p`.`y` - `s`.`y`),2)) + pow((`p`.`z` - `s`.`z`),2)) < 2.1137067679466e-10) and 
         (`s`.`ra` between (`p`.`ra` - `zz`.`deltaRa`) and (`p`.`ra` + `zz`.`deltaRa`)))

------------------------------------------------------------------------------

3)
Same as 1 but with STRAIGHT_JOIN:

INSERT INTO LooseMatch
    SELECT STRAIGHT_JOIN p.id, s.id FROM InMemoryDIASource AS p
    INNER JOIN ZoneZone AS zz FORCE INDEX (PRIMARY) ON p.zoneId = zz.zoneId
    INNER JOIN InMemoryObject AS s FORCE INDEX (idx_zone_ra) ON zz.matchZoneId = s.zoneId
    WHERE s.ra BETWEEN p.ra - zz.deltaRa AND p.ra + zz.deltaRa
    AND s.decl BETWEEN p.decl - 0.000833 AND p.decl + 0.000833
    AND POW(p.x - s.x, 2) + POW(p.y - s.y, 2) + POW(p.z - s.z, 2) < 2.1137067679466e-10

EXPLAIN EXTENDED reports

select straight_join `p`.`id` AS `id`,`s`.`id` AS `id` 
  from `InMemoryDIASource` `p` join `ZoneZone` `zz` join `InMemoryObject` `s` 
  where ((`zz`.`zoneId` = `p`.`zoneId`) and 
         (`s`.`zoneId` = `zz`.`matchZoneId`) and 
         (`s`.`ra` between (`p`.`ra` - `zz`.`deltaRa`) and (`p`.`ra` + `zz`.`deltaRa`)) and 
         (`s`.`decl` between (`p`.`decl` - 0.000833) and (`p`.`decl` + 0.000833)) and 
         (((pow((`p`.`x` - `s`.`x`),2) + pow((`p`.`y` - `s`.`y`),2)) + pow((`p`.`z` - `s`.`z`),2)) < 2.1137067679466e-10))

35 min execution time (better!)

------------------------------------------------------------------------------

4) same as 2 but with STRAIGHT_JOIN

select straight_join `p`.`id` AS `id`,`s`.`id` AS `id` 
  from `InMemoryDIASource` `p` join `ZoneZone` `zz` join `InMemoryObject` `s` 
  where ((`zz`.`zoneId` = `p`.`zoneId`) and 
         (`s`.`zoneId` = `zz`.`matchZoneId`) and 
         (`s`.`decl` between (`p`.`decl` - 0.000833) and (`p`.`decl` + 0.000833)) and 
         (((pow((`p`.`x` - `s`.`x`),2) + pow((`p`.`y` - `s`.`y`),2)) + pow((`p`.`z` - `s`.`z`),2)) < 2.1137067679466e-10) and
         (`s`.`ra` between (`p`.`ra` - `zz`.`deltaRa`) and (`p`.`ra` + `zz`.`deltaRa`)))

In all cases it looks like MySQL does not take full advantage of the composite (zoneId, ra) index on InMemoryObject: the "s.zoneId = zz.matchZoneId AND s.ra BETWEEN p.ra - zz.deltaRa AND p.ra + zz.deltaRa" should map to a range lookup, but it seems that only s.zoneId = zz.matchZoneId is used to look things up in the index.
[29 Jun 2007 0:05] Jacek Becla
schema related to bug #29433

Attachment: schema.sql (text/x-sql), 3.12 KiB.

[29 Jun 2007 0:05] Jacek Becla
Output from EXPLAIN realted to bug #29433

Attachment: explain.txt (text/plain), 5.54 KiB.

[3 Jul 2007 10:14] Sveta Smirnova
Thank you for the report.

Verified as described.
[28 Aug 2007 7:26] Sveta Smirnova
test case

Attachment: bug29433.test (application/octet-stream, text), 7.90 KiB.

[28 Aug 2007 7:27] Sveta Smirnova
Test case has attached.
[4 Sep 2007 15:17] Georgi Kodinov
Thank you for your report. Currently there are parts of the optimizer that are not completely cost based. Index access method is one of them (for some specific optimizations). We are aware of this deficiency but it will require a significant amount of changes to rectify so we will consider fixing this in a future release. We may as well consider adding optimizer hints for the index usage method.