Description:
The issue arises when we try to implement the cross matching algorithm from the following paper: MSR-TR-2006-52 "The Zones Algorithm for Finding Points-Near-a-Point or Cross-Matching Spatial Datasets", Gray, Jim; Nieto-Santisteban, Maria A.; Szalay, Alexander S.
The composite index is not used as expected. More details in the 'how to repeat' section.
How to repeat:
The schema files and the files with sample data can be downloaded from
http://www.slac.stanford.edu/~becla/mysql_bug/
(I don't want to dump 100+MB file into your ftp...)
Here are the variants of the core crossmatch query that were tried, along with explain extended output
1)
INSERT INTO LooseMatch
SELECT p.id, s.id FROM InMemoryDIASource AS p
INNER JOIN ZoneZone AS zz FORCE INDEX (PRIMARY) ON p.zoneId = zz.zoneId
INNER JOIN InMemoryObject AS s FORCE INDEX (idx_zone_ra) ON zz.matchZoneId = s.zoneId
WHERE s.ra BETWEEN p.ra - zz.deltaRa AND p.ra + zz.deltaRa
AND s.decl BETWEEN p.decl - 0.000833 AND p.decl + 0.000833
AND POW(p.x - s.x, 2) + POW(p.y - s.y, 2) + POW(p.z - s.z, 2) < 2.1137067679466e-10
40% done after 10 hours. EXPLAIN EXTENDED reports:
select `p`.`id` AS `id`,`s`.`id` AS `id`
from `InMemoryDIASource` `p` join `ZoneZone` `zz` join `InMemoryObject` `s`
where ((`zz`.`zoneId` = `p`.`zoneId`) and
(`zz`.`matchZoneId` = `s`.`zoneId`) and
(`s`.`ra` between (`p`.`ra` - `zz`.`deltaRa`) and (`p`.`ra` + `zz`.`deltaRa`)) and
(`s`.`decl` between (`p`.`decl` - 0.000833) and (`p`.`decl` + 0.000833)) and
(((pow((`p`.`x` - `s`.`x`),2) + pow((`p`.`y` - `s`.`y`),2)) + pow((`p`.`z` - `s`.`z`),2)) < 2.1137067679466e-10))
-----------------------------------------------------------------------------
2)
INSERT INTO LooseMatch
SELECT p.id, s.id FROM InMemoryDIASource AS p
INNER JOIN ZoneZone AS zz FORCE INDEX (PRIMARY) ON p.zoneId = zz.zoneId
INNER JOIN InMemoryObject AS s FORCE INDEX (idx_zone_ra) ON
zz.matchZoneId = s.zoneId AND
s.ra BETWEEN p.ra - zz.deltaRa AND p.ra + zz.deltaRa
WHERE s.decl BETWEEN p.decl - 0.000833 AND p.decl + 0.000833
AND POW(p.x - s.x, 2) + POW(p.y - s.y, 2) + POW(p.z - s.z, 2) < 2.1137067679466e-10
EXPLAIN EXTENDED reports:
select `p`.`id` AS `id`,`s`.`id` AS `id`
from `InMemoryDIASource` `p` join `ZoneZone` `zz` join `InMemoryObject` `s`
where ((`zz`.`zoneId` = `p`.`zoneId`) and
(`zz`.`matchZoneId` = `s`.`zoneId`) and
(`s`.`decl` between (`p`.`decl` - 0.000833) and (`p`.`decl` + 0.000833)) and
(((pow((`p`.`x` - `s`.`x`),2) + pow((`p`.`y` - `s`.`y`),2)) + pow((`p`.`z` - `s`.`z`),2)) < 2.1137067679466e-10) and
(`s`.`ra` between (`p`.`ra` - `zz`.`deltaRa`) and (`p`.`ra` + `zz`.`deltaRa`)))
------------------------------------------------------------------------------
3)
Same as 1 but with STRAIGHT_JOIN:
INSERT INTO LooseMatch
SELECT STRAIGHT_JOIN p.id, s.id FROM InMemoryDIASource AS p
INNER JOIN ZoneZone AS zz FORCE INDEX (PRIMARY) ON p.zoneId = zz.zoneId
INNER JOIN InMemoryObject AS s FORCE INDEX (idx_zone_ra) ON zz.matchZoneId = s.zoneId
WHERE s.ra BETWEEN p.ra - zz.deltaRa AND p.ra + zz.deltaRa
AND s.decl BETWEEN p.decl - 0.000833 AND p.decl + 0.000833
AND POW(p.x - s.x, 2) + POW(p.y - s.y, 2) + POW(p.z - s.z, 2) < 2.1137067679466e-10
EXPLAIN EXTENDED reports
select straight_join `p`.`id` AS `id`,`s`.`id` AS `id`
from `InMemoryDIASource` `p` join `ZoneZone` `zz` join `InMemoryObject` `s`
where ((`zz`.`zoneId` = `p`.`zoneId`) and
(`s`.`zoneId` = `zz`.`matchZoneId`) and
(`s`.`ra` between (`p`.`ra` - `zz`.`deltaRa`) and (`p`.`ra` + `zz`.`deltaRa`)) and
(`s`.`decl` between (`p`.`decl` - 0.000833) and (`p`.`decl` + 0.000833)) and
(((pow((`p`.`x` - `s`.`x`),2) + pow((`p`.`y` - `s`.`y`),2)) + pow((`p`.`z` - `s`.`z`),2)) < 2.1137067679466e-10))
35 min execution time (better!)
------------------------------------------------------------------------------
4) same as 2 but with STRAIGHT_JOIN
select straight_join `p`.`id` AS `id`,`s`.`id` AS `id`
from `InMemoryDIASource` `p` join `ZoneZone` `zz` join `InMemoryObject` `s`
where ((`zz`.`zoneId` = `p`.`zoneId`) and
(`s`.`zoneId` = `zz`.`matchZoneId`) and
(`s`.`decl` between (`p`.`decl` - 0.000833) and (`p`.`decl` + 0.000833)) and
(((pow((`p`.`x` - `s`.`x`),2) + pow((`p`.`y` - `s`.`y`),2)) + pow((`p`.`z` - `s`.`z`),2)) < 2.1137067679466e-10) and
(`s`.`ra` between (`p`.`ra` - `zz`.`deltaRa`) and (`p`.`ra` + `zz`.`deltaRa`)))
In all cases it looks like MySQL does not take full advantage of the composite (zoneId, ra) index on InMemoryObject: the "s.zoneId = zz.matchZoneId AND s.ra BETWEEN p.ra - zz.deltaRa AND p.ra + zz.deltaRa" should map to a range lookup, but it seems that only s.zoneId = zz.matchZoneId is used to look things up in the index.