Bug #50834 MBRIntersects uses SPATIAL index only for the first argument
Submitted: 2 Feb 2010 14:03 Modified: 13 May 2015 8:46
Reporter: Alex Bolenok Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.1.35-community, 5.1.44-bzr OS:Any
Assigned to: Assigned Account CPU Architecture:Any
Tags: gis, INDEX, spatial

[2 Feb 2010 14:03] Alex Bolenok
Description:
When optimizing queries containing a JOIN on MBRIntersects, the optimizer treats MBRIntersects as a symmetrical function, however, the index can only be used on the first column.

The optimizer can swap the join order so that the table containing least number of records is leading in the join.

Normally, this would make sense, since the overall query time would be (n * log (m)), where n is the number of records in the leading table and log (m) is the time required to search an index in the driven table.

However, MBRIntersects is implemented so that the SPATIAL index can only be used against the first argument.

When swapping the join order, the order of arguments to MBRIntersects is not swapped and the index is not used, despite the fact that the function itself is symmetrical and MBRIntersects(a, b) = MBRIntersects(b, a) for any a, b.

How to repeat:

SELECT  VERSION();
USE test;
DROP DATABASE bugs;
CREATE DATABASE bugs;
USE bugs;

CREATE TABLE filler (
        id INT NOT NULL PRIMARY KEY AUTO_INCREMENT
) ENGINE=Memory;

CREATE TABLE t_big (
        id INT NOT NULL PRIMARY KEY,
        rg LineString NOT NULL,
        SPATIAL KEY sx_big_rg (rg)
) ENGINE=MyISAM;

CREATE TABLE t_small (
        id INT NOT NULL PRIMARY KEY,
        rg LineString NOT NULL,
        SPATIAL KEY sx_small_rg (rg)
) ENGINE=MyISAM;

DELIMITER $$

CREATE PROCEDURE prc_filler(cnt INT)
BEGIN
        DECLARE _cnt INT;
        SET _cnt = 1;
        WHILE _cnt <= cnt DO
                INSERT
                INTO    filler
                SELECT  _cnt;
                SET _cnt = _cnt + 1;
        END WHILE;
END
$$

DELIMITER ;

START TRANSACTION;
CALL prc_filler(24000);
COMMIT;

INSERT
INTO    t_big (id, rg)
SELECT  id,
        LineString(
        Point(-1, TIMESTAMPDIFF(second, '1970-01-01', range_start)),
        Point(1, TIMESTAMPDIFF(second, '1970-01-01', range_end))
        )
FROM    (
        SELECT  id,
                STR_TO_DATE('2009-02-01', '%Y-%m-%d') - INTERVAL id HOUR AS range_start,
                STR_TO_DATE('2009-02-01', '%Y-%m-%d') - INTERVAL id HOUR + INTERVAL RAND(20100201) * 7200 SECOND AS range_end
        FROM    filler
        ) q;

INSERT
INTO    t_small (id, rg)
SELECT  id,
        LineString(
        Point(-1, TIMESTAMPDIFF(second, '1970-01-01', range_start)),
        Point(1, TIMESTAMPDIFF(second, '1970-01-01', range_end))
        )
FROM    (
        SELECT  id,
                STR_TO_DATE('2009-02-01', '%Y-%m-%d') - INTERVAL id DAY AS range_start,
                STR_TO_DATE('2009-02-01', '%Y-%m-%d') - INTERVAL id DAY + INTERVAL RAND(20100201) * 172800 SECOND AS range_end
        FROM    filler
        ORDER BY
                id
        LIMIT 1000
        ) q;

EXPLAIN
SELECT  COUNT(*)
FROM    t_big b
JOIN    t_small s
ON      MBRIntersects(s.rg, b.rg);

1, 'SIMPLE', 's', 'ALL', 'sx_small_rg', '', '', '', 1000, ''
1, 'SIMPLE', 'b', 'ALL', 'sx_big_rg', '', '', '', 24000, 'Using where; Using join buffer'

EXPLAIN
SELECT  COUNT(*)
FROM    t_big b
JOIN    t_small s
ON      MBRIntersects(b.rg, s.rg);

1, 'SIMPLE', 's', 'ALL', 'sx_small_rg', '', '', '', 1000, ''
1, 'SIMPLE', 'b', 'ALL', 'sx_big_rg', '', '', '', 24000, 'Range checked for each record (index map: 0x2)'

SELECT  COUNT(*)
FROM    t_big b
JOIN    t_small s
ON      MBRIntersects(s.rg, b.rg);

SELECT  COUNT(*)
FROM    t_big b
JOIN    t_small s
ON      MBRIntersects(b.rg, s.rg);

The first query uses "join buffer" and runs for 9 seconds.

The second query uses SPATIAL index and runs for 0.25 seconds.

I expected both queries to have the same execution plan and the same performance.

Suggested fix:
Fix the optimizer algorithm so that both arguments for MBRIntersects are sargable (usable by a SPATIAL index).
[2 Feb 2010 18:12] Valeriy Kravchuk
Thank you for the bug report. Verified just as described with recent 5.1.44 from bzr:

...
mysql> EXPLAIN
    -> SELECT  COUNT(*)
    -> FROM    t_big b
    -> JOIN    t_small s
    -> ON      MBRIntersects(s.rg, b.rg);
+----+-------------+-------+------+---------------+------+---------+------+-------+--------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra                          |
+----+-------------+-------+------+---------------+------+---------+------+-------+--------------------------------+
|  1 | SIMPLE      | s     | ALL  | sx_small_rg   | NULL | NULL    | NULL |  1000 |                                |
|  1 | SIMPLE      | b     | ALL  | sx_big_rg     | NULL | NULL    | NULL | 24000 | Using where; Using join buffer |
+----+-------------+-------+------+---------------+------+---------+------+-------+--------------------------------+
2 rows in set (0.00 sec)

mysql> EXPLAIN
    -> SELECT  COUNT(*)
    -> FROM    t_big b
    -> JOIN    t_small s
    -> ON      MBRIntersects(b.rg, s.rg);
+----+-------------+-------+------+---------------+------+---------+------+-------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra                                          |
+----+-------------+-------+------+---------------+------+---------+------+-------+------------------------------------------------+
|  1 | SIMPLE      | s     | ALL  | sx_small_rg   | NULL | NULL    | NULL |  1000 |                                                |
|  1 | SIMPLE      | b     | ALL  | sx_big_rg     | NULL | NULL    | NULL | 24000 | Range checked for each record (index map: 0x2) |
+----+-------------+-------+------+---------------+------+---------+------+-------+------------------------------------------------+
2 rows in set (0.00 sec)

mysql> SELECT  COUNT(*)
    -> FROM    t_big b
    -> JOIN    t_small s
    -> ON      MBRIntersects(s.rg, b.rg);
+----------+
| COUNT(*) |
+----------+
|    24694 |
+----------+
1 row in set (14.76 sec)

mysql> SELECT  COUNT(*) FROM    t_big b JOIN    t_small s ON      MBRIntersects(s.rg, b.rg);
+----------+
| COUNT(*) |
+----------+
|    24694 |
+----------+
1 row in set (16.10 sec)

mysql> SELECT  COUNT(*)
    -> FROM    t_big b
    -> JOIN    t_small s
    -> ON      MBRIntersects(b.rg, s.rg);
+----------+
| COUNT(*) |
+----------+
|    24694 |
+----------+
1 row in set (0.70 sec)

mysql> select version();
+--------------+
| version()    |
+--------------+
| 5.1.44-debug |
+--------------+
1 row in set (0.00 sec)
[13 May 2015 8:46] Norvald Ryeng
Posted by developer:
 
In 5.7, the optimizer chooses the same plan and is able to use the spatial index for both queries.