MySQL Bugs: #24754: Index is not used when joining two tables

Bug #24754	Index is not used when joining two tables
Submitted:	1 Dec 2006 13:34	Modified:	10 Dec 2009 15:24
Reporter:	Alexander Barkov	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	5.1.14, 5.1.43-bzr	OS:	Linux (Linux)
Assigned to:		CPU Architecture:	Any
Tags:	Q1

Description:
Spatial index is not used for optimization when
joining two tables.

How to repeat:
DROP TABLE IF EXISTS t1;
DROP TABLE IF EXISTS t2;
CREATE TABLE t1 (g geometry not null, spatial index(g));
CREATE TABLE t2 (g geometry not null, spatial index(g));
INSERT INTO t1 VALUES (GeomFromText('POINT(1 2)'));
INSERT INTO t1 VALUES (GeomFromText('POINT(1 1)'));
INSERT INTO t1 VALUES (GeomFromText('POINT(1 2)'));
INSERT INTO t1 VALUES (GeomFromText('POINT(1 3)'));
INSERT INTO t1 VALUES (GeomFromText('POINT(1 4)'));
INSERT INTO t1 VALUES (GeomFromText('POINT(1 5)'));
INSERT INTO t1 VALUES (GeomFromText('POINT(1 6)'));
INSERT INTO t2 VALUES (GeomFromText('POINT(1 2)'));
INSERT INTO t2 VALUES (GeomFromText('POINT(1 1)'));
INSERT INTO t2 VALUES (GeomFromText('POINT(1 2)'));
INSERT INTO t2 VALUES (GeomFromText('POINT(1 3)'));
INSERT INTO t2 VALUES (GeomFromText('POINT(1 4)'));
INSERT INTO t2 VALUES (GeomFromText('POINT(1 5)'));
INSERT INTO t2 VALUES (GeomFromText('POINT(1 6)'));

mysql> explain SELECT AsText(t1.g), AsText(t2.g) FROM t1, t2 WHERE Intersects(t1.g, t2.g);
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | t1    | ALL  | g             | NULL | NULL    | NULL |    7 |             |
|  1 | SIMPLE      | t2    | ALL  | g             | NULL | NULL    | NULL |    7 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
2 rows in set (0.01 sec)

mysql> explain SELECT AsText(t1.g), AsText(t2.g) FROM t1, t2 WHERE Intersects(t1.g, t2.g);                                          

Suggested fix:
Fix to optimize joins using spatial indexes.

Thank you for a problem report. Verified just as described with latest 5.1.14-BK on Linux:

openxs@suse:~/dbs/5.1> bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.1.14-beta Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> DROP TABLE IF EXISTS t1;
;
CREATEQuery OK, 0 rows affected (0.01 sec)

 mysql> DROP TABLE IF EXISTS t2;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE t1 (g geometry not null, spatial index(g));
CRQuery OK, 0 rows affected (0.01 sec)

mysql> CREATE TABLE t2 (g geometry not null, spatial index(g));
RT INTO t1 VALUES Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO t1 VALUES (GeomFromText('POINT(1 2)'));
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO t1 VALUES (GeomFromText('POINT(1 1)'));
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO t1 VALUES (GeomFromText('POINT(1 2)'));
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO t1 VALUES (GeomFromText('POINT(1 3)'));
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO t1 VALUES (GeomFromText('POINT(1 4)'));
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO t1 VALUES (GeomFromText('POINT(1 5)'));
Query OK, 1 row affected (0.00 sec)

Imysql> INSERT INTO t1 VALUES (GeomFromText('POINT(1 6)'));
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO t2 VALUES (GeomFromText('POINT(1 2)'));
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO t2 VALUES (GeomFromText('POINT(1 1)'));
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO t2 VALUES (GeomFromText('POINT(1 2)'));
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO t2 VALUES (GeomFromText('POINT(1 3)'));
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO t2 VALUES (GeomFromText('POINT(1 4)'));
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO t2 VALUES (GeomFromText('POINT(1 5)'));
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO t2 VALUES (GeomFromText('POINT(1 6)'));
Query OK, 1 row affected (0.00 sec)

mysql> explain SELECT AsText(t1.g), AsText(t2.g) FROM t1, t2 WHERE
    -> Intersects(t1.g, t2.g)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: ALL
possible_keys: g
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 7
        Extra:
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: ALL
possible_keys: g
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 7
        Extra: Using where
2 rows in set (0.00 sec)

Alexander,
can you please be more specific on what is the intended result/usage here ?

If we permute the tables in the FROM clause we get :

explain SELECT AsText(t1.g), AsText(t2.g) FROM t2, t1 WHERE Intersects(t1.g, t2.g)
--------------

+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                          |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------------------+
|  1 | SIMPLE      | t2    | ALL  | g             | NULL | NULL    | NULL |    7 |                                                | 
|  1 | SIMPLE      | t1    | ALL  | g             | NULL | NULL    | NULL |    7 | Range checked for each record (index map: 0x1) | 
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------------------+

This shows that the index actually gets used on t1.
The decision to use range check with the values of the columns from the previous table is taken only after the join order is determined, so it can't currently influence the join order selection. 
Besides it will work only if the column on the left is from the table it can be applied to. It doesn't try to reverse the predicate (where possible) and check again.

This is not anyhow specific to the spatial indexes: 

create table m1 (a int, key (a));
create table m2 (a int, key (a));
insert into m1 values (1),(2),(3),(4);
insert into m2 values (1),(2),(3),(4);

explain select 1 from m1, m2 where m1.a <= m2.a
--------------

+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | m1    | index | a             | a    | 5       | NULL |    4 | Using index              | 
|  1 | SIMPLE      | m2    | index | a             | a    | 5       | NULL |    4 | Using where; Using index | 
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+

explain select 1 from m2, m1 where m1.a <= m2.a
--------------

+----+-------------+-------+-------+---------------+------+---------+------+------+------------------------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                                          |
+----+-------------+-------+-------+---------------+------+---------+------+------+------------------------------------------------+
|  1 | SIMPLE      | m2    | index | a             | a    | 5       | NULL |    4 | Using index                                    | 
|  1 | SIMPLE      | m1    | ALL   | a             | NULL | NULL    | NULL |    4 | Range checked for each record (index map: 0x1) | 
+----+-------------+-------+-------+---------------+------+---------+------+------+------------------------------------------------+

Making it work with reversible predicates regardless of the join order by either reversing the condition or adding a reversed condition is a valid optimization in itself.

The bigger issue here is that the "Range checked for each record" optimization does not affect the join order selection. This currently is impossible to improve because the optimizer cannot estimate the selectivity of such a predicate in any reasonably fast manner (because of the lack of data statistics).

This issue will be addressed in a future release. Until then please use the workaround provided in my previous comment (i.e. specify the correct join order).