Bug #68899 High join_buffer_size prevents use of spatial index on joined tables
Submitted: 9 Apr 2013 9:38 Modified: 10 Apr 2013 10:58
Reporter: Dario Napolitano Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.5.30, 5.6.10 OS:Linux
Assigned to: CPU Architecture:Any

[9 Apr 2013 9:38] Dario Napolitano
Description:
A SELECT query with a JOIN between two tables with a spatial index does not use the index if the server is configured with a high value for join_buffer_size.

How to repeat:
Table A:
CREATE TABLE `ZIP_GEO` (
  `ZIP` varchar(5) NOT NULL,
  `LOCATION` point NOT NULL,
  PRIMARY KEY (`ZIP`),
  SPATIAL KEY `IDX_ZIP_GEO` (`LOCATION`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8

Table B:
CREATE TABLE `SHOPS_GEO` (
  `ID` int(11) NOT NULL,
  `LOCATION` point NOT NULL,
  PRIMARY KEY (`ID`),
  SPATIAL KEY `IDX_SHOP_GEO` (`LOCATION`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8

Support procedure (creates a square polygon around a given point): 
CREATE FUNCTION `RangeBox`(p POINT, rng FLOAT) RETURNS polygon
    NO SQL
    DETERMINISTIC
BEGIN
    SET @x := X(p);
    SET @y := Y(p);
    SET @v1 := Point(@x-rng,@y-rng);
    SET @v2 := Point(@x-rng,@y+rng);
    SET @v3 := Point(@x+rng,@y+rng);
    SET @v4 := Point(@x+rng,@y-rng);
    SET @ls := LineString(GeomFromText(AsText(@v1)),GeomFromText(AsText(@v2)),GeomFromText(AsText(@v3)),GeomFromText(AsText(@v4)),GeomFromText(AsText(@v1)));
    SET @poly := Polygon(GeomFromText(AsText(@ls)));
    RETURN @poly;
END

Test query:
SELECT DISTINCT(ZIP)
    from ZIP_GEO zip force index(IDX_ZIP_GEO)
    JOIN SHOPS_GEO sg  force index(IDX_SHOP_GEO)
    ON (MBRContains(RangeBox(sg.location, 0.01904761977493763), zip.location)) 

An EXPLAIN on the given query returns "Range checked for each record (index map: 0x2)" with the default configuration and it will take only a few seconds. However with a value of join_buffer_size=8M it will show "Using where; Distinct; Using join buffer" and will take a very long time to execute.
[10 Apr 2013 6:37] MySQL Verification Team
Hello Dario,

Thank you for the report.
I cannot repeat reported behavior with the dummy data, could you please provide the test data to reproduce at our end?

Thanks,
Umesh
[10 Apr 2013 10:58] MySQL Verification Team
Hello Dario,

Thank you for the report.
Verified as described on reported and later versions.

Thanks,
Umesh
[10 Apr 2013 10:59] MySQL Verification Team
Test cases: Table creation scripts with data insert statements.
Attachment: 68899-sqldata.zip (application/zip, text), 361.50 KiB.

| 5.5.30    |

mysql> show global variables like 'query_%';
+------------------------------+---------+
| Variable_name                | Value   |
+------------------------------+---------+
| query_alloc_block_size       | 8192    |
| query_cache_limit            | 1048576 |
| query_cache_min_res_unit     | 4096    |
| query_cache_size             | 0       |
| query_cache_type             | ON      |
| query_cache_wlock_invalidate | OFF     |
| query_prealloc_size          | 8192    |
+------------------------------+---------+

## Default join buffer size

mysql> show global variables like 'join_buffer_size';
+------------------+--------+
| Variable_name    | Value  |
+------------------+--------+
| join_buffer_size | 131072 |
+------------------+--------+

mysql>
mysql>
mysql> explain SELECT DISTINCT(ZIP)
    ->     from ZIP_GEO zip force index(IDX_ZIP_GEO)
    ->     JOIN SHOPS_GEO sg  force index(IDX_SHOP_GEO)
    ->     ON (MBRContains(RangeBox(sg.location, 0.01904761977493763), zip.location));
+----+-------------+-------+------+---------------+------+---------+------+-------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra                                          |
+----+-------------+-------+------+---------------+------+---------+------+-------+------------------------------------------------+
|  1 | SIMPLE      | sg    | ALL  | NULL          | NULL | NULL    | NULL | 28879 | Using temporary                                |
|  1 | SIMPLE      | zip   | ALL  | IDX_ZIP_GEO   | NULL | NULL    | NULL |  4508 | Range checked for each record (index map: 0x2) |
+----+-------------+-------+------+---------------+------+---------+------+-------+------------------------------------------------+
2 rows in set (0.00 sec)

mysql> SELECT DISTINCT(ZIP) from ZIP_GEO zip force index(IDX_ZIP_GEO)     JOIN SHOPS_GEO sg  force index(IDX_SHOP_GEO)     ON (MBRContains(RangeBox(sg.location, 0.01904761977493763), zip.location));
..
..
| 91018 |
+-------+
1877 rows in set (3.72 sec)

mysql> show global variables like 'join_buffer_size';
+------------------+--------+
| Variable_name    | Value  |
+------------------+--------+
| join_buffer_size | 131072 |
+------------------+--------+
1 row in set (0.00 sec)

## Change join buffer size to 8M

mysql> set global join_buffer_size=8*1024*1024;
Query OK, 0 rows affected (0.00 sec)

mysql> show global variables like 'join_buffer_size';
+------------------+---------+
| Variable_name    | Value   |
+------------------+---------+
| join_buffer_size | 8388608 |
+------------------+---------+
1 row in set (0.00 sec)

## explain plan

mysql> explain SELECT DISTINCT(ZIP)     from ZIP_GEO zip force index(IDX_ZIP_GEO)     JOIN SHOPS_GEO sg  force index(IDX_SHOP_GEO)     ON (MBRContains(RangeBox(sg.location, 0.01904761977493763), zip.location));
+----+-------------+-------+------+---------------+------+---------+------+-------+------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra                                    |
+----+-------------+-------+------+---------------+------+---------+------+-------+------------------------------------------+
|  1 | SIMPLE      | zip   | ALL  | IDX_ZIP_GEO   | NULL | NULL    | NULL |  4508 | Using temporary                          |
|  1 | SIMPLE      | sg    | ALL  | NULL          | NULL | NULL    | NULL | 28879 | Using where; Distinct; Using join buffer |
+----+-------------+-------+------+---------------+------+---------+------+-------+------------------------------------------+

###
mysql> explain extended SELECT DISTINCT(ZIP)     from ZIP_GEO zip force index(IDX_ZIP_GEO)     JOIN SHOPS_GEO sg  force index(IDX_SHOP_GEO)     ON (MBRContains(RangeBox(sg.location, 0.01904761977493763), zip.location));
+----+-------------+-------+------+---------------+------+---------+------+-------+----------+------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra                                    |
+----+-------------+-------+------+---------------+------+---------+------+-------+----------+------------------------------------------+
|  1 | SIMPLE      | zip   | ALL  | IDX_ZIP_GEO   | NULL | NULL    | NULL |  4508 |   100.00 | Using temporary                          |
|  1 | SIMPLE      | sg    | ALL  | NULL          | NULL | NULL    | NULL | 28879 |   100.00 | Using where; Distinct; Using join buffer |
+----+-------------+-------+------+---------------+------+---------+------+-------+----------+------------------------------------------+
2 rows in set, 1 warning (0.00 sec)

mysql>
mysql> show warnings;
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                                                                      |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | select distinct `test`.`zip`.`ZIP` AS `ZIP` from `test`.`ZIP_GEO` `zip` FORCE INDEX (`IDX_ZIP_GEO`) join `test`.`SHOPS_GEO` `sg` FORCE INDEX (`IDX_SHOP_GEO`) where contains(`RangeBox`(`test`.`sg`.`LOCATION`,0.01904761977493763),`test`.`zip`.`LOCATION`) |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

mysql> SELECT DISTINCT(ZIP) from ZIP_GEO zip force index(IDX_ZIP_GEO)     JOIN SHOPS_GEO sg  force index(IDX_SHOP_GEO)     ON (MBRContains(RangeBox(sg.location, 0.01904761977493763), zip.location));
..
...

## Taking lot of time and still running
## query running from long after setting join_buffer_size to 8*1024*1024..

mysql> show processlist;
+----+------+-----------+------+---------+------+-------+-----------------------------------------------------------------------------------+
| Id | User | Host      | db   | Command | Time | State | Info                                                                              |
+----+------+-----------+------+---------+------+-------+-----------------------------------------------------------------------------------+
|  5 | root | localhost | test | Query   | 1632 | NULL  | SET @v2 := Point NAME_CONST('rng',0.0190476)-rng NAME_CONST('rng',0.0190476)+rng) |
|  6 | root | localhost | test | Query   |    0 | NULL  | show processlist                                                                  |
+----+------+-----------+------+---------+------+-------+-----------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
[10 Apr 2013 10:59] MySQL Verification Team
mysql> show global variables like 'query_cache%';
+------------------------------+---------+
| Variable_name                | Value   |
+------------------------------+---------+
| query_cache_limit            | 1048576 |
| query_cache_min_res_unit     | 4096    |
| query_cache_size             | 1048576 |
| query_cache_type             | OFF     |
| query_cache_wlock_invalidate | OFF     |
+------------------------------+---------+

## Default join buffer size

mysql> show global variables like 'join_buffer_size';
+------------------+--------+
| Variable_name    | Value  |
+------------------+--------+
| join_buffer_size | 262144 |
+------------------+--------+
1 row in set (0.00 sec)

mysql> 

mysql> select version();
+--------------+
| version()    |
+--------------+
| 5.6.10-debug |
+--------------+
1 row in set (0.00 sec)

mysql> SELECT DISTINCT(ZIP)     from ZIP_GEO zip force index(IDX_ZIP_GEO)     JOIN SHOPS_GEO sg  force index(IDX_SHOP_GEO)     ON (MBRContains(RangeBox(sg.location, 0.01904761977493763), zip.location));
+-------+
| ZIP   |
+-------+
| 20089 |
..
..
+-------+
1877 rows in set (20.23 sec)

mysql> explain SELECT DISTINCT(ZIP)     from ZIP_GEO zip force index(IDX_ZIP_GEO)     JOIN SHOPS_GEO sg  force index(IDX_SHOP_GEO)     ON (MBRContains(RangeBox(sg.location, 0.01904761977493763), zip.location));
+----+-------------+-------+------+---------------+------+---------+------+-------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra                                          |
+----+-------------+-------+------+---------------+------+---------+------+-------+------------------------------------------------+
|  1 | SIMPLE      | sg    | ALL  | NULL          | NULL | NULL    | NULL | 28879 | Using temporary                                |
|  1 | SIMPLE      | zip   | ALL  | IDX_ZIP_GEO   | NULL | NULL    | NULL |  4508 | Range checked for each record (index map: 0x2) |
+----+-------------+-------+------+---------------+------+---------+------+-------+------------------------------------------------+
2 rows in set (0.00 sec)

## Set join_buffer_size to 8*1024*1024

mysql> set global join_buffer_size=8*1024*1024;
Query OK, 0 rows affected (0.00 sec)

mysql> show global variables like 'join_buffer_size';
+------------------+---------+
| Variable_name    | Value   |
+------------------+---------+
| join_buffer_size | 8388608 |
+------------------+---------+
1 row in set (0.00 sec)

## 

mysql> explain SELECT DISTINCT(ZIP)     from ZIP_GEO zip force index(IDX_ZIP_GEO)     JOIN SHOPS_GEO sg  force index(IDX_SHOP_GEO)     ON (MBRContains(RangeBox(sg.location, 0.01904761977493763), zip.location));
+----+-------------+-------+------+---------------+------+---------+------+-------+--------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra                                                        |
+----+-------------+-------+------+---------------+------+---------+------+-------+--------------------------------------------------------------+
|  1 | SIMPLE      | zip   | ALL  | IDX_ZIP_GEO   | NULL | NULL    | NULL |  4508 | Using temporary                                              |
|  1 | SIMPLE      | sg    | ALL  | NULL          | NULL | NULL    | NULL | 28879 | Using where; Distinct; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+------+-------+--------------------------------------------------------------+

mysql> explain extended SELECT DISTINCT(ZIP)     from ZIP_GEO zip force index(IDX_ZIP_GEO)     JOIN SHOPS_GEO sg  force index(IDX_SHOP_GEO)     ON (MBRContains(RangeBox(sg.location, 0.01904761977493763), zip.location));
+----+-------------+-------+------+---------------+------+---------+------+-------+----------+--------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra                                                        |
+----+-------------+-------+------+---------------+------+---------+------+-------+----------+--------------------------------------------------------------+
|  1 | SIMPLE      | zip   | ALL  | IDX_ZIP_GEO   | NULL | NULL    | NULL |  4508 |   100.00 | Using temporary                                              |
|  1 | SIMPLE      | sg    | ALL  | NULL          | NULL | NULL    | NULL | 28879 |   100.00 | Using where; Distinct; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+------+-------+----------+--------------------------------------------------------------+
2 rows in set, 1 warning (0.01 sec)

mysql> show warnings;
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                                                                                        |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select distinct `test`.`zip`.`ZIP` AS `ZIP` from `test`.`ZIP_GEO` `zip` FORCE INDEX (`IDX_ZIP_GEO`) join `test`.`SHOPS_GEO` `sg` FORCE INDEX (`IDX_SHOP_GEO`) where mbrcontains(`RangeBox`(`test`.`sg`.`LOCATION`,0.01904761977493763),`test`.`zip`.`LOCATION`) |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

### 

mysql> SELECT DISTINCT(ZIP) from ZIP_GEO zip force index(IDX_ZIP_GEO)     JOIN SHOPS_GEO sg  force index(IDX_SHOP_GEO)     ON (MBRContains(RangeBox(sg.location, 0.01904761977493763), zip.location));
..
..

## query running from long after setting join_buffer_size to 8*1024*1024

mysql> show processlist;
+----+------+-----------------+------+---------+------+----------------+------------------------------------------------------------------------------------------------------+
| Id | User | Host            | db   | Command | Time | State          | Info                                                                                                 |
+----+------+-----------------+------+---------+------+----------------+------------------------------------------------------------------------------------------------------+
|  8 | root | localhost:43704 | test | Query   | 5350 | Opening tables | SET @ls := LineString(GeomFromText(AsText(@v1)),GeomFromText(AsText(@v2)),GeomFromText(AsText(@v3)), |
| 10 | root | localhost:38626 | test | Query   |    0 | init           | show processlist                                                                                     |
+----+------+-----------------+------+---------+------+----------------+------------------------------------------------------------------------------------------------------+
[12 Apr 2013 11:48] Jørgen Løland
Hi Dario,

The "Range hecked for each record..." access method is called dynamic range. It creates a new range predicate for each row constructed by tables earlier in the join sequence by taking one or more column values from these tables as input. 

For 'range access' to be applicable for a column 'x', we need a predicate of the form "'x' OP <something>". For your query, the predicate qualifies for range access for column zip.location (rewritten to something like "zip.location is_within function(sg.location)"), but not for column sg.location because it is used as input to a function instead of compared to something. 

Thus, dynamic range can be used if table 'sg' is read before 'zip', but not if 'zip' is read before 'sg'. If you take another look at the EXPLAIN output, you'll see that the join order is reversed in the join_buffer_size=8M case. The reason is that the cost of joining changes as the jbs changes. In other words, you've just reached a point where 'zip','sg' is considered a cheaper join order than 'sg','zip'. 

Unfortunately, the speed-up of applying dynamic range is not taken into account in the MySQL join order optimizer. Changing this is tricky, but we're aware of the problem. For now, I suggest you use STRAIGHT_JOIN to get the best plan for your query.

Regards,
Jørgen