Bug #94655 Some GIS function do not use spatial index anymore
Submitted: 14 Mar 11:04 Modified: 6 May 8:33
Reporter: Cedric Tabin Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S4 (Feature request)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any

[14 Mar 11:04] Cedric Tabin
Description:
With MySQL 8.0.15, when using the spatial index, MySQL takes 6-7x more time (~18s) to return the result set (17 rows) than a full range scan (~3s over 1,400,000 rows). In the version 5.7, the rows were returned less than one second by using the spatial index.

Note that both installations are standard, with the default variables (no tweak).

It might be related to https://bugs.mysql.com/bug.php?id=89551, but the performance drop doesn't seem to be related to the number of rows in the result set.

How to repeat:
Here the table I created:

CREATE TABLE `events` (
  `norecopk` char(64) NOT NULL,
  `eventstart` datetime DEFAULT NULL,
  `eventend` datetime DEFAULT NULL,
  `spatialdata` linestring NOT NULL /*!80003 SRID 0 */,
  PRIMARY KEY (`norecopk`),
  SPATIAL KEY `events_spatialdata` (`spatialdata`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

The spatial data column is filled based on the eventstart/eventend content with the following query: LINESTRING(POINT(UNIX_TIMESTAMP(eventstart)*1000,1),POINT(UNIX_TIMESTAMP(eventend)*1000,-1));

After filling the data, here are the queries I tested (directly in the mysql client):

/* WITH the spatial index: ~18s */
SELECT p.norecopk FROM events p  WHERE MBRINTERSECTS(p.spatialdata, ST_GeomFromText('LINESTRING(1552435200000 1, 1552435200000 -1)'));

/* WITHOUT the spatial index, full range scan: ~3s */
SELECT p.norecopk FROM events p  WHERE MBRINTERSECTS(p.spatialdata, ST_GeomFromText('LINESTRING(1552435200000 1, 1552435200000 -1)')) = 1;
[14 Mar 12:17] Umesh Shastry
Hello Cedric Tabin,

Thank you for the report.
This looks like similar to Bug #89551. In order to reproduce issue at our end, may I request you to please provide subset of table data or full logical dump of the table to reproduce the issue at our end? If the data you need to attach is more than 3MB, you should create a compressed archive of the data and a README file that describes the data with a filename that includes the bug number (recommended filename: mysql-bug-data-89551.zip) and upload one to sftp.oracle.com. Thank you!

regards,
Umesh
[14 Mar 12:54] Cedric Tabin
Hello Umesh Shastry,

I've uploaded the file mysql-bug-data-94655.zip on sftp.oracle.com.
It contains one single table 'events' with 1,400,000 rows as describe in this bug report.
You can simply import it in a MySQL 8.0 server and run the queries above to reproduce the huge performance drop.

Hope it helps !

Best regards,
Cédric
[14 Mar 17:43] Sinisa Milivojevic
Hi,

First the EXPLAINs ...

For the first query, which uses index:

+----+-------------+-------+------------+-------+--------------------+--------------------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys      | key                | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+--------------------+--------------------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | p     | NULL       | range | events_spatialdata | events_spatialdata | 34      | NULL |    1 |   100.00 | Using where |
+----+-------------+-------+------------+-------+--------------------+--------------------+---------+------+------+----------+-------------+
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                    |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select `test`.`p`.`norecopk` AS `norecopk` from `test`.`events` `p` where mbrintersects(`test`.`p`.`spatialdata`,<cache>(st_geomfromtext('LINESTRING(1552435200000 1, 1552435200000 -1)'))) |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

then, for the second query :

+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
|  1 | SIMPLE      | p     | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 1381644 |   100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                          |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select `test`.`p`.`norecopk` AS `norecopk` from `test`.`events` `p` where (mbrintersects(`test`.`p`.`spatialdata`,<cache>(st_geomfromtext('LINESTRING(1552435200000 1, 1552435200000 -1)'))) = 1) |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Now, the times. For the first query, that uses index:

real	3m9.460s
user	0m0.012s
sys	0m0.029s

and for the second that scans all rows:

real	2m30.892s
user	0m0.013s
sys	0m0.022s

A difference is small. It depends on the size of the buffer pool and many other settings. Of course, I have preloaded the indices. Hope that you did not forget that !!! If I could have increased buffer pool, difference in speed would be smaller.

Also, the complexity of your WHERE clause should be taken into the account, which is why I am verifying this as a feature request.

Verified as a feature request.
[14 Mar 18:14] Cedric Tabin
Hi Sinisa,

Thanks for your analysis. I also did the EXPLAINs to ensure the queries were using the spatial index or not.

To match your execution test, I re-ran both of the queries on my MySQL 8.0 and here are the timings I get:

With the spatial index (ran 3 times in a row, the timings are almost always the same):
real    0m14.002s
user    0m0.004s
sys     0m0.008s

Without the spatial index (same here, ran 3 times in a row, the timings are almost always the same):
real    0m2.641s
user    0m0.008s
sys     0m0.004s

You can see the difference is quite big. Also, given your results, it seems really strange that the query with the spatial index is almost 40s slower than a full range scan. I would expect that the former would be much quicker (as in MySQL 5.7). At least shouldn't it be quicker than a full table scan ?

About preloading the indices, I don't know if you're speaking of the LOAD INDEX INTO CACHE statement... which I tried on the table and I get: The storage engine for the table doesn't support preload_keys. Could it be related to this ? I didn't see anything related to that in the migration steps I followed (see here: https://mysqlserverteam.com/upgrading-spatial-indexes-to-mysql-8-0/).

I also checked the innodb_buffer_pool_size variable in my settings and it is set to 134217728. It seems to be fairly high IMHO.

Considering the complexity of the query, this is the simplest one I could get as a testcase. As said, the query with the spatial index runs in milliseconds with MySQL 5.7 (on the same system) whereas it takes between 14s and 20s in MySQL 8.0 (I ensure the SRID 0 is forced on the columns and recreated the index).

That prevents us to upgrade to MySQL 8.0 because of this performance drop...

Thanks for your help !

Best regards,
Cédric
[15 Mar 10:24] Cedric Tabin
Hello again,

After a deeper search, it might be a problem with some GIS-related functions: some of them seem to use the spatial index, and other don't (even if the EXPLAIN told that it will).

Here is what I tested on the transmitted dataset (all of the queries use the spatial index, as told by the EXPLAIN):
- SELECT p.norecopk FROM events p  WHERE MBRCONTAINS(p.spatialdata, ST_GeomFromText('LINESTRING(1552435200000 1, 1552435200000 -1)'));  => 0m0.016s (ok)
- SELECT p.norecopk FROM events p  WHERE MBRWITHIN(ST_GeomFromText('LINESTRING(1552435200000 1, 1552435200000 -1)'), p.spatialdata);  => 0m0.019s (ok)
- SELECT p.norecopk FROM events p  WHERE MBROVERLAPS(p.spatialdata, ST_GeomFromText('LINESTRING(1552435200000 1, 1552435200000 -1)')); => 0m14.961s (KO)
- SELECT p.norecopk FROM events p  WHERE MBRINTERSECTS(p.spatialdata, ST_GeomFromText('LINESTRING(1552435200000 1, 1552435200000 -1)')) => 0m14.738s (KO)
- SELECT p.norecopk FROM events p  WHERE MBRTOUCHES(p.spatialdata, ST_GeomFromText('LINESTRING(1552435200000 1, 1552435200000 -1)')) => 0m14.869s (KO)

The other functions I tested (MBRCovers, MBRDisjoint, ...) explicitly indicate in the EXPLAIN that the spatial index won't be used (and so it fallback to a full range scan that takes ~2-3s).

FYI I changed the title of the bug and the severity.

Thanks in advance for any help.

Best regards,
Cédric
[15 Mar 10:57] Cedric Tabin
Optimizer trace for MBRIntersects (usage of spatial index = ok)

Attachment: sql-intersects.txt (text/plain), 6.77 KiB.

[15 Mar 10:58] Cedric Tabin
Optimizer trace for MBRWithin (usage of spatial index = ok)

Attachment: sql-within.txt (text/plain), 6.75 KiB.

[15 Mar 11:01] Cedric Tabin
Hello again,

I added the optimizer trace of both queries (MBRIntersects is KO actually, the label is wrong).
A diff shows very few differences, the costs are almost the same.
Hope it helps.

Best regards,
Cédric
[15 Mar 13:40] Sinisa Milivojevic
Hi,

Thank you for your contribution.

Changing title and severity in this bug will not change anything, since the bug report that will be worked upon is in the internal database, where changes in the public database are not reflected, unless I change them. I have decided to change the title, to reflect the new one, as it makes sense.

Your InnoDB buffer pool is too small for GIS queries on the tables of this size. If the index was some INT, it would be OK. I used 8 Gb for InnoDB buffer. I have also highly optimised all of my MySQL instances, just for this reason that poor or sub-optimal configurations do not influence bug verification.

How to optimise MySQL server, InnoDB storage engine and how to pre-load indices is fully described in our Reference Manual.
[15 Mar 14:25] Cedric Tabin
Hi Sinisa,

Thanks for your response.

Following your indications about the innodb_buffer_pool_size, I ran the MBRIntersects query with the following results:
- with 1G (1073741824) => 5.16s
- with 8G (8589934592) => 4.72s
- with 32G (34359738368) => 4.69s
- with 64G (68719476736) => 4.82s

This is still slower than a full range scan (less than 2s with a innodb buffer of 8G).
By increasing the buffer size, the query with a MBRContains is even faster (0.01s).

According to the documentation, I tried to change a bunch of different values (such as the key_buffer_size) with no significant improvement in the performances.  Since I use fresh installations (5.7 and 8.0), I wasn't expecting to experience a such significant performance drop with the default parameters.
Moreover, by reading https://dev.mysql.com/doc/refman/8.0/en/load-index.html, the LOAD INDEX command is only for MyISAM tables. Since I forced MySQL to read the records, the timings drops from 17s to 14s.

Since the MBRContains and MBRWithin functions are working fine on the same dataset, it looks really like a bug to me, especially as the MBRIntersects is slower with the index than a full range scan (which is a nonsense IMHO, the optimizer should avoid using it).

It seems that this performance drop has already been seen by other users:
- https://stackoverflow.com/questions/53819582/mysql-8-0-11-spatial-queries-slow-by-a-factor...
- https://dba.stackexchange.com/questions/214268/mysql-geo-spatial-query-is-very-slow-althou...

Thanks for your time and help.

Best regards,
Cédric
[15 Mar 14:28] Sinisa Milivojevic
Hi Cedric,

Thank you for your new input.

Those will be added to our internal database.

There are several issues that can be improved with it comes to optimising  GIS queries.

This is a reason why this bug is fully verified.
[15 Mar 14:41] Cedric Tabin
Hi Sinisa,

Thanks for your feedback.
Let me know if you need any other input.

Best regards,
Cédric
[6 May 8:33] Cedric Tabin
Hi,

For information, the same behavior is still present in the version 8.0.16 of MySQL.

Since the same queries were working fine on the same dataset in MySQL 5.7, shouldn't this bug be considered as a regression and have a higher priority than "Feature request" ? In our case, this simply prevents us to migrate to the latest version, even if we do not use heavily spatial indexes, because it causes huge performance problems.

Best regards,
Cédric
[6 May 13:13] Sinisa Milivojevic
Hi,

This is a regression in performance only. Regressions bugs are high severity bugs that occur due to the fix of some third bug. 

I can change  this into "Performance" bug, but I have intentionally labeled it as a feature request, since it has a higher severity.