Bug #94655 Some GIS function do not use spatial index anymore
Submitted: 14 Mar 2019 11:04 Modified: 7 Mar 2020 14:35
Reporter: Cedric Tabin Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any

[14 Mar 2019 11:04] Cedric Tabin
Description:
With MySQL 8.0.15, when using the spatial index, MySQL takes 6-7x more time (~18s) to return the result set (17 rows) than a full range scan (~3s over 1,400,000 rows). In the version 5.7, the rows were returned less than one second by using the spatial index.

Note that both installations are standard, with the default variables (no tweak).

It might be related to https://bugs.mysql.com/bug.php?id=89551, but the performance drop doesn't seem to be related to the number of rows in the result set.

How to repeat:
Here the table I created:

CREATE TABLE `events` (
  `norecopk` char(64) NOT NULL,
  `eventstart` datetime DEFAULT NULL,
  `eventend` datetime DEFAULT NULL,
  `spatialdata` linestring NOT NULL /*!80003 SRID 0 */,
  PRIMARY KEY (`norecopk`),
  SPATIAL KEY `events_spatialdata` (`spatialdata`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

The spatial data column is filled based on the eventstart/eventend content with the following query: LINESTRING(POINT(UNIX_TIMESTAMP(eventstart)*1000,1),POINT(UNIX_TIMESTAMP(eventend)*1000,-1));

After filling the data, here are the queries I tested (directly in the mysql client):

/* WITH the spatial index: ~18s */
SELECT p.norecopk FROM events p  WHERE MBRINTERSECTS(p.spatialdata, ST_GeomFromText('LINESTRING(1552435200000 1, 1552435200000 -1)'));

/* WITHOUT the spatial index, full range scan: ~3s */
SELECT p.norecopk FROM events p  WHERE MBRINTERSECTS(p.spatialdata, ST_GeomFromText('LINESTRING(1552435200000 1, 1552435200000 -1)')) = 1;
[14 Mar 2019 12:17] MySQL Verification Team
Hello Cedric Tabin,

Thank you for the report.
This looks like similar to Bug #89551. In order to reproduce issue at our end, may I request you to please provide subset of table data or full logical dump of the table to reproduce the issue at our end? If the data you need to attach is more than 3MB, you should create a compressed archive of the data and a README file that describes the data with a filename that includes the bug number (recommended filename: mysql-bug-data-89551.zip) and upload one to sftp.oracle.com. Thank you!

regards,
Umesh
[14 Mar 2019 12:54] Cedric Tabin
Hello Umesh Shastry,

I've uploaded the file mysql-bug-data-94655.zip on sftp.oracle.com.
It contains one single table 'events' with 1,400,000 rows as describe in this bug report.
You can simply import it in a MySQL 8.0 server and run the queries above to reproduce the huge performance drop.

Hope it helps !

Best regards,
Cédric
[14 Mar 2019 17:43] MySQL Verification Team
Hi,

First the EXPLAINs ...

For the first query, which uses index:

+----+-------------+-------+------------+-------+--------------------+--------------------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys      | key                | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+--------------------+--------------------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | p     | NULL       | range | events_spatialdata | events_spatialdata | 34      | NULL |    1 |   100.00 | Using where |
+----+-------------+-------+------------+-------+--------------------+--------------------+---------+------+------+----------+-------------+
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                    |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select `test`.`p`.`norecopk` AS `norecopk` from `test`.`events` `p` where mbrintersects(`test`.`p`.`spatialdata`,<cache>(st_geomfromtext('LINESTRING(1552435200000 1, 1552435200000 -1)'))) |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

then, for the second query :

+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
|  1 | SIMPLE      | p     | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 1381644 |   100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                          |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select `test`.`p`.`norecopk` AS `norecopk` from `test`.`events` `p` where (mbrintersects(`test`.`p`.`spatialdata`,<cache>(st_geomfromtext('LINESTRING(1552435200000 1, 1552435200000 -1)'))) = 1) |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Now, the times. For the first query, that uses index:

real	3m9.460s
user	0m0.012s
sys	0m0.029s

and for the second that scans all rows:

real	2m30.892s
user	0m0.013s
sys	0m0.022s

A difference is small. It depends on the size of the buffer pool and many other settings. Of course, I have preloaded the indices. Hope that you did not forget that !!! If I could have increased buffer pool, difference in speed would be smaller.

Also, the complexity of your WHERE clause should be taken into the account, which is why I am verifying this as a feature request.

Verified as a feature request.
[14 Mar 2019 18:14] Cedric Tabin
Hi Sinisa,

Thanks for your analysis. I also did the EXPLAINs to ensure the queries were using the spatial index or not.

To match your execution test, I re-ran both of the queries on my MySQL 8.0 and here are the timings I get:

With the spatial index (ran 3 times in a row, the timings are almost always the same):
real    0m14.002s
user    0m0.004s
sys     0m0.008s

Without the spatial index (same here, ran 3 times in a row, the timings are almost always the same):
real    0m2.641s
user    0m0.008s
sys     0m0.004s

You can see the difference is quite big. Also, given your results, it seems really strange that the query with the spatial index is almost 40s slower than a full range scan. I would expect that the former would be much quicker (as in MySQL 5.7). At least shouldn't it be quicker than a full table scan ?

About preloading the indices, I don't know if you're speaking of the LOAD INDEX INTO CACHE statement... which I tried on the table and I get: The storage engine for the table doesn't support preload_keys. Could it be related to this ? I didn't see anything related to that in the migration steps I followed (see here: https://mysqlserverteam.com/upgrading-spatial-indexes-to-mysql-8-0/).

I also checked the innodb_buffer_pool_size variable in my settings and it is set to 134217728. It seems to be fairly high IMHO.

Considering the complexity of the query, this is the simplest one I could get as a testcase. As said, the query with the spatial index runs in milliseconds with MySQL 5.7 (on the same system) whereas it takes between 14s and 20s in MySQL 8.0 (I ensure the SRID 0 is forced on the columns and recreated the index).

That prevents us to upgrade to MySQL 8.0 because of this performance drop...

Thanks for your help !

Best regards,
Cédric
[15 Mar 2019 10:24] Cedric Tabin
Hello again,

After a deeper search, it might be a problem with some GIS-related functions: some of them seem to use the spatial index, and other don't (even if the EXPLAIN told that it will).

Here is what I tested on the transmitted dataset (all of the queries use the spatial index, as told by the EXPLAIN):
- SELECT p.norecopk FROM events p  WHERE MBRCONTAINS(p.spatialdata, ST_GeomFromText('LINESTRING(1552435200000 1, 1552435200000 -1)'));  => 0m0.016s (ok)
- SELECT p.norecopk FROM events p  WHERE MBRWITHIN(ST_GeomFromText('LINESTRING(1552435200000 1, 1552435200000 -1)'), p.spatialdata);  => 0m0.019s (ok)
- SELECT p.norecopk FROM events p  WHERE MBROVERLAPS(p.spatialdata, ST_GeomFromText('LINESTRING(1552435200000 1, 1552435200000 -1)')); => 0m14.961s (KO)
- SELECT p.norecopk FROM events p  WHERE MBRINTERSECTS(p.spatialdata, ST_GeomFromText('LINESTRING(1552435200000 1, 1552435200000 -1)')) => 0m14.738s (KO)
- SELECT p.norecopk FROM events p  WHERE MBRTOUCHES(p.spatialdata, ST_GeomFromText('LINESTRING(1552435200000 1, 1552435200000 -1)')) => 0m14.869s (KO)

The other functions I tested (MBRCovers, MBRDisjoint, ...) explicitly indicate in the EXPLAIN that the spatial index won't be used (and so it fallback to a full range scan that takes ~2-3s).

FYI I changed the title of the bug and the severity.

Thanks in advance for any help.

Best regards,
Cédric
[15 Mar 2019 10:57] Cedric Tabin
Optimizer trace for MBRIntersects (usage of spatial index = ok)

Attachment: sql-intersects.txt (text/plain), 6.77 KiB.

[15 Mar 2019 10:58] Cedric Tabin
Optimizer trace for MBRWithin (usage of spatial index = ok)

Attachment: sql-within.txt (text/plain), 6.75 KiB.

[15 Mar 2019 11:01] Cedric Tabin
Hello again,

I added the optimizer trace of both queries (MBRIntersects is KO actually, the label is wrong).
A diff shows very few differences, the costs are almost the same.
Hope it helps.

Best regards,
Cédric
[15 Mar 2019 13:40] MySQL Verification Team
Hi,

Thank you for your contribution.

Changing title and severity in this bug will not change anything, since the bug report that will be worked upon is in the internal database, where changes in the public database are not reflected, unless I change them. I have decided to change the title, to reflect the new one, as it makes sense.

Your InnoDB buffer pool is too small for GIS queries on the tables of this size. If the index was some INT, it would be OK. I used 8 Gb for InnoDB buffer. I have also highly optimised all of my MySQL instances, just for this reason that poor or sub-optimal configurations do not influence bug verification.

How to optimise MySQL server, InnoDB storage engine and how to pre-load indices is fully described in our Reference Manual.
[15 Mar 2019 14:25] Cedric Tabin
Hi Sinisa,

Thanks for your response.

Following your indications about the innodb_buffer_pool_size, I ran the MBRIntersects query with the following results:
- with 1G (1073741824) => 5.16s
- with 8G (8589934592) => 4.72s
- with 32G (34359738368) => 4.69s
- with 64G (68719476736) => 4.82s

This is still slower than a full range scan (less than 2s with a innodb buffer of 8G).
By increasing the buffer size, the query with a MBRContains is even faster (0.01s).

According to the documentation, I tried to change a bunch of different values (such as the key_buffer_size) with no significant improvement in the performances.  Since I use fresh installations (5.7 and 8.0), I wasn't expecting to experience a such significant performance drop with the default parameters.
Moreover, by reading https://dev.mysql.com/doc/refman/8.0/en/load-index.html, the LOAD INDEX command is only for MyISAM tables. Since I forced MySQL to read the records, the timings drops from 17s to 14s.

Since the MBRContains and MBRWithin functions are working fine on the same dataset, it looks really like a bug to me, especially as the MBRIntersects is slower with the index than a full range scan (which is a nonsense IMHO, the optimizer should avoid using it).

It seems that this performance drop has already been seen by other users:
- https://stackoverflow.com/questions/53819582/mysql-8-0-11-spatial-queries-slow-by-a-factor...
- https://dba.stackexchange.com/questions/214268/mysql-geo-spatial-query-is-very-slow-althou...

Thanks for your time and help.

Best regards,
Cédric
[15 Mar 2019 14:28] MySQL Verification Team
Hi Cedric,

Thank you for your new input.

Those will be added to our internal database.

There are several issues that can be improved with it comes to optimising  GIS queries.

This is a reason why this bug is fully verified.
[15 Mar 2019 14:41] Cedric Tabin
Hi Sinisa,

Thanks for your feedback.
Let me know if you need any other input.

Best regards,
Cédric
[6 May 2019 8:33] Cedric Tabin
Hi,

For information, the same behavior is still present in the version 8.0.16 of MySQL.

Since the same queries were working fine on the same dataset in MySQL 5.7, shouldn't this bug be considered as a regression and have a higher priority than "Feature request" ? In our case, this simply prevents us to migrate to the latest version, even if we do not use heavily spatial indexes, because it causes huge performance problems.

Best regards,
Cédric
[6 May 2019 13:13] MySQL Verification Team
Hi,

This is a regression in performance only. Regressions bugs are high severity bugs that occur due to the fix of some third bug. 

I can change  this into "Performance" bug, but I have intentionally labeled it as a feature request, since it has a higher severity.
[23 Jul 2019 11:55] Hajime Miyauchi
the same behavior is still present in the version 8.0.17 of MySQL.

ST_WithIn is fast.but ST_Intersects is slow...

mysql> SELECT
    ->   ST_AsGeoJson((SHAPE))
    -> FROM
    ->   h27ka23_utf8
    -> WHERE
    ->   ST_WithIn(
    ->     SHAPE,
    ->     ST_GeomFromText(
    ->       'POLYGON((43.08256990265745 141.3162472988521,
    '>                 43.056831004971265 141.3162472988521,
    '>                 43.056831004971265 141.35259659542191,
    '>                 43.08256990265745 141.35259659542191,
    '>                 43.08256990265745 141.3162472988521))',
    ->       4612
    ->     )
    ->  );
Empty set (0.01 sec)

mysql> SELECT
    ->   ST_AsGeoJson((SHAPE))
    -> FROM
    ->   h27ka23_utf8
    -> WHERE
    ->   ST_Intersects(
    ->     SHAPE,
    ->     ST_GeomFromText(
    ->       'POLYGON((43.08256990265745 141.3162472988521,
    '>                 43.056831004971265 141.3162472988521,
    '>                 43.056831004971265 141.35259659542191,
    '>                 43.08256990265745 141.35259659542191,
    '>                 43.08256990265745 141.3162472988521))',
    ->       4612
    ->     )
    ->  );
Empty set (2.64 sec)
[23 Jul 2019 14:02] MySQL Verification Team
Hi,

The behavior will remain the same until this feature request is implemented.

When it is implemented, then this page will be updated.

Scheduling of the feature request is a complex process to which we do not have access.
[10 Sep 2019 16:40] Phil Jenson
Simply ST_Intersects queries on a large data set are 100,000 times slower in v8 compared to v5.7. I am puzzled as to why this has not received greater attention.

https://stackoverflow.com/questions/53819582/mysql-8-0-11-spatial-queries-slow-by-a-factor...

Rating this as a feature request because this is a higher severity seems strange. It is a performance bug and should be rated accordingly.
[10 Sep 2019 16:53] Phil Jenson
Looking at the severity levels S4 (Feature Request) is, as you would expect, lower than performance bug. However, surly this warrants an S2 (Serious) priority?
[24 Sep 2019 12:26] MySQL Verification Team
Actually, feature request has higher priority then performance. 

Still, I am increasing the priority further ......
[12 Feb 2020 14:10] Patrick van Dijk
How can this have severity S3 (non-critical).
Any app trying to render and use MBRIntersects is useless.
This should be treated S1
[12 Feb 2020 14:19] MySQL Verification Team
I do not think that we need to enter into discussions about the definition of severity levels. Anyway, bugs with severity S1 and S2 are those concerning security issues, crashes or impossibility of performing certain actions. This bug is about performance, which is below definitions of S1 and S2.

Anyway, Severity level is not the only criteria for bug scheduling. Internally, without our access, bugs are triaged by a separate team, which is one of the criteria in determining the scheduling.
[12 Feb 2020 14:46] Hajime Miyauchi
Even if "Between operator does not use index", is the priority S3?
For the geospatial type it is just as bad.
I don't think this is a performance issue.
[12 Feb 2020 14:46] Phil Jenson
This bug was first reported nearly a year ago. For us it is critical and preventing us from migrating to v8. 

With little sign that this is even accepted as a serious bug we are facing the horrible decision of changing to a different database or waiting an unknown time for it to be fixed. 

It feels like those who can fix this are unaware it exists and the only way to highlight this is through the severity level being increased to 'do it next'
[12 Feb 2020 15:09] MySQL Verification Team
Ok.

You convinced me. It is "S2" now.

Also, everybody affected should click on "Affects Me" button. It is in the right column of the header, third row from above.
[12 Feb 2020 15:37] Patrick van Dijk
Have been testing a bit more on v8.0.18.
Table has 369504 rows.

16:29:09	select naam, geom from citygis.gewoon  WHERE MBRContains(ST_SRID(ST_MakeEnvelope(Point(170000,428000), Point(180000,430000)), 28992), geom) LIMIT 0, 10000	71 row(s) returned	0.141 sec / 0.000 sec

16:29:12	select gid, naam, geom from citygis.gewoon WHERE MBRIntersects(ST_SRID(ST_MakeEnvelope(Point(170000,420000), Point(180000,430000)), 28992), geom) LIMIT 0, 10000	1461 row(s) returned	22.187 sec / 13.469 sec

MBRContains and MBRIntersects should consider the same amount of records from the spatial inde and I doubt there is any difference in calculating if MBR intersects or is fully inside.

We have queries running on PostGIS where query time is in seconds, 
which take hours on MySQL, this is really bad.
E.g. count all roads inside municipalities.
[12 Feb 2020 16:05] Patrick van Dijk
More tests on table with 4941 rows

Without index

16:55:31	select gid, naam, geom from citygis.steden
 WHERE MBRIntersects(ST_SRID(ST_MakeEnvelope(Point(170000,428000), Point(180000,430000)), 28992), geom)
 LIMIT 0, 1000	2 row(s) returned	0.406 sec / 0.000 sec

16:55:33	select gid, naam, geom from citygis.steden
 WHERE MBRIntersects(geom, ST_SRID(ST_MakeEnvelope(Point(170000,428000), Point(180000,430000)), 28992))
 LIMIT 0, 1000	2 row(s) returned	0.407 sec / 0.000 sec

16:55:40	create spatial index sdx_steden_geom on citygis.steden(geom)	0 row(s) affected
 Records: 0  Duplicates: 0  Warnings: 0	0.844 sec

With index 
16:55:43	select gid, naam, geom from citygis.steden
 WHERE MBRContains(ST_SRID(ST_MakeEnvelope(Point(170000,428000), Point(180000,430000)), 28992), geom)
 LIMIT 0, 1000	2 row(s) returned	0.000 sec / 0.000 sec

16:55:46	select gid, naam, geom from citygis.steden
 WHERE MBRIntersects(ST_SRID(ST_MakeEnvelope(Point(170000,428000), Point(180000,430000)), 28992), geom)
 LIMIT 0, 1000	2 row(s) returned	0.515 sec / 0.000 sec

16:55:51	select gid, naam, geom from citygis.steden
 WHERE MBRIntersects(geom, ST_SRID(ST_MakeEnvelope(Point(170000,428000), Point(180000,430000)), 28992))
 LIMIT 0, 1000	2 row(s) returned	0.562 sec / 0.016 sec

More curious...making a larger MBR, returning 1563 rows is faster than returning just 2 rows.

17:02:33	select gid, naam, geom from citygis.steden
 WHERE MBRIntersects(ST_SRID(ST_MakeEnvelope(Point(0,0), Point(180000,430000)), 28992), geom)
 LIMIT 0, 5000	1563 row(s) returned	0.313 sec / 0.109 sec
[12 Feb 2020 16:21] Patrick van Dijk
Ok, last bit of info from me 

Using Explain Analyze

MBRContains													  
-> Filter: mbrcontains(<cache>(st_srid(st_makeenvelope(point(170000,428000),point(180000,430000)),28992)),steden.geom)  (cost=0.71 rows=1) (actual time=1.296..2.180 rows=2 loops=1)
    -> Index range scan on steden using sdx_steden_geom  (cost=0.71 rows=1) (actual time=1.050..1.844 rows=2 loops=1)

MBRIntersects

-> Filter: mbrintersects(<cache>(st_srid(st_makeenvelope(point(170000,428000),point(180000,430000)),28992)),steden.geom)  (cost=0.71 rows=1) (actual time=237.442..454.287 rows=2 loops=1)
    -> Index range scan on steden using sdx_steden_geom  (cost=0.71 rows=1) (actual time=0.207..37.832 rows=4941 loops=1)

Why does MBRIntersects see 4941 rows on the index range scan? 
while MBRContains sees only 2 rows
[7 Mar 2020 14:35] Jon Stephens
Documented fix as follows in the MySQL 8.0.20 changelog:

    Some queries against tables with spatial indexes were not performed
    as efficiently following an upgrade from MySQL 5.7 to MySQL 8.0.
      
Closed.
[9 Mar 2020 13:05] MySQL Verification Team
Thank you, Jon.
[25 Sep 2020 9:57] Luuk van den Berg
I'm still running into a related problem on version 8.0.21. When i run the st_within on a large data-set 15 million+ spatial indexes. 

On MySQL8 it takes 0.8s
On MySQL5.7 it takes less then 0.1s

Both these results use the same query and the same database, the only difference is the MySQL version. This problem only seems to occur on big datasets, I can supply such a data-set on request for testing purposes.
[17 May 2021 8:27] Wil Williams
I have just upgraded to MySQL 8 (Ubuntu upgrade using 20.04) and experienced this issue. In short queries go from a fraction of a second to 5-10 seconds or more. I still have an old MacBook Air (10 years old) with he same data on it. The machine upgraded to MySQL 8 is a 2 year old Metabox.

In short, old machine takes "0.00" seconds, newer machine with upgrade 6-10 seconds. Table has 1100323 rows containing contour features (1.5G contours.MYD).

Server version: 5.7.24-0ubuntu0.16.04.1 (Ubuntu)
mysql> SELECT id FROM data.contours WHERE ( MBRIntersects(shape,ST_GeomFromText('POLYGON((115.7614517211914 -32.09939956665039,115.77835845947266 -32.09939956665039,115.77835845947266 -32.08985137939453,115.7614517211914 -32.08985137939453,115.7614517211914 -32.09939956665039))', 0 )) );
34 rows in set (0.00 sec)

Server version: 8.0.25-0ubuntu0.20.04.1 (Ubuntu)
mysql> SELECT id FROM contours WHERE ( MBRIntersects(shape,ST_GeomFromText('POLYGON((115.7614517211914 -32.09939956665039,115.77835845947266 -32.09939956665039,115.77835845947266 -32.08985137939453,115.7614517211914 -32.08985137939453,115.7614517211914 -32.09939956665039))', 0 )) );
34 rows in set (5.88 sec)

mysql> SELECT id FROM contours WHERE ( ST_Within(shape,ST_GeomFromText('POLYGON((115.7614517211914 -32.09939956665039,115.77835845947266 -32.09939956665039,115.77835845947266 -32.08985137939453,115.7614517211914 -32.08985137939453,115.7614517211914 -32.09939956665039))', 0 )) );
10 rows in set (9.72 sec)

mysql> SELECT id FROM contours WHERE ( ST_Intersects(shape,ST_GeomFromText('POLYGON((115.7614517211914 -32.09939956665039,115.77835845947266 -32.09939956665039,115.77835845947266 -32.08985137939453,115.7614517211914 -32.08985137939453,115.7614517211914 -32.09939956665039))', 0 )) );
27 rows in set (9.82 sec)
[17 May 2021 14:24] MySQL Verification Team
The performance was improved over 8.0.15.