Bug #83166 Query execution involving spatial and non-spatial data very slow
Submitted: 27 Sep 2016 10:09 Modified: 11 Nov 2016 16:43
Reporter: Ioannis Priggouris Email Updates:
Status: Closed Impact on me:
Category:MySQL Server: GIS Severity:S5 (Performance)
Version:5.7.15 OS:Any
Assigned to: CPU Architecture:Any
Tags: join, slow performance, spatial

[27 Sep 2016 10:09] Ioannis Priggouris
I have created a database containing both spatial (2 tables) and not spatial data (another 2 tables). Records between spatial and non spatial tables are somehow related throw a certain column.
Now what I try to do is to perform a query in order to get the records on a non spatial table related to specific entries retrieved by a spatial operation (intersection) performed on the spatial tables.

I first tried to do this, using a single query but the execution time spent to get the results were dissapointing.
So I decided to break the initial query on two separate ones. The first involving the spatial JOIN and putting the results in a temporary table,  and the second retrieving the required data by joining with these temporary table.

To my astonishment the performance improvement of this second approach compared to the single query is always between 95 - 99% faster

How to repeat:
Please execute the queries given in the attached file queries.sql
Query after the 1st comment is the single query
Query after the 2nd comment is the two queries approach using a temporary table
In my system an Intel Core i5 3,1Ghz computer with 8 Gb of ram running Windows 7 for the specific input I (BEN_ID=26) the first query takes about 60 secs to produce results, while the second one using the temporary table produces the same result with an overall execution time less than 0,2 secs!!!!!!

You can use other values of BEN_ID (present on the right_data table) to test further but the performance gain of the two queries approach over the single one will be always at the same level (95-99%)
During your reproduction process I suggest you do not use some BEN_ID with too many records on the right_details table (i.e., over 100) because the execution of the single query may take hours!!!! On the other hand the two queries approach returns in a few secs even in these cases.
[24 Oct 2016 13:06] Miguel Solorzano
Thank you for the bug report. Duplicate/related to bug: https://bugs.mysql.com/bug.php?id=79406.
[29 Oct 2016 20:03] Ioannis Priggouris
I cannot understand how the specific issue I reported is related to the one you suggested it is duplicate to!!!!

It is evident that the bug I report has nothing to do with the performance of spatial queries or spatial functions. It is clearly a problem related to the way mysql's query analyzer performs!!! 

This is obvious since the same query when separated to two individual queries communicating through a temp table as I explained, performs very well.
[29 Oct 2016 20:05] Ioannis Priggouris
I re-opened the bug for all the above
[31 Oct 2016 14:06] Sinisa Milivojevic
First of all, can you explain on what do you mean by query analyzer ??? What product is it exactly ???

Second, can you demonstrate your problem without using any GIS data and function, but non-spatial data.

Third, when you upload your data, please do it in .tar.gz or .zip package only.
[4 Nov 2016 5:37] Ioannis Priggouris
Saying query analyzer, I suggest that there is a problem in the way the query is analyzed internally by mysql before being executed. Now I do know that every database has such an analyzer which analyzes complex queries in order to provide the most performance effective execution plan for it. 
It is clear that this is not happening here. I do not know if the problem has to do with mixing GIS data and standard sql data, but it is evident that mysql does not provide the best execution plan for the query, although I have written it in a way that dictates so (i.e., having joins providing small numbers of results first etc).
On the second approach where the query is separated in two discrete ones, communicating data through a temp table, what I do, in fact, is to force the execution order of joins the way it should be. However, this should have been handled efficiently my mysql itself, on the single approach also.
[4 Nov 2016 14:31] Sinisa Milivojevic

First of all, we do not have any query analyzer, but query optimizer.

Second, can you demonstrate your problem without using any GIS data and
function, but non-spatial data. If you can't demonstrate a problem without GIS data and functions, then this bug is a duplicate as it was already notified.

Third, when you upload your data, please do it in .tar.gz or .zip
package only.
[4 Nov 2016 20:54] Ioannis Priggouris
I suggest we do not stick to words and terminology. I am not familiar with the exact naming you use for mysql internal components but I think that we both understand what I am suggesting.

Now regarding your second statement, I cannot understand how you relate my bug with the suggested one! The one you are suggesting as duplicate to mine refers to different performances between mysql 5.6 and 5.7 regarding spatial queries. I do not reported anything like that. In fact my queries have exactly the same behavior and time responses in both mysql 5.6 (no spatial indices) and 5.7 (with spatial indices). I just downloaded the latest version of 5.6 and run it. So I do not accept your classification as a duplicate, unless you proove me that this is not the case.
For me, it is clearly a query optimizer problem, because the second approach with the two separate queries still uses spatial functions but returns in milliseconds. So I do not see any poor performance of the spatial functions, either in 5.6 or 5.7 versions.

Finally, since you insisted persistently, I uploaded a zip format of my test  data, publicly available this time, so that everybody could run it and see the issue.
[4 Nov 2016 21:05] Ioannis Priggouris
Note that I uploaded through the sftp site this time, since zipping the same data created a larger file above 3 Mb. The data inside the file, though, are exactly the same with my first upload (the rar file).
[8 Nov 2016 14:29] Sinisa Milivojevic

First and first of all, we do have to stick to words and terminology. Only by using terminology that is specially developed for relational database systems, can we communicate and express problems and situations. That is why we have put our manual on-line. Entire terminology and entire functioning of our server is explained therein. 

Second, I guess that you have uploaded your ZIP file in the "incoming" directory. What is the exact file's name.

Third, I do not see your current setup. Can you send us the exact configuration as defined in the MySQL configuration file. If you have not configured MySQL for your purposes, I would suggest that you do that first.
[8 Nov 2016 23:22] Ioannis Priggouris
mysql configuration file

Attachment: my_test.ini (application/octet-stream, text), 13.79 KiB.

[8 Nov 2016 23:35] Ioannis Priggouris
For the uploaded zip file I used the exact naming convention elaborated on the files tab. So the filename is mysql-bug-data-83166.zip

Now I have also uploaded the mysql configuration my_test.ini, that I used for my tests.
Response Time with this configuration is about 14 secs for the single query and instant response (0.000 secs) for the temp table (2 queries) approach
[9 Nov 2016 15:46] Sinisa Milivojevic
We shall try to replicate your issue in due time.
[10 Nov 2016 19:46] Sinisa Milivojevic

I have tested your entire test case and, indeed, the variant with temporary table executes thousands time faster then variant with dependent nested queries.

This is a very well known issue since MySQL version 5.0. Please, read a manual on the subject of the dependent nested queries (some call it sub-queries). The dependent query has to be re-executed for each row in which it participates, which in your case is an inner join.

Our optimizer team has a worklog entry for optimizing out these nested queries, but its schedule is yet unknown.

We advise our users to do exactly what you did. Use temporary tables.
[11 Nov 2016 16:43] Paul Dubois
GIven Sinisa's remark that this is expected behavior: Closed with no further action.