Bug #95719 | Stored Functions executed for all rows if temp tables are involved | ||
---|---|---|---|
Submitted: | 10 Jun 2019 22:48 | Modified: | 24 Jun 2019 3:45 |
Reporter: | Tibor Korocz | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server | Severity: | S4 (Feature request) |
Version: | 5.6,5.7, 8.0 | OS: | Linux |
Assigned to: | CPU Architecture: | Any |
[10 Jun 2019 22:48]
Tibor Korocz
[11 Jun 2019 13:00]
MySQL Verification Team
Hi Mr. Korocz, Thank you for your bug report. However, I do not think that this is a bug. Your query is scanning the entire table. Also, it is basic SQL principle that stored function has to be executed for each row that is retrieved. Hence, this is not a bug. Unless, I am missing something .........
[11 Jun 2019 13:04]
MySQL Verification Team
Actually, to make things more clear. Every function has to be executed for each field in the SELECT list in every row that is filtered out of the WHERE clause. That is truly basic SQL.
[11 Jun 2019 14:27]
Tibor Korocz
Hi Sinisa, What I am trying to say here this maybe could be optimised when temp tables are involved. If there is a query which does not use temp tables but the query without "limit 1" would return a million row MySQL only executes the functions for one row because of the "limit 1", which is great. If I am running a similar query but in the execution plan you can see it is going to create temp tables MySQL will execute the functions 1 million times even if the query returns 1 single row. That is a huge performance killer. I know in the internal temp tables MySQL write down the whole rows and this is why it has to execute the functions for every single row. BUT is it possible to change the logic here? MySQL won't execute functions when creating the temp table, only when it reads the data out and if there is a "limit 1" will execute the functions only 1 time. We can call it feature request as well if that is better.
[12 Jun 2019 12:18]
MySQL Verification Team
Hi, I do not see that query of yours that makes a very large temporary table and returns one row. However, having written that, do note that there are many cases where such a scenario must be undertaken. Simply, too many queries need temporary tables just in order to resolve the query. This is a case for distinct and aggregating queries, those that have nested queries, plus many more instances. Hence, just in order to resolve the query you have to have a temporary table. If you then have sorting with limit, then it is another operation. So, yes, you must first resolve the entire query and only then proceed further. But, that is a matter of normalising schema and optimising queries, which is not a bug.
[12 Jun 2019 16:38]
Tibor Korocz
Hi, "I do not see that query of yours that makes a very large temporary table and returns one row." It's here, the query gives back only one row but as you can see from the general log `test_function` was executed 3 times because that query creates a temp table and MySQL executes the function for all the rows. With Limit and temp tables: mysql [localhost] {msandbox} (test) > select id,test_function(id) as number from test.test_table where id in (1,2,3) group by string order by string limit 1 ; +----+--------+ | id | number | +----+--------+ | 1 | 11 | +----+--------+ 1 row in set (0.00 sec) General Log: 2019-06-10T22:34:43.599054Z 8 Query select id,test_function(id) as number from test.test_table where id in (1,2,3) group by string order by string limit 1 2019-06-10T22:34:43.599290Z 8 Query SELECT number FROM test.test_table WHERE id= NAME_CONST('oid',1) INTO ret_number 2019-06-10T22:34:43.599389Z 8 Query SELECT number FROM test.test_table WHERE id= NAME_CONST('oid',2) INTO ret_number 2019-06-10T22:34:43.599542Z 8 Query SELECT number FROM test.test_table WHERE id= NAME_CONST('oid',3) INTO ret_number I understand that many queries have to create temp tables, but what I am still trying to ask here is it not possible to change the logic and execute the functions only when MySQL reads the data out from the temp table? (of course only in that case if that function does not involved in the creation of the temp table, like in ordering etc.. ) You can not avoid all the temp tables, so a feature like this could speed up queries with stored function and when temp tables are involved. I also can accept that if you are saying this is not possible because of the code or the architecture of InnoDB etc... but you are just saying this is not a Bug, I have never said that it's a bug, we can call it a new feature, performance optimisation, code improvement etc... Basically the main question is that: "Can we change the logic and in cases where the results of the functions are not involved in the creation of the temp tables, execute the functions only when MySQL is reading out the data from the temp table?"
[12 Jun 2019 17:00]
Tibor Korocz
I have changed the ticket to S4(Feature Request).
[13 Jun 2019 2:51]
Tibor Korocz
Hi Sinisa, I just would like to send some updates because I had a chance to discuss this case with a few MySQL developers and Morgan Tocker sent me a great link which describes the logical order in SQL language: https://blog.jooq.org/2016/12/09/a-beginners-guide-to-the-true-order-of-sql-operations/ The order is the following: - From - Where - Group By - Aggregations - Having - Window - Select - Distinct - Uninon - Order by - Offset - Limit As you can see, aggregations and select list evaluation comes after `order by` and LIMIT is the last one. Here is also a simplified test case: DROP TABLE IF EXISTS t1; CREATE TABLE t1 ( id INT NOT NULL PRIMARY KEY auto_increment, gcol INT NOT NULL, fcol INT NOT NULL, INDEX (gcol) ); INSERT INTO t1 VALUES (NULL,1,1),(NULL,2,1),(NULL,3,1),(NULL,4,2),(NULL,5,1); mysql [localhost] {msandbox} (test) > SELECT gcol, SLEEP(1) FROM t1 WHERE 1 GROUP BY gcol LIMIT 1; +------+----------+ | gcol | SLEEP(1) | +------+----------+ | 1 | 0 | +------+----------+ 1 row in set (1.00 sec) mysql [localhost] {msandbox} (test) > SELECT fcol, SLEEP(1) FROM t1 WHERE 1 GROUP BY fcol LIMIT 1; +------+----------+ | fcol | SLEEP(1) | +------+----------+ | 1 | 0 | +------+----------+ 1 row in set (5.02 sec) You said: " Every function has to be executed for each field in the SELECT list in every row that is filtered out of the WHERE clause." That means the first query should take 5s as well even if it returns only one row, because in the SQL language the LIMIT comes last so all the functions should have been already executed. But it looks like MySQL is "cheating" and does not respect this order. Probably this is an optimization. When it comes to `order by` and temp tables MySQL use a different logic it is executing the functions for each row, which is the SQL standard. So as you can see MySQL mixing the logic and does not respect the SQL language order. Question is, can we use the same "cheat" when temp tables are involved and execute the functions after the LIMIT? Could we use the same optimization?
[13 Jun 2019 12:53]
MySQL Verification Team
HI Mr. Korocz, First of all, send my personal regards to my friend and colleague Morgan Tocker. Second, this is truly not a bug. As you can see, you have the aggregating query that has to resolved first, before you arrive to sorting the data and limiting the result set. Aggregated queries can be resolved only by temporary tables, except in some cases where filesort or index scan can be used. Your query is not the one that can fall in the last two categories, because your schema, tables and queries are not optimised enough to be resolved without temporary tables. Even if you resolve the aggregated query with two other methods, those have to be finished prior to other steps. Hence, this is not a feasible feature request.
[24 Jun 2019 3:45]
Tibor Korocz
Sleep should be executed during the `select list` evaluation, am I correct? Let's say I am normal user who doe snot have deep knowledge of MySQL. I write a query which has a functions which also writes another table. The query gives back only one row as result. So I am expecting the function in the background which also does write will write only 1 row. But know it will write as many rows as many will be in that temp table. DELIMITER ;; CREATE DEFINER=`root`@`localhost` FUNCTION `insertfunction`(fid integer) RETURNS CHAR(50) READS SQL DATA BEGIN Insert into test.t2 (fcol,gcol) values (fid,fid); RETURN fid; END;; DELIMITER ; mysql [localhost] {msandbox} (test) > select * from t2; Empty set (0.00 sec) mysql [localhost] {msandbox} (test) > SELECT gcol, insertfunction(gcol) FROM t1 WHERE 1 GROUP BY gcol LIMIT 1; +------+----------------------+ | gcol | insertfunction(gcol) | +------+----------------------+ | 1 | 1 | +------+----------------------+ 1 row in set (0.01 sec) mysql [localhost] {msandbox} (test) > select * from t2; +----+------+------+ | id | gcol | fcol | +----+------+------+ | 1 | 1 | 1 | +----+------+------+ 1 row in set (0.00 sec) mysql [localhost] {msandbox} (test) > truncate t2; Query OK, 0 rows affected (0.08 sec) mysql [localhost] {msandbox} (test) > SELECT fcol, insertfunction(fcol) FROM t1 WHERE 1 GROUP BY fcol LIMIT 1; +------+----------------------+ | fcol | insertfunction(fcol) | +------+----------------------+ | 1 | 1 | +------+----------------------+ 1 row in set (0.01 sec) mysql [localhost] {msandbox} (test) > select * from t2; +----+------+------+ | id | gcol | fcol | +----+------+------+ | 1 | 1 | 1 | | 2 | 1 | 1 | | 3 | 1 | 1 | | 4 | 2 | 2 | | 5 | 1 | 1 | +----+------+------+ 5 rows in set (0.00 sec) So MySQL allows to do this and the number of the rows in `t2` will depend on that if MySQL creates a temp table or not... I don't think that's good and I don't think everyone who is writing MySQL queries should need a such a deep knowledge in MySQL why does this happening... I have tried to find anything in the manual which describes this behaviour but I could not, if you know where is this mentioned in the manual please send me a link. Also what do you think about generated columns in internal temp tables? So if there would be a situation like this, it could be just a generated column which is a pointer to the function which has to be executed when MySQL is actually reading out that row. Generated columns are already exist in MySQL, would be possible to add them to temp tables like this?
[24 Jun 2019 12:34]
MySQL Verification Team
Hi, You have not supplied any arguments that contradict the facts that I wrote in my previous comment. No, deep knowledge of MySQL should not be required, unless you want to know why a query needs to use a temporary table in order to resolve aggregating and distinct queries, prior to doing any sorting and / or limiting. Regarding your documentation question, you simply have to read our entire chapter on the Optimiser.