Bug #90375 Significantly improve performance of simple functions
Submitted: 11 Apr 2018 1:24 Modified: 18 Apr 2018 14:27
Reporter: Yoseph Phillips Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.7.21 OS:Any
Assigned to: MySQL Verification Team CPU Architecture:Any

[11 Apr 2018 1:24] Yoseph Phillips
Description:
Significantly improve performance of simple functions

We are using InnoBD on MySQL 5.7.21. 
We have also tested this on MySQL 5.7.18 EE and we get the same results.

When running the SELECT query with the simple function in-lined in the how to repeat section the query stats show that the execution time is 0.015 seconds.

When running the SELECT using the simple function directly the query stats show that the execution time is 1.125 seconds.

The results are identical and the visual execution plan is completely identical.

Creating the table with 100,000 rows can take some time, however it is easy to repeat this with any query that returns lots of results, so any existing table could be used.

We have seen other reports asking for performance enhancements for all procedures and functions. 
Here we are requesting something much simpler. 

Whenever a FUNCTION is used in a SELECT clause and the FUNCTION just contains one RETURN statement for MySQL to inline that return statement.

In our example the id parameter would be replaced with t.groupID and then query can be simply in-lined.
Some care would need to be taken with renaming table aliases so that there are no conflicts. Also in some cases the inline query might need to be converted to the return type of the FUNCTION (for functional equivalence).

This is similar to what MySQL already does with VIEWs when they are using the merge algorithm.

Obviously the queries in this example could be rewritten to avoid using functions and even subqueries, however we have many much more complicated functions that can be rewritten to just be a simple RETURN statement.

In many of our cases the performance differences are even more extreme than shown in this simple example. 

Our only workarounds currently are:
* manually in-lining all of the functions where we are noticing performance issues which is making many queries much harder to read, and if ever we do need to change the functionality of a function then we need to search through and change it all over the place
* changing the queries to use VIEWs instead, this is also proving problematic as if these views end up using the temporary table algorithm for whatever reason then performance can be extremely slow

How to repeat:
DROP TABLE IF EXISTS temperature;

CREATE TABLE temperature(
  id INT UNSIGNED NOT NULL AUTO_INCREMENT, 
  groupID INT UNSIGNED NOT NULL,
  recordedTimestamp TIMESTAMP NOT NULL, 
  recordedValue INT NOT NULL,
  PRIMARY KEY (`id`)
);

DROP PROCEDURE IF EXISTS temporaryProcedurePopulateTable;

DELIMITER $$

CREATE PROCEDURE temporaryProcedurePopulateTable() 
BEGIN
  DECLARE counter INT DEFAULT 0;
  
  WHILE counter < 100000 DO
    INSERT INTO temperature(groupID, recordedTimestamp, recordedValue) VALUES(floor(rand() * 100), now(), floor(rand() * 40));
    SET counter = counter + 1;
  END WHILE;
END $$

DELIMITER ;

CALL temporaryProcedurePopulateTable();

DROP TABLE IF EXISTS temperature_group;

CREATE TABLE temperature_group (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT, 
  name varchar(10) NOT NULL,
  PRIMARY KEY (id)
)
SELECT DISTINCT t.groupID AS id, concat('Group ', t.groupID) AS name FROM temperature t;

DROP FUNCTION IF EXISTS  getGroupName;

DELIMITER $$

CREATE FUNCTION getGroupName(id INT UNSIGNED) RETURNS VARCHAR(10) DETERMINISTIC
BEGIN
  RETURN (SELECT g.name FROM temperature_group g WHERE g.id = id); 
END $$

DELIMITER ;

SELECT t.id, (SELECT g.name FROM temperature_group g WHERE g.id = t.groupID) AS groupName, t.recordedTimestamp, t.recordedValue FROM temperature t ORDER BY t.id;

SELECT t.id, getGroupName(t.groupID) AS groupName, t.recordedTimestamp, t.recordedValue FROM temperature t ORDER BY t.id;

Suggested fix:
Whenever a FUNCTION is used in a SELECT clause and the FUNCTION just contains one RETURN statement for MySQL to inline that return statement.
[16 Apr 2018 4:45] MySQL Verification Team
Hi,

This is not a bug. It is known limitation of the MySQL Server and one that we hope to change in near future. I suggest you contact MySQL support team as they can help you optimize your queries so that they will get max out of your MySQL Server.

best regards
Bogdan
[18 Apr 2018 8:22] Yoseph Phillips
Can we please keep this Open until it is addressed so that we will get notifications regarding this. We already know how to workaround the issue as mentioned in the description. The issue is that those workarounds make the code much harder to read and require changes all over the place instead of in just one function (they defeat the purpose of having functions).