Description:
When using a user defined function (UDF) like this
SELECT UDF(x) AS udf_result
FROM table
WHERE y=...
GROUP BY udf_result
i would expect one UDF call per each row matching the
WHERE condition (unless the UDF result is needed in the
WHERE condition itself which isn't the case here)
When actually tracking the number of calls
the results are:
- with SQL_SMALL_RESULT: number_of_matching_rows + number_of_groups
- with SQL_BIG_RESULT: number_of_total_rows + number_of_matching_rows + number_of_groups
I can see two different problems with this:
- performance, especially in the SQL_BIG_RESULT case
- with nondeterministic UDFs the results will be rather 'funny'
How to repeat:
create a UDF that logs how often it was called for a query and try the sample query above with both SQL_SMALL_RESULT and SQL_BIG_RESULT
Suggested fix:
* do not recalculate the UDF result, it is the grouping value of the group currently processed in later steps so no need to reevaluate it
* in the SQL_BIG_RESULT case: when retrieving the rows to be filesorted evaluate the UDF only *after* checking the WHERE condition, not before