Bug #30183 Redundant UDF calls when grouping by UDF result
Submitted: 1 Aug 2007 17:20 Modified: 2 Aug 2007 6:22
Reporter: Hartmut Holzgraefe Email Updates:
Status: Verified Impact on me:
Category:MySQL Server: User-defined functions ( UDF ) Severity:S4 (Feature request)
Version:5.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: bfsm_2007_08_16

[1 Aug 2007 17:20] Hartmut Holzgraefe
When using a user defined function (UDF) like this

  SELECT UDF(x) AS udf_result 
    FROM table
   WHERE y=...
   GROUP BY udf_result

i would expect one UDF call per each row matching the
WHERE condition (unless the UDF result is needed in the
WHERE condition itself which isn't the case here)

When actually tracking the number of calls 
the results are:

 - with SQL_SMALL_RESULT: number_of_matching_rows + number_of_groups

 - with SQL_BIG_RESULT: number_of_total_rows + number_of_matching_rows + number_of_groups

I can see two different problems with this:

- performance, especially in the SQL_BIG_RESULT case

- with nondeterministic UDFs the results will be rather 'funny'

How to repeat:
create a UDF that logs how often it was called for a query and try the sample query above with both SQL_SMALL_RESULT and SQL_BIG_RESULT

Suggested fix:
* do not recalculate the UDF result, it is the grouping value of the group currently processed in later steps so no need to reevaluate it 

* in the SQL_BIG_RESULT case: when retrieving the rows to be filesorted evaluate the UDF only *after* checking the WHERE condition, not before
[2 Aug 2007 6:22] Valeriy Kravchuk
Thank you for a reasonable feature request.