Bug #14669 Optimizer unneccesary re-evaluates static parts
Submitted: 5 Nov 2005 10:06 Modified: 17 Aug 2007 19:13
Reporter: Gunnar von Boehn Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S4 (Feature request)
Version:5.0 and before OS:Any (any)
Assigned to: Assigned Account CPU Architecture:Any
Tags: Q1

[5 Nov 2005 10:06] Gunnar von Boehn
Description:
The optimizer sometimes reprocesses static parts of the query for each row.
When working with a higher number of records this can be quite expensive.

Example:  WHERE soundex(column) = soundex('static value')

The optimizer calculates the soundex of both the column and the static value for each row
instead of only calculating the soundex of the static value once.

The soundex is only an example here,
the optimizer behaves like this for other kind of functions too.

How to repeat:
Here the meassured example of comparing the soundex of
1 million rows of a very simple table:

mysql> show create table data;
CREATE TABLE data(
   name varchar(32) NOT NULL default ''
);
mysql> SELECT count(*) FROM data;
+----------+
| count(*) |
+----------+
|  1104902 |
+----------+

SELECT name FROM data WHERE soundex(name)=soundex('unknown record name');
Empty set (4.20 sec)

now the same but removing the soundex around the static value:

mysql> SELECT soundex('unknown record name');
+--------------------------------+
| soundex('unknown record name') |
+--------------------------------+
| U52562635                      |
+--------------------------------+
mysql> SELECT name FROM data WHERE soundex(name)='U52562635';
Empty set (3.62 sec)

You can see that re-calculating the soundex of the static value is quite expensive.

I assume the issue is easy to understand.
If you need test data then please tell me and I'll upload something.

Suggested fix:
Evaluating the static part of the query only once would speed up these cases a lot.
Its a 20% performance difference in the above example.

Cheers
Gunnar
[5 Nov 2005 16:28] MySQL Verification Team
Yes please provide the data for the test. If you prefer you can upload
the zip file at:

ftp://ftp.mysql.com:/pub/mysql/upload

please use a file name like bug14669.zip

Thanks in advance.
[7 Nov 2005 9:06] Gunnar von Boehn
Miguel, I have uploaded some testdata. A readme is included.
Actually I think the problem is not related to the content of the included data.
I have tested this behavior on several different servers
using different databases and all had the same problem.

Please get back to me if you have any questions.

Cheers
Gunnar
[7 Nov 2005 9:08] Gunnar von Boehn
Miguel,
Sorry, I forgot to mention that the test file is called bug14669.tgz

Cheers
Gunnar
[7 Nov 2005 11:06] Sergei Golubchik
How did you know that soundex was calculated for every row ?
[7 Nov 2005 11:24] Gunnar von Boehn
Sergei,
you are right. Actually I don't "know" that mysql re-evaluates the soundex each row.I'm only assuming this so far.

All I know, is that 
WHERE soundex(column) = soundex('static value') 
takes much longer than
WHERE soundex(column) = 'soundex of static value'

This slow behavior is true for other functions too.

The simpliest explanation to me was that MySQL is re-evaluating the function for each row.

Cheers
Gunnar
[12 Nov 2005 11:15] MySQL Verification Team
Gunnar,

I've commented with Sergei about the difference of speed you found
and he believes what you got was cached queries. Can you verify that.

Thanks in advance.
[13 Nov 2005 8:27] Gunnar von Boehn
Could you not verify the issue with my testdata?

To answer your question:
Of course I've disabled the query cache and rerun each test a
couple of times to have all the record in diskcache.
And as I've mentioned I've verified this on different
MySQL version and on different architectures.

Gunnar
[13 Nov 2005 13:06] MySQL Verification Team
I verified on Suse 9.3, FreeBSD 5.4 and Windows XP.
[17 Oct 2006 22:26] Igor Babaev
See also #21314.
[10 Jul 2007 8:19] Gleb Shchepa
See also #29338 (duplicate).
[25 Jul 2007 8:34] Martin Hansson
This bug affects only the scenario when there is no key prefix for the column.
For the example above this means that if there's a key declared either as KEY(column) or KEY(column, othercolumn, ...) the static call to soundex will not be repeated for every row.
[13 Aug 2007 12:20] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/32442

ChangeSet@1.2502, 2007-08-13 14:20:26+02:00, mhansson@linux-st28.site +6 -0
  Bug#14669: Optimizer unneccesary re-evaluates static parts
  
  The class Arg_comparator is used to access records using a conditional 
  expression when there is no key on the field(s) in question. It has 
  caching funcionality but it was only used for DATETIME types. This caused
  deterministic functions to be re-evaluated for every row.
  Fixed by implementing caching for all DETERMINISTIC functions with constant
  arguments that return type INT, CHAR, REAL, BINARY, or DECIMAL.
[17 Aug 2007 19:13] Igor Babaev
No CSC is bound to this bug.
A pure feature request: a new optimization is requested.
It will take 2 man-months (at least) to implement the functionality that would allow to avoid unnecessary expression evaluations.
Excluding duplicate evaluations of deterministic SF/UDF will be provided separately.

By the above reasons I move the bug to 'To be fixed later' and mark it as a 'Feature request'.
Product management will decide in what version this optimization appears.