Bug #42808 Execution Plan Cache
Submitted: 12 Feb 2009 22:17 Modified: 13 Feb 2009 5:25
Reporter: Mikiya Okuno Email Updates:
Status: Verified Impact on me:
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:4.1, 5.0, 5.1 OS:Any
Assigned to:
Triage: Triaged: D5 (Feature request)

[12 Feb 2009 22:17] Mikiya Okuno
Current version of MySQL optimizer doesn't cache execution plan. It is necessary for the optimizer to examine relevant tables for each query execution. This could lead many round-trips upon tables.

How to repeat:

Suggested fix:
Normalize the query like a prepared statement like below.
Original: SELECT col1 FROM tbl1 WHERE col2=123
Normalized: SELECT col1 FROM tbl1 WHERE col2=N

Remove session related objects like select_result from JOIN instance.

Store N/V pair to a cache using normalized query/clean JOIN instance.
[13 Feb 2009 5:25] Valerii Kravchuk
Thank you for the reasonable feature request. I hope it will be implemented some day, as an option.

Note also that you may use prepared statements to get a "local" cache for the session.
[9 Mar 2009 12:29] Sergey Petrunya
Request taken.

It's more complicated than it seems though as "normalization" destroys certain opportunities for the optimizer. If we get

SELECT * FROM tbl WHERE key1=const1 AND key2=const2 

then the optimizer will need the values of const1 and const2 to make an informed decision about whether it should pick  key1, key2, or neither of them.  Normalization will cause us to blindly pick only one of the query plans.

One easy way to tell which constants need normalization and which don't is to assume that prepared statement parameters are normalized and everything else isn't. This will also allow not to re-run the optimizer for prepared statement re-executions (and that's what other DBMS do and what a good number of people may expect).
[9 Mar 2009 12:32] Sergey Petrunya

[Check if the problem can be solved by means other than QEP cache] 
Do you have a sample of queries which cause the optimizer to be slow? (If yes could you please provide it)
Is it definitely known that the problem is in the optimizer being slow? What is the storage engine?
[10 Mar 2009 6:18] Mikiya Okuno
Hi Sergey,

Thank you for your comment.

I have an idea to handle such a situation which normalization will cause us
to blindly pick only one of the query plans.

You know, mysqldumpslow CLI normalizes queries like below:

SELECT * FROM tbl WHERE key1=N AND key2=S

N stands for number literals, and S stands for string literals. If these literals are mapped to several categories, we can normalize queries without dropping category information like below:

SELECT * FROM tbl WHERE key1=N1 AND key2=S1

If many categories exist, digit after N or S may vary widely. How many patterns does the optimizer identify literals? Does it depend?

I agree with prepared statement scenario, as it is why I filed this feature request :)

I do not have any specific query which cause the optimizer be slow. In general, optimizer may be slow when many tables are joined, complex subqueries are used and so forth. On the other hand, normalization and cache mechanism may take certain time, so cache should be done only if the optimizer takes long. While it is difficult to determine a definite formula to determine if it is slow, but it is relatively easy to have a parameter to give a hint to the optimizer, like optimizer_search_depth.

The new parameter name could be optimizer_cache_depth or similar. And it controls if the optimizer makes use of QEP cache or not. There will be a relation like below:

0 < optimizer_cache_depth < optimizer_search_depth

If such that parameter, optimizer_cache_depth, exists, we can have larger optimizer_search_depth because queries which has search depth larger than optimizer_cache_depth will be cached.

Kind regards!
[10 Mar 2009 15:11] Eric Bergen
This seems like a solution looking for a problem. It's probably not worth spending time on until there is some empirical evidence of the optimizer being slow and showing that a cache can actually speed it up. 

I'm a bit worried that having a query plan cache will introduce unnecessary locking which may make single query executions faster but may also make mysql slower when trying to scale on multiple cores because of lock contention.