Bug #42808 Execution Plan Cache
Submitted: 12 Feb 23:17 Modified: 13 Feb 6:25
Reporter: Mikiya Okuno
Status: Verified
Category:Server: Optimizer Severity:S5 (Performance)
Version:4.1, 5.0, 5.1 OS:Any
Assigned to: Target Version:
Triage: Triaged: D5 (Feature request)

[12 Feb 23:17] Mikiya Okuno
Description:
Current version of MySQL optimizer doesn't cache execution plan. It is necessary for the
optimizer to examine relevant tables for each query execution. This could lead many
round-trips upon tables.

How to repeat:
n/a

Suggested fix:
Normalize the query like a prepared statement like below.
Original: SELECT col1 FROM tbl1 WHERE col2=123
Normalized: SELECT col1 FROM tbl1 WHERE col2=N

Remove session related objects like select_result from JOIN instance.

Store N/V pair to a cache using normalized query/clean JOIN instance.
[13 Feb 6:25] Valeriy Kravchuk
Thank you for the reasonable feature request. I hope it will be implemented some day, as
an option.

Note also that you may use prepared statements to get a "local" cache for the session.
[9 Mar 13:29] Sergey Petrunya
Request taken.

It's more complicated than it seems though as "normalization" destroys certain
opportunities for the optimizer. If we get

SELECT * FROM tbl WHERE key1=const1 AND key2=const2 

then the optimizer will need the values of const1 and const2 to make an informed decision
about whether it should pick  key1, key2, or neither of them.  Normalization will cause us
to blindly pick only one of the query plans.

One easy way to tell which constants need normalization and which don't is to assume that
prepared statement parameters are normalized and everything else isn't. This will also
allow not to re-run the optimizer for prepared statement re-executions (and that's what
other DBMS do and what a good number of people may expect).
[9 Mar 13:32] Sergey Petrunya
Mikiya, 

[Check if the problem can be solved by means other than QEP cache] 
Do you have a sample of queries which cause the optimizer to be slow? (If yes could you
please provide it)
Is it definitely known that the problem is in the optimizer being slow? What is the
storage engine?
[10 Mar 7:18] Mikiya Okuno
Hi Sergey,

Thank you for your comment.

I have an idea to handle such a situation which normalization will cause us
to blindly pick only one of the query plans.

You know, mysqldumpslow CLI normalizes queries like below:

SELECT * FROM tbl WHERE key1=N AND key2=S

N stands for number literals, and S stands for string literals. If these literals are
mapped to several categories, we can normalize queries without dropping category
information like below:

SELECT * FROM tbl WHERE key1=N1 AND key2=S1

If many categories exist, digit after N or S may vary widely. How many patterns does the
optimizer identify literals? Does it depend?

I agree with prepared statement scenario, as it is why I filed this feature request :)

I do not have any specific query which cause the optimizer be slow. In general, optimizer
may be slow when many tables are joined, complex subqueries are used and so forth. On the
other hand, normalization and cache mechanism may take certain time, so cache should be
done only if the optimizer takes long. While it is difficult to determine a definite
formula to determine if it is slow, but it is relatively easy to have a parameter to give
a hint to the optimizer, like optimizer_search_depth.

The new parameter name could be optimizer_cache_depth or similar. And it controls if the
optimizer makes use of QEP cache or not. There will be a relation like below:

0 < optimizer_cache_depth < optimizer_search_depth

If such that parameter, optimizer_cache_depth, exists, we can have larger
optimizer_search_depth because queries which has search depth larger than
optimizer_cache_depth will be cached.

Kind regards!
[10 Mar 16:11] Eric Bergen
This seems like a solution looking for a problem. It's probably not worth spending time on
until there is some empirical evidence of the optimizer being slow and showing that a
cache can actually speed it up. 

I'm a bit worried that having a query plan cache will introduce unnecessary locking which
may make single query executions faster but may also make mysql slower when trying to
scale on multiple cores because of lock contention.