Bug #8749 | EXPLAIN report different "type" w and w/o USE INDEX(...) for the same used key | ||
---|---|---|---|
Submitted: | 23 Feb 2005 21:08 | Modified: | 30 Sep 2009 14:11 |
Reporter: | jocelyn fournier (Silver Quality Contributor) | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 4.1.10, 5.0.67 | OS: | Any (*) |
Assigned to: | CPU Architecture: | Any | |
Tags: | qc |
[23 Feb 2005 21:08]
jocelyn fournier
[23 Feb 2005 22:01]
Jorge del Conde
Verified with latest 4.1.11 tree
[2 Mar 2005 11:08]
Timour Katchaounov
Joselin, the file bug_optimizer.tar.gz is no longer there, could you please just attach it to this bug report. Thank you.
[2 Mar 2005 12:47]
jocelyn fournier
Hi Timour, The file is now attached. Regards, Jocelyn
[4 Mar 2005 12:11]
Timour Katchaounov
This is not a bug for the following resons: 1) Notice that it is never the case that for the same index the optimizer chooses different access methods depending on whether there was USE INDEX clause or not. The clause "use index(topic)" always results in the "range" access method. 2) The first table "searchtest" is probably popultaed by many inserts and deletes, while the second one is created via one "insert ... select" statement, thus it is quite probable that the indexes of both tables take different number of pages. As a result: 2.1) Even after calling "analyse table searchtest" we may still get a different estimate for the number of pages. 2.2) The number of rows is also an estimate, so it may also differ. 3) The clause "use index(topic)" forces the use of this index, and in this case it happens that for index "topic" the range access method is always better than the "ref" method with the same index. Thus we always get "range" plans if we force this index. Having said all this, everything seems consistent to me, except for one issue explained below. There is an already known problem, reported as BUG#8539, where the optimizer chooses "ref" instead of "range" access method even if "range" is better. This is a known problem that will be fixed in an upcoming release. This problem manifests itself here in the sense that for the first query where the "ref" method is chosen for index PRIMARY, it may be so that range is better for the same index (though I haven't checked this). I hope this explanation clarifies everything.
[4 Mar 2005 13:18]
jocelyn fournier
Hi Timour, Sorry, I didn't realize when providing the testcase the result wasn't exactly the same than for the original table I have. On the original table I can see : mysql> EXPLAIN SELECT numreponse FROM searchjoinhardwarefr13 USE INDEX(topic) WHERE id='24399' AND numreponse>='2307728' AND topic='26369'; +----+-------------+------------------------+-------+---------------+-------+---------+------+-------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------------------+-------+---------------+-------+---------+------+-------+--------------------------+ | 1 | SIMPLE | searchjoinhardwarefr13 | range | topic | topic | 10 | NULL | 17321 | Using where; Using index | +----+-------------+------------------------+-------+---------------+-------+---------+------+-------+--------------------------+ 1 row in set (0.28 sec) mysql> EXPLAIN SELECT numreponse FROM searchjoinhardwarefr13 WHERE id='24399' AND numreponse>='2307728' AND topic='26369'; +----+-------------+------------------------+------+--------------------------+-------+---------+-------+------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------------------+------+--------------------------+-------+---------+-------+------+--------------------------+ | 1 | SIMPLE | searchjoinhardwarefr13 | ref | PRIMARY,numreponse,topic | topic | 3 | const | 117 | Using where; Using index | +----+-------------+------------------------+------+--------------------------+-------+---------+-------+------+--------------------------+ 1 row in set (0.02 sec) As you can see the same index "topic" is used, but the "ref" is choosen if USE INDEX is not used. I can upload the whole table if you want to check this problem. (I reopen the bug since the system doesn't want me to just "Add Comment") Regards, Jocelyn
[4 Mar 2005 13:30]
jocelyn fournier
Hi, I uploaded the original table in : ftp.mysql.com/pub/mysql/secret/optimizer_bug_2.tar.gz Regards, Jocelyn
[7 Mar 2005 17:25]
Timour Katchaounov
Analysis/cause for the bug: ---------------------------------- The problem is due to the current heuristic (non-cost-based) procedure (both 4.1 and 5.0) that decides whether to use a "ref" access method vs. "range". More precisely, if for the same table: 1a) there is more than one index, and 2a) if "ref" is applicable to one index, and "range" to another index, then the optimizer always selects the "ref" access method vs. "range", OR 1b) if there is only one index, and 2b) if both the "ref" and "range" access methods are applicable to to that index, then the optimizer selects the "range" access method only if it uses the longer key, otherwise it selects the "ref" method. Thus the cause of this bug is the same as for BUG#8539. For this particular bug when we use the clause "use index" we effectively limit all usable indexes to only one, the optimizer applies case (b), and that is why we get "range" access. For the test query that does not inlcude the "use index" clause there is more than one index and the optimizer falls into case (a), so the optimizer chooses the "ref" access method.
[7 Mar 2005 17:45]
jocelyn fournier
Hi Timour, Is there any bugs in the cost estimation, since range reports 17321 rows to scan whereas ref reports 177 rows. I would have expect the range method to always have less estimate rows than the ref method ? Thanks, Jocelyn
[8 Mar 2005 18:07]
Timour Katchaounov
From my previous comment it follows that the cost (and thus the number of records) of "ref" is never compared to the cost of "range", so no matter how different these estimates are, the choice of "ref" vs. "range" will follow the steps I already outlined. The number of rows retrieved by the "ref" method is estimated in a different way compared to "range", and this method is much less precise. That is why the estimate differ so much. Both problems result from the current design, and are not bugs per se. Solving both issues (ref vs. range, and better cost estimate for ref) requires a considerable effort and will change the execution plans for large classes of queries, thus such a change is considered as a "feature request" rather than a bug. We take a special note of such optimizer-related issues, and will fix them in an upcoming release of MySQL 5.0.
[14 Sep 2005 8:08]
MySQL Verification Team
Timour, You said: The problem is due to the current heuristic (non-cost-based) procedure (both 4.1 and 5.0) that decides whether to use a "ref" access method vs. "range". More precisely, if for the same table: 1a) there is more than one index, and 2a) if "ref" is applicable to one index, and "range" to another index, then the optimizer always selects the "ref" access method vs. "range", but in the bug report 8539, that is marked as duplicate for this bug report, if you decrease range for time_start optimizer uses "range" method. Here is EXPLAIN output: mysql> desc select count(*) from inc_performance where time_start >= '2005-02-04 16:30:00' AND inc_performance.time_start < '2005-02-15 16:30:00' and intv_type IN (1); +----+-------------+-----------------+------+----------------------+-----------+---------+-------+-------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------------+------+----------------------+-----------+---------+-------+-------+-------------+ | 1 | SIMPLE | inc_performance | ref | time_start,intv_type | intv_type | 5 | const | 63286 | Using where | +----+-------------+-----------------+------+----------------------+-----------+---------+-------+-------+-------------+ 1 row in set (0.00 sec) mysql> desc select count(*) from inc_performance where time_start >= '2005-02-06 16:30:00' AND inc_performance.time_start < '2005-02-15 16:30:00' and intv_type IN (1); +----+-------------+-----------------+-------+----------------------+------------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------------+-------+----------------------+------------+---------+------+------+-------------+ | 1 | SIMPLE | inc_performance | range | time_start,intv_type | time_start | 9 | NULL | 6645 | Using where | +----+-------------+-----------------+-------+----------------------+------------+---------+------+------+-------------+ 1 row in set (0.00 sec)
[30 Sep 2005 9:26]
Timour Katchaounov
I agree that my previous explanation was an oversimplification. In general the optimizer is a combination of heuristic rules and cost-based optimization. As far as some of the decisions taken by the optimizer are based on some heuristic rules, there will always be cases when these rules miss an optimal plan. The actual set of rules that govern the choice of an index are more complex than the rule I outlined above, and the only place where all these rules are listed in full detail is the code itself with code comments. We are aware of the current deficiencies of these rules, but we cannot address them in a simple fix neither in 4.1 nor 5.0. We have general plans for optimizer improvements, that will solve the problem above as a special case, but these plans are not definite yet.
[18 Aug 2009 23:21]
Adam Bradley
Has this bug been addressed in any recent revisions? We're seeing this problem in 5.0.67.