MySQL Bugs: #8749: EXPLAIN report different "type" w and w/o USE INDEX(...) for the same used key

Bug #8749	EXPLAIN report different "type" w and w/o USE INDEX(...) for the same used key
Submitted:	23 Feb 2005 21:08	Modified:	30 Sep 2009 14:11
Reporter:	jocelyn fournier (Silver Quality Contributor)	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	4.1.10, 5.0.67	OS:	Any (*)
Assigned to:		CPU Architecture:	Any
Tags:	qc

Description:
Hi,

Given two queries where the only difference is the usage of USE INDEX(), EXPLAIN reports a "ref" optimisation and a "range" one for each respective query, whereas the choosen key is the same.
MySQL should report the same plan with or without USE INDEX() in this case ?

Regards,
  Jocelyn

How to repeat:
Download table in ftp.mysql.com/pub/mysql/secret/bug_optimizer.tar.gz

EXPLAIN SELECT numreponse FROM searchtest WHERE id='24399' AND numreponse>='2307728' AND topic='26369';

+----+-------------+------------+------+--------------------------+---------+---------+-------+------+--------------------------+
| id | select_type | table      | type | possible_keys            | key     | key_len | ref   | rows | Extra                    |
+----+-------------+------------+------+--------------------------+---------+---------+-------+------+--------------------------+
|  1 | SIMPLE      | searchtest | ref  | PRIMARY,numreponse,topic | PRIMARY |       3 | const |  867 | Using where; Using index |
+----+-------------+------------+------+--------------------------+---------+---------+-------+------+--------------------------+

here a "ref" type is reported (should be "range" instead ?)

INSERT IGNORE INTO searchtest2 SELECT * FROM searchtest;

EXPLAIN SELECT numreponse FROM searchtest2 WHERE id='24399' AND numreponse>='2307728' AND topic='26369';

+----+-------------+-------------+-------+--------------------------+-------+---------+------+------+--------------------------+
| id | select_type | table       | type  | possible_keys            | key   | key_len | ref  | rows | Extra                    |
+----+-------------+-------------+-------+--------------------------+-------+---------+------+------+--------------------------+
|  1 | SIMPLE      | searchtest2 | range | PRIMARY,numreponse,topic | topic |      10 | NULL | 1071 | Using where; Using index |
+----+-------------+-------------+-------+--------------------------+-------+---------+------+------+--------------------------+
1 row in set (0.01 sec)

this time the "range" type is properly used.

EXPLAIN SELECT numreponse FROM searchtest2 USE INDEX(topic) WHERE id='24399' AND numreponse>='2307728' AND topic='26369';
+----+-------------+-------------+-------+---------------+-------+---------+------+------+--------------------------+
| id | select_type | table       | type  | possible_keys | key   | key_len | ref  | rows | Extra                    |
+----+-------------+-------------+-------+---------------+-------+---------+------+------+--------------------------+
|  1 | SIMPLE      | searchtest2 | range | topic         | topic |      10 | NULL | 1071 | Using where; Using index |
+----+-------------+-------------+-------+---------------+-------+---------+------+------+--------------------------+

ok, USE INDEX(topic) reports the same

TRUNCATE TABLE searchtest2;

INSERT IGNORE INTO searchtest2 SELECT * FROM searchtest WHERE id='24399' AND topic='26369';

INSERT IGNORE INTO searchtest2 SELECT * FROM searchtest LIMIT 1500;

EXPLAIN SELECT numreponse FROM searchtest2 WHERE id='24399' AND numreponse>='2307728' AND topic='26369';
+----+-------------+-------------+-------+--------------------------+-------+---------+------+------+--------------------------+
| id | select_type | table       | type  | possible_keys            | key   | key_len | ref  | rows | Extra                    |
+----+-------------+-------------+-------+--------------------------+-------+---------+------+------+--------------------------+
|  1 | SIMPLE      | searchtest2 | range | PRIMARY,numreponse,topic | topic |      10 | NULL |  615 | Using where; Using index |
+----+-------------+-------------+-------+--------------------------+-------+---------+------+------+--------------------------+

EXPLAIN SELECT numreponse FROM searchtest2 USE INDEX(topic) WHERE id='24399' AND numreponse>='2307728' AND topic='26369';
+----+-------------+-------------+-------+---------------+-------+---------+------+------+--------------------------+
| id | select_type | table       | type  | possible_keys | key   | key_len | ref  | rows | Extra                    |
+----+-------------+-------------+-------+---------------+-------+---------+------+------+--------------------------+
|  1 | SIMPLE      | searchtest2 | range | topic         | topic |      10 | NULL |  615 | Using where; Using index |
+----+-------------+-------------+-------+---------------+-------+---------+------+------+--------------------------+

Range once again, ok...

TRUNCATE TABLE searchtest2;

INSERT IGNORE INTO searchtest2 SELECT * FROM searchtest WHERE id='24399' AND topic='26369';

INSERT IGNORE INTO searchtest2 SELECT * FROM searchtest LIMIT 1000;

INSERT IGNORE INTO searchtest2 SELECT * FROM searchtest LIMIT 1500;

EXPLAIN SELECT numreponse FROM searchtest2 WHERE id='24399' AND numreponse>='2307728' AND topic='26369';
+----+-------------+-------------+------+--------------------------+---------+---------+-------+------+--------------------------+
| id | select_type | table       | type | possible_keys            | key     | key_len | ref   | rows | Extra                    |
+----+-------------+-------------+------+--------------------------+---------+---------+-------+------+--------------------------+
|  1 | SIMPLE      | searchtest2 | ref  | PRIMARY,numreponse,topic | PRIMARY |       3 | const |  867 | Using where; Using index |
+----+-------------+-------------+------+--------------------------+---------+---------+-------+------+--------------------------+

"ref" type is back ?!

EXPLAIN SELECT numreponse FROM searchtest2 USE INDEX(topic) WHERE id='24399' AND numreponse>='2307728' AND topic='26369'; 
+----+-------------+-------------+-------+---------------+-------+---------+------+------+--------------------------+
| id | select_type | table       | type  | possible_keys | key   | key_len | ref  | rows | Extra                    |
+----+-------------+-------------+-------+---------------+-------+---------+------+------+--------------------------+
|  1 | SIMPLE      | searchtest2 | range | topic         | topic |      10 | NULL |  720 | Using where; Using index |
+----+-------------+-------------+-------+---------------+-------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

"range" with USE INDEX()

ANALYZE TABLE searchtest2;

EXPLAIN SELECT numreponse FROM searchtest2 WHERE id='24399' AND numreponse>='2307728' AND topic='26369';

+----+-------------+-------------+-------+--------------------------+-------+---------+------+------+--------------------------+
| id | select_type | table       | type  | possible_keys            | key   | key_len | ref  | rows | Extra                    |
+----+-------------+-------------+-------+--------------------------+-------+---------+------+------+--------------------------+
|  1 | SIMPLE      | searchtest2 | range | PRIMARY,numreponse,topic | topic |      10 | NULL |  960 | Using where; Using index |
+----+-------------+-------------+-------+--------------------------+-------+---------+------+------+--------------------------+

After analyze the "range" type is properly choosen.

Verified with latest 4.1.11 tree

Joselin,
the file bug_optimizer.tar.gz is no longer there, could you please just
attach it to this bug report. Thank you.

Hi Timour,

The file is now attached.

Regards,
  Jocelyn

This is not a bug for the following resons:

1) Notice that it is never the case that for the same index the optimizer chooses
different access methods depending on whether there was USE INDEX clause or
not. The clause "use index(topic)" always results in the "range" access method.

2) The first table "searchtest" is probably popultaed by many inserts and deletes,
while the second one is created via one "insert ... select" statement, thus it is quite
probable that the indexes of both tables take different number of pages. As a result:
2.1) Even after calling "analyse table searchtest" we may still get a different estimate
for the number of pages.
2.2) The number of rows is also an estimate, so it may also differ.

3) The clause "use index(topic)" forces the use of this index, and in this case it happens
that for index "topic" the range access method is always better than the "ref" method
with the same index. Thus we always get "range" plans if we force this index.

Having said all this, everything seems consistent to me, except for one issue
explained below.

There is an already known problem, reported as BUG#8539, where the
optimizer chooses "ref" instead of "range" access method even if "range"
is better. This is a known problem that will be fixed in an upcoming release.
This problem manifests itself here in the sense that for the first query
where the "ref" method is chosen for index PRIMARY, it may be so that
range is better for the same index (though I haven't checked this).

I hope this explanation clarifies everything.

Hi Timour,

Sorry, I didn't realize when providing the testcase the result wasn't exactly the same than for the original table I have.

On the original table I can see :

mysql> EXPLAIN SELECT numreponse FROM searchjoinhardwarefr13 USE INDEX(topic) WHERE id='24399' AND numreponse>='2307728' AND topic='26369';
+----+-------------+------------------------+-------+---------------+-------+---------+------+-------+--------------------------+
| id | select_type | table                  | type  | possible_keys | key   | key_len | ref  | rows  | Extra                    |
+----+-------------+------------------------+-------+---------------+-------+---------+------+-------+--------------------------+
|  1 | SIMPLE      | searchjoinhardwarefr13 | range | topic         | topic |      10 | NULL | 17321 | Using where; Using index |
+----+-------------+------------------------+-------+---------------+-------+---------+------+-------+--------------------------+
1 row in set (0.28 sec)

mysql> EXPLAIN SELECT numreponse FROM searchjoinhardwarefr13 WHERE id='24399' AND numreponse>='2307728' AND topic='26369';
+----+-------------+------------------------+------+--------------------------+-------+---------+-------+------+--------------------------+
| id | select_type | table                  | type | possible_keys            | key   | key_len | ref   | rows | Extra                    |
+----+-------------+------------------------+------+--------------------------+-------+---------+-------+------+--------------------------+
|  1 | SIMPLE      | searchjoinhardwarefr13 | ref  | PRIMARY,numreponse,topic | topic |       3 | const |  117 | Using where; Using index |
+----+-------------+------------------------+------+--------------------------+-------+---------+-------+------+--------------------------+
1 row in set (0.02 sec)

As you can see the same index "topic" is used, but the "ref" is choosen if USE INDEX is not used.

I can upload the whole table if you want to check this problem.
(I reopen the bug since the system doesn't want me to just "Add Comment")

Regards,
  Jocelyn

Hi,

I uploaded the original table in :

ftp.mysql.com/pub/mysql/secret/optimizer_bug_2.tar.gz

Regards,
  Jocelyn

Analysis/cause for the bug:
----------------------------------
The problem is due to the current heuristic (non-cost-based) procedure
(both 4.1 and 5.0) that decides whether to use a "ref" access method vs.
"range". More precisely, if for the same table:
1a) there is more than one index, and
2a) if "ref" is applicable to one index, and "range" to another index,
then the optimizer always selects the "ref" access method vs. "range",
  OR
1b) if there is only one index, and
2b) if both the "ref" and "range" access methods are applicable to
to that index, then the optimizer selects the "range" access method
only if it uses the longer key, otherwise it selects the "ref" method.

Thus the cause of this bug is the same as for BUG#8539.

For this particular bug when we use the clause "use index" we
effectively limit all usable indexes to only one, the optimizer
applies case (b), and that is why we get "range" access. For the
test query that does not inlcude the "use index" clause there is
more than one index and the optimizer falls into case (a), so the
optimizer chooses the "ref" access method.

Hi Timour,

Is there any bugs in the cost estimation, since range reports 17321 rows to scan whereas ref reports 177 rows.
I would have expect the range method to always have less estimate rows than the ref method ?

Thanks,
  Jocelyn

From my previous comment it follows that the cost (and thus the number of records) of
"ref" is never compared to the cost of "range", so no matter how different these
estimates are, the choice of "ref" vs. "range" will follow the steps I already outlined.

The number of rows retrieved by the "ref" method is estimated in a different way
compared to "range", and this method is much less precise. That is why the estimate
differ so much.

Both problems result from the current design, and are not bugs per se. Solving
both issues (ref vs. range, and better cost estimate for ref) requires a considerable
effort and will change the execution plans for large classes of queries, thus such a
change is considered as a "feature request" rather than a bug.

We take a special note of such optimizer-related issues, and will fix them in an upcoming
release of MySQL 5.0.

Timour,

You said:
The problem is due to the current heuristic (non-cost-based) procedure
(both 4.1 and 5.0) that decides whether to use a "ref" access method vs.
"range". More precisely, if for the same table:
1a) there is more than one index, and
2a) if "ref" is applicable to one index, and "range" to another index,
then the optimizer always selects the "ref" access method vs. "range",

but in the bug report 8539, that is marked as duplicate for this bug report, if you decrease range for time_start optimizer uses "range" method.

Here is EXPLAIN output:

mysql> desc select count(*) from inc_performance where time_start >= '2005-02-04 16:30:00' AND inc_performance.time_start < '2005-02-15 16:30:00' and intv_type IN (1);
+----+-------------+-----------------+------+----------------------+-----------+---------+-------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+------+----------------------+-----------+---------+-------+-------+-------------+
| 1 | SIMPLE | inc_performance | ref | time_start,intv_type | intv_type | 5 | const | 63286 | Using where |
+----+-------------+-----------------+------+----------------------+-----------+---------+-------+-------+-------------+
1 row in set (0.00 sec)

mysql> desc select count(*) from inc_performance where time_start >= '2005-02-06 16:30:00' AND inc_performance.time_start < '2005-02-15 16:30:00' and intv_type IN (1);
+----+-------------+-----------------+-------+----------------------+------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+-------+----------------------+------------+---------+------+------+-------------+
| 1 | SIMPLE | inc_performance | range | time_start,intv_type | time_start | 9 | NULL | 6645 | Using where |
+----+-------------+-----------------+-------+----------------------+------------+---------+------+------+-------------+
1 row in set (0.00 sec)

I agree that my previous explanation was an oversimplification.
In general the optimizer is a combination of heuristic rules and
cost-based optimization. As far as some of the decisions taken
by the optimizer are based on some heuristic rules, there will
always be cases when these rules miss an optimal plan.

The actual set of rules that govern the choice of an index are
more complex than the rule I outlined above, and the only
place where all these rules are listed in full detail is the code
itself with code comments.

We are aware of the current deficiencies of these rules, but we
cannot address them in a simple fix neither in 4.1 nor 5.0.
We have general plans for optimizer improvements, that will solve
the problem above as a special case, but these plans
are not definite yet.

Has this bug been addressed in any recent revisions?

We're seeing this problem in 5.0.67.