Bug #79808 Manual does NOT explain when "Select tables optimized away" is possible (or not)
Submitted: 29 Dec 2015 17:50 Modified: 9 Mar 2016 15:20
Reporter: Valeriy Kravchuk Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version:any OS:Any
Assigned to: Paul DuBois CPU Architecture:Any
Tags: compound, MAX, min, missing manual, Select tables optimized away

[29 Dec 2015 17:50] Valeriy Kravchuk
Description:
Manual, http://dev.mysql.com/doc/refman/5.7/en/explain-output.html, says:

"Select tables optimized away (JSON: message text)

The query contained only aggregate functions (MIN(), MAX()) that were all resolved using an index, or COUNT(*), and no GROUP BY clause. The optimizer determined that only one row should be returned."

without any further details. So, one may expect that if only MIN() and/or MAX() function is used in the query that is covered ("resolved") by index, we should always see "Select tables optimized away" and a very fast execution.

This is not the case for compound indexes, when one part of the index covers WHERE clause and other provides a column for MIN() or MAX(). See below.

How to repeat:
mysql> create table t(c1 int, c2 int, key(c1), key(c2));
Query OK, 0 rows affected (0.13 sec)

mysql> insert into t values (1,1), (2,1);
Query OK, 2 rows affected (0.09 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> explain select min(c1) from t where c1<10;
+----+-------------+-------+------+---------------+------+---------+------+-----
-+------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows
 | Extra                        |
+----+-------------+-------+------+---------------+------+---------+------+-----
-+------------------------------+
|  1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL
 | Select tables optimized away |
+----+-------------+-------+------+---------------+------+---------+------+-----
-+------------------------------+
1 row in set (0.00 sec)

As expected, so far. We have index on c1 and we select only MIN(c1). Let's continue:

mysql> explain select min(c2) from t where c1<10;
+----+-------------+-------+-------+---------------+------+---------+------+----
--+-------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | row
s | Extra       |
+----+-------------+-------+-------+---------------+------+---------+------+----
--+-------------+
|  1 | SIMPLE      | t     | range | c1            | c1   | 5       | NULL |
1 | Using where |
+----+-------------+-------+-------+---------------+------+---------+------+----
--+-------------+
1 row in set (0.00 sec)

Here optimization does NOT work as column c2 we ask for MIN() value of is NOT covered by the index. Expected. But what if we add the proper index:

mysql> alter table t add key k1(c1,c2);
Query OK, 0 rows affected (0.21 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> explain select min(c2) from t where c1<10;
+----+-------------+-------+-------+---------------+------+---------+------+----
--+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | row
s | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+----
--+--------------------------+
|  1 | SIMPLE      | t     | index | c1,k1         | k1   | 10      | NULL |
2 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+----
--+--------------------------+
1 row in set (0.00 sec)

mysql> explain select min(c2) from t where c1<1;
+----+-------------+-------+-------+---------------+------+---------+------+----
--+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | row
s | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+----
--+--------------------------+
|  1 | SIMPLE      | t     | index | c1,k1         | k1   | 10      | NULL |
2 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+----
--+--------------------------+
1 row in set (0.00 sec)

mysql> explain select min(c2) from t where c1<=1;
+----+-------------+-------+-------+---------------+------+---------+------+----
--+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | row
s | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+----
--+--------------------------+
|  1 | SIMPLE      | t     | index | c1,k1         | k1   | 10      | NULL |
2 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+----
--+--------------------------+
1 row in set (0.00 sec)

No way, nothing useful happens. We have to do full (or range) index scan and imagine how long can it take on a large table.

But for equality condition it works:

mysql> explain select min(c2) from t where c1=1;
+----+-------------+-------+------+---------------+------+---------+------+-----
-+------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows
 | Extra                        |
+----+-------------+-------+------+---------------+------+---------+------+-----
-+------------------------------+
|  1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL
 | Select tables optimized away |
+----+-------------+-------+------+---------------+------+---------+------+-----
-+------------------------------+
1 row in set (0.00 sec)

Suggested fix:
Please, explain in the manual when this optimization is expected to work for compound (multiple-column) covering indexes and why it can not be used in other cases. 

Alternatively, please, improve it to cover more cases like the one I've demonstrated.
[29 Dec 2015 17:57] Valeriy Kravchuk
Added "missing manual" tag...
[29 Dec 2015 20:57] MySQL Verification Team
Thank you for the bug report.
[9 Mar 2016 15:20] Paul DuBois
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly.

Revised text:

The optimizer determined 1) that at most one row should be returned,
and 2) that to produce this row, a deterministic set of rows must be
read. When the rows to be read can be read during the optimization
phase (for example, by reading index rows), there is no need to read
any tables during query execution.

The first condition is fulfilled when the query is implicitly grouped
(contains an aggregate function but no GROUP BY clause). The second
condition is fulfilled when one row lookup is performed per index
used. The number of indexes read determines the number of rows to
read.

Consider the following implicitly grouped query:

SELECT MIN(c1), MIN(c2) FROM t1;

Suppose that MIN(c1) can be retrieved by reading one index row and
MIN(c2) can be retrieved by reading one row from a different index.
That is, for each column c1 and c2, there exists an index where the
column is the first column of the index. In this case, one row is
returned, produced by reading two deterministic rows.

This Extra value does not occur if the rows to read are not
deterministic. Consider this query:

SELECT MIN(c2) FROM t1 WHERE c1 <= 10;

Suppose that (c1, c2) is a covering index. Using this index, all rows
with c1 <= 10 must be scanned to find the minimum c2 value. By
contrast, consider this query:

SELECT MIN(c2) FROM t1 WHERE c1 = 10;

In this case, the first index row with c1 = 10 contains the minimum
c2 value. Only one row must be read to produce the returned row.

For storage engines that maintain an exact row count per table (such
as MyISAM, but not InnoDB), this Extra value can occur for COUNT(*)
queries for which the WHERE clause is missing or always true and
there is no GROUP BY clause. (This is an instance of an implicitly
grouped query where the storage engine influences whether a
deterministic number of rows can be read.)