Bug #39704 EXPLAIN inconsistence in SELECT .. WHERE = | IN (SELECT ...)
Submitted: 27 Sep 2008 23:13 Modified: 14 Aug 2009 4:12
Reporter: Peter Laursen (Basic Quality Contributor) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version:5.0.68, 5.1.28, 6.0.6 OS:Any
Assigned to: Paul DuBois CPU Architecture:Any
Tags: qc

[27 Sep 2008 23:13] Peter Laursen
Description:
CREATE TABLE `t` (                      
          `id` int(11) DEFAULT NULL,            
          KEY `id` (`id`)                       
        ) ENGINE=MyISAM DEFAULT CHARSET=latin1;

insert into t(id) values (1),(2);

explain select * from t where id IN (select * from t where id =1);
-- inner query is identified as select_type "DEPENDENT SUBQUERY"

-- however 
explain select * from t where id = (select * from t where id =1);
-- now inner query is identified as select_type "SUBQUERY" only.

docs http://dev.mysql.com/doc/refman/5.1/en/using-explain.html say
"DEPENDENT typically signifies the use of a correlated subquery. See Section 12.2.9.7, “Correlated Subqueries”.

but http://dev.mysql.com/doc/refman/5.1/en/correlated-subqueries.html says 
"A correlated subquery is a subquery that contains a reference to a table that also appears in the outer query"

Why are they not both “Correlated Subqueries”?

How to repeat:
See above for the most simple example one can imagine.  

Of course this originated from a real-life example so it is not only reproducable with such silly example ...

Suggested fix:
I am not sure if there is some reason for this. But then I think it is not properly documented.

Also I am not sure if this is a cosmetical issue with the output of EXPLAIN only or if the server would typically use different execution plans for those two type of statements.
[28 Sep 2008 5:22] Valeriy Kravchuk
Thank you for a problem report. I've got the same results on 5.0.68 also. 

In any case, as id is NOT primary or unique key, subquery can return more than one row in both cases, and it is NOT correlated subquery (one can easily execute subquery first and then get a list of values to compare).

On 6.0.6 I've got other (but still different results):

C:\Program Files\MySQL\MySQL Server 5.0\bin>mysql -uroot -proot -P3311 test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 6.0.6-alpha-community MySQL Community Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> CREATE TABLE `t` (
    ->           `id` int(11) DEFAULT NULL,
    ->           KEY `id` (`id`)
    ->         ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.13 sec)

mysql> insert into t(id) values (1),(2);
Query OK, 2 rows affected (0.06 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> explain select * from t where id IN (select * from t where id =1);
+----+-------------+-------+------+---------------+------+---------+-------+----
--+----------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref   | row
s | Extra                      |
+----+-------------+-------+------+---------------+------+---------+-------+----
--+----------------------------+
|  1 | PRIMARY     | t     | ref  | id            | id   | 5       | const |
1 | Using index                |
|  1 | PRIMARY     | t     | ref  | id            | id   | 5       | const |
1 | Using index; FirstMatch(t) |
+----+-------------+-------+------+---------------+------+---------+-------+----
--+----------------------------+
2 rows in set (0.05 sec)

mysql> explain select * from t where id = (select * from t where id =1);
+----+-------------+-------+------+---------------+------+---------+-------+----
--+--------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref   | row
s | Extra                    |
+----+-------------+-------+------+---------------+------+---------+-------+----
--+--------------------------+
|  1 | PRIMARY     | t     | ref  | id            | id   | 5       | const |
1 | Using where; Using index |
|  2 | SUBQUERY    | t     | ref  | id            | id   | 5       |       |
1 | Using index              |
+----+-------------+-------+------+---------------+------+---------+-------+----
--+--------------------------+
2 rows in set (0.00 sec)

FirstMatch(t) access path is NOT documented. Also I'd expect execution plans for these queries to be the same.
[28 Sep 2008 8:29] Peter Laursen
Also I think then that documentation here

"A correlated subquery is a subquery that contains a reference to a
table that also appears in the outer query"

.. is not correct or accurate? " What does "also appears" mean exactly?  Table `t` *appears* in both inner and outer query, as far as I can understand! 

I understand this writing so that I must consider the criteria ("also appears") true for both statements. But I understand Valeriy that for *none* of them subqueries are/should be considered correlated.
[28 Sep 2008 9:22] Peter Laursen
I also tried

explain select * from t where id = (select max(id) from t);

Now in 'extra' colum I get 'Select tables optimized away' (in both 5.0 and 5.1).  Is this output documented (nothing wrong about it, of course!)? 

Where do I find a *complete list* of all possible listings (and their meaning) in the 'extra' column?
[28 Sep 2008 16:00] Peter Laursen
""A correlated subquery is a subquery that contains a reference to a
table that also appears in the outer query""

what is it "that should also appear"? The table? The reference (and what is then exactly menat by 'reference' ?)? or the (more or less complete) query? 

Compare "the child having a puppy that lived in a house that is less than 3 years old". What is it that is less than 3 years old?  Child? Puppy? House?  Such ambigious language should not be found in technical docs!
[29 Sep 2008 3:54] Valeriy Kravchuk
For me "correlated subquery" is something defined like this (http://docs.hp.com/en/36216-90103/ch03s02.html):

"A correlated subquery is one in which the subquery makes reference to one or more columns of an outer query."

Essentially, subquery is correlated if you MUST execute it for each row found by outer query to get correct data.

Our optimizer, especially before 6.0, process many formally non-correlated subqueries as correlated. This is a known deficiency.

For detailed description of "Extra" column content, see http://dev.mysql.com/doc/refman/6.0/en/using-explain.html.
[14 Aug 2009 4:12] Paul DuBois
Nothing to do here beyond Valeriy's explanation, I think.