Bug #67511 Cost-based choice Materialization vs IN->EXISTS makes apparently wrong choice
Submitted: 7 Nov 2012 21:07 Modified: 22 Jan 2013 13:26
Reporter: Sergey Petrunya Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.6.7 OS:Any
Assigned to: CPU Architecture:Any

[7 Nov 2012 21:07] Sergey Petrunya
Description:
Cost-based choice Materialization vs IN->EXISTS in MySQL 5.6 makes apparently wrong choice:

MySQL [test]> explain select col1, col1 in (select col2 from one_k_rows_tbl) from one_row_tbl;
+----+-------------+----------------+------+---------------+------+---------+------+------+-------+
| id | select_type | table          | type | possible_keys | key  | key_len | ref  | rows | Extra |
+----+-------------+----------------+------+---------------+------+---------+------+------+-------+
|  1 | PRIMARY     | one_row_tbl    | ALL  | NULL          | NULL | NULL    | NULL |    1 | NULL  |
|  2 | SUBQUERY    | one_k_rows_tbl | ALL  | NULL          | NULL | NULL    | NULL | 1000 | NULL  |
+----+-------------+----------------+------+---------------+------+---------+------+------+-------+

The upper select expects to produce one row.  That is, subquery will be evaluated one time.  It is apparent that executing subquery once with IN->EXISTS strategy is cheaper than  running the same subquery, storing its output in a temporary table, and then making one lookup on the temporary table.

How to repeat:
I'll upload the dataset. Run the above query on it.
[7 Nov 2012 21:08] Sergey Petrunya
Test dataset

Attachment: bug67511.sql (text/x-sql), 26.18 KiB.

[7 Nov 2012 21:11] Sergey Petrunya
Looking at the optimizer trace, I see:

                      "cost_to_create_and_fill_materialized_table": 383.6,
                      "cost_of_one_EXISTS": 192.8,
                      "number_of_subquery_evaluations": 2,
                      "cost_of_materialization": 384,
                      "cost_of_EXISTS": 385.6,

That is, cost_of_materialization < cost_of_EXISTS (predictably).  However, number_of_subquery_evaluations==2.  This is odd (EXPLAIN shows 1).
[7 Nov 2012 21:31] Sergey Petrunya
The number 2 comes from here:

void JOIN::refine_best_rowcount()
{
...
    /*
      Since it's only an estimate it's inaccurate. Setting estimate to 1
      row in some cases will make derived table a constant one.
      Currently it's impossible to revert it to non-const. Thus we
      adjust estimated # of rows to make derived table not a const one.
    */
    if (best_rowcount <= 1)
      best_rowcount= 2;
[8 Nov 2012 19:14] Sveta Smirnova
Thank you for the report.

Verified as described.
[22 Jan 2013 13:26] Paul DuBois
Noted in 5.6.10, 5.7.1 changelogs.

The optimizer could choose an IN-to-EXISTS transformation for
subquery execution in some cases when subquery materialization would
be cheaper.