MySQL Bugs: #39756: EXPLAIN for uncorrelated NOT EXISTS query is slow

Bug #39756	EXPLAIN for uncorrelated NOT EXISTS query is slow
Submitted:	30 Sep 2008 12:59	Modified:	5 May 2014 5:54
Reporter:	Roy Lyseng	Email Updates:
Status:	Can't repeat	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S5 (Performance)
Version:	6.0,5.4, 5.1	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
Anuncorrelated NOT EXISTS query is not marked as expensive by the optimizer, hence the optimizer will attempt to execute the subquery during optimization, as witnessed by the following stack trace:

#0  sub_select (join=0x9c0d670, join_tab=0x9c04a10, end_of_records=false) at sql_select.cc:13668
#1  0x083b12a4 in do_select (join=0x9c0d670, fields=0x9c0341c, table=0x0, procedure=0x0) at sql_select.cc:13456
#2  0x083cda2c in JOIN::exec (this=0x9c0d670) at sql_select.cc:2833
#3  0x082def35 in subselect_single_select_engine::exec (this=0x9c03d58) at item_subselect.cc:2305
#4  0x082e389e in Item_subselect::exec (this=0x9c03cc0) at item_subselect.cc:285
#5  0x082dc825 in Item_exists_subselect::val_bool (this=0x9c03cc0) at item_subselect.cc:845
#6  0x082a3415 in Item_func_not::val_int (this=0x9c03d80) at item_cmpfunc.cc:275
#7  0x08288612 in eval_const_cond (cond=0x9c03d80) at item_func.cc:63
#8  0x083b0543 in remove_eq_conds (thd=0x9babbb0, cond=0x9c03d80, cond_value=0x9c0cb80) at sql_select.cc:11115
#9  0x083ba92f in optimize_cond (join=0x9c0b688, conds=0x9c03d80, join_list=0x9bad19c, cond_value=0x9c0cb80) at sql_select.cc:10953
#10 0x083c4262 in JOIN::optimize (this=0x9c0b688) at sql_select.cc:1447
#11 0x083c8673 in mysql_select (thd=0x9babbb0, rref_pointer_array=0x9bad1dc, tables=0x9c02f48, wild_num=1, fields=@0x9bad16c, conds=0x9c03d80, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147764740, result=0x9c03e60, unit=0x9bace40, select_lex=0x9bad0d8) at sql_select.cc:3009
#12 0x083c8b8e in mysql_explain_union (thd=0x9babbb0, unit=0x9bace40, result=0x9c03e60) at sql_select.cc:19413

The subquery execution triggered by the EXPLAIN statement is also causing some rather confusing explain output (check the "Extra" information in the 1. row below):

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Impossible WHERE
*************************** 2. row ***************************
           id: 2
  select_type: SUBQUERY
        table: t2
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 9
        Extra: Using index 

BUG#14295 is about a similar but different problem

Notice also that if t2 is empty the "Extra" text will be different.

How to repeat:
create table t1(
  id integer primary key,
  u  integer not null unique,
  v  integer not null,
  vi integer not null, index(vi),
  s  char(1));
create table t2(
  id integer primary key,
  u  integer not null unique,
  v  integer not null,
  vi integer not null, index(vi),
  s  char(1));
insert into t1 values(10, 10, 10, 10, 'l');
insert into t2 values(10, 10, 10, 10, 'r');
insert into t1 values(20, 20, 20, 20, 'l');
insert into t2 values(30, 30, 30, 30, 'r');
insert into t1 values(40, 40, 40, 40, 'l');
insert into t1 values(41, 41, 40, 40, 'l');
insert into t1 values(50, 50, 50, 50, 'l');
insert into t1 values(51, 51, 50, 50, 'l');
insert into t2 values(50, 50, 50, 50, 'r');
insert into t1 values(60, 60, 60, 60, 'l');
insert into t1 values(61, 61, 60, 60, 'l');
insert into t2 values(60, 60, 60, 60, 'r');
insert into t2 values(61, 61, 60, 60, 'r');
insert into t1 values(70, 70, 70, 70, 'l');
insert into t2 values(70, 70, 70, 70, 'r');
insert into t2 values(71, 71, 70, 70, 'r');
insert into t2 values(80, 80, 80, 80, 'r');
insert into t2 values(81, 81, 80, 80, 'r'); 
explain select * from t1 where not exists(select * from t2)

Suggested fix:
Mark all uncorrelated subqueries that cannot be resolved with maximum 1 row lookup per contained table as "expensive"

Verified just as described:

C:\Program Files\MySQL\MySQL Server 5.0\bin>mysql -uroot -proot -P3311 test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 6.0.6-alpha-community MySQL Community Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> create table t1(
    ->   id integer primary key,
    ->   u  integer not null unique,
    ->   v  integer not null,
    ->   vi integer not null, index(vi),
    ->   s  char(1));
Query OK, 0 rows affected (0.08 sec)

mysql> create table t2(
    ->   id integer primary key,
    ->   u  integer not null unique,
    ->   v  integer not null,
    ->   vi integer not null, index(vi),
    ->   s  char(1));
Query OK, 0 rows affected (0.09 sec)

mysql> insert into t1 values(10, 10, 10, 10, 'l');
Query OK, 1 row affected (0.05 sec)

mysql> insert into t2 values(10, 10, 10, 10, 'r');
Query OK, 1 row affected (0.06 sec)

mysql> insert into t1 values(20, 20, 20, 20, 'l');
Query OK, 1 row affected (0.05 sec)

mysql> insert into t2 values(30, 30, 30, 30, 'r');
Query OK, 1 row affected (0.06 sec)

mysql> insert into t1 values(40, 40, 40, 40, 'l');
Query OK, 1 row affected (0.05 sec)

mysql> insert into t1 values(41, 41, 40, 40, 'l');
Query OK, 1 row affected (0.05 sec)

mysql> insert into t1 values(50, 50, 50, 50, 'l');
Query OK, 1 row affected (0.03 sec)

mysql> insert into t1 values(51, 51, 50, 50, 'l');
Query OK, 1 row affected (0.05 sec)

mysql> insert into t2 values(50, 50, 50, 50, 'r');
Query OK, 1 row affected (0.03 sec)

mysql> insert into t1 values(60, 60, 60, 60, 'l');
Query OK, 1 row affected (0.03 sec)

mysql> insert into t1 values(61, 61, 60, 60, 'l');
Query OK, 1 row affected (0.06 sec)

mysql> insert into t2 values(60, 60, 60, 60, 'r');
Query OK, 1 row affected (0.11 sec)

mysql> insert into t2 values(61, 61, 60, 60, 'r');
Query OK, 1 row affected (0.08 sec)

mysql> insert into t1 values(70, 70, 70, 70, 'l');
Query OK, 1 row affected (0.11 sec)

mysql> insert into t2 values(70, 70, 70, 70, 'r');
Query OK, 1 row affected (0.05 sec)

mysql> insert into t2 values(71, 71, 70, 70, 'r');
Query OK, 1 row affected (0.05 sec)

mysql> insert into t2 values(80, 80, 80, 80, 'r');
Query OK, 1 row affected (0.05 sec)

mysql> insert into t2 values(81, 81, 80, 80, 'r');
Query OK, 1 row affected (0.05 sec)

mysql> explain select * from t1 where not exists(select * from t2)\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Impossible WHERE
*************************** 2. row ***************************
           id: 2
  select_type: SUBQUERY
        table: t2
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 9
        Extra: Using index
2 rows in set (0.00 sec)

The explain query is the same in 5.1

Maybe the solution is as simple as this.
It breaks a handful of 'explain' results in the test suite,
nothing else that I can see.

// Item_exists_subselect
virtual bool is_expensive_processor(uchar *arg) 
{
  return !is_correlated;
}