Bug #59278 "not exists" optimization seen in EXPLAIN but not used
Submitted: 4 Jan 2011 12:50 Modified: 17 Jan 2011 21:01
Reporter: Guilhem Bichot Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.6, trunk OS:Any
Assigned to: CPU Architecture:Any

[4 Jan 2011 12:50] Guilhem Bichot
Description:
EXPLAIN may report that for a table we are making use of the "not exists" optimization and of join buffering, like this:

select_all.result:1	SIMPLE	t2	ALL	NULL	NULL	NULL	NULL	1200	Using where; Not exists; Using join buffer (BNL, incremental buffers)

But the join buffering functions (which are block-nested-loop and batched key access algorithms), responsible for finding data rows, do not do the "not exists" optimization (this optimization requires testing the "not_exists_optimize" variable and is done only in the traditional *non-buffered* nested-loop algorithm: evaluate_join_record()).
So EXPLAIN is misleading: it makes us believe that an optimization is used, though it is in fact ignored internally.

This bug doesn't exist in 5.5, because there, join buffering is not used for outer joins, and as "not exists" is specific of outer joins, it was impossible to find the two concepts together. In 5.6, join buffering is used for outer joins.

How to repeat:
cd mysql-test/r
grep "Not exist" *| grep buffer

shows lines like:

select_all.result:1	SIMPLE	t2	ALL	NULL	NULL	NULL	NULL	1200	Using where; Not exists; Using join buffer (BNL, incremental buffers)

Suggested fix:
Either turn the "not exists" optimization off explicitely if join buffering is used (so that EXPLAIN shows reality), or make join buffering functions actually use the "not exists" optimization.
[17 Jan 2011 21:01] Guilhem Bichot
I was probably grepping in the wrong directory:
not_exists_optimize is tested by the join buffering code, see
  bool calc_check_only_first_match(const JOIN_TAB *t) const
  {
    return (t->last_sj_inner_tab == t &&
            t->get_sj_strategy() == SJ_OPT_FIRST_MATCH) ||
      (t->first_inner && t->first_inner->last_inner == t &&
         t->table->reginfo.not_exists_optimize);
  }
in sql_select.h. If the part of the condition above which is about not_exists_optimizer is commented out, we observe that a query does more disk reads:
CURRENT_TEST: main.join_outer_jcl6
--- /home/mysql_src/bzrrepos_new/mysql-next-mr-opt-backporting-wl4800/mysql-test/r/join_outer_jcl6.result       2011-01-17 17:58:42.000000000 +0300
+++ /home/mysql_src/bzrrepos_new/mysql-next-mr-opt-backporting-wl4800/mysql-test/r/join_outer_jcl6.reject       2011-01-17 23:52:02.000000000 +0300
@@ -1245,7 +1245,7 @@
 Handler_read_last      0
 Handler_read_next      9
 Handler_read_prev      0
-Handler_read_rnd       3
+Handler_read_rnd       9
 Handler_read_rnd_next  6
 DROP TABLE t1,t2;
 CREATE TABLE t1 (c int  PRIMARY KEY, e int NOT NULL);
This observation proves that the optimization is actually working with join buffering.