Bug #59696 Optimizer fails to move WHERE condition on JOIN column when joining with a view
Submitted: 24 Jan 2011 14:13 Modified: 30 Sep 2011 18:22
Reporter: Olav Sandstå Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.1.56,5.5.10, 5.6.2 OS:Any
Assigned to: Olav Sandstå CPU Architecture:Any

[24 Jan 2011 14:13] Olav Sandstå
Description:
When optimizing JOINs the MySQL optimizer tries to move WHERE
conditions on the join column(s) to the first table in the JOIN in
order to reduce the number of records it has to read. If the JOIN
contains a view it sometimes files to move WHERE condition on the join
column to the first table. This can result in less optimal execution
of the join.

Example (see reproduction test case for table definitions etc):

The following query: 

  SELECT t2.pk, t2.col_int_key FROM t2, t1 where t2.pk = t1.col_int_key and t2.pk >= 2;

has the followin exlain/query plan when the two tables that are joined are normal tables:

id      select_type     table   type    possible_keys   key     key_len ref     rows    Extra
1       SIMPLE  t1      ALL     NULL    NULL    NULL    NULL    8       Using where
1       SIMPLE  t2      eq_ref  PRIMARY PRIMARY 4       test.t1.col_int_key     1    

Note that in the "Extra" column the "Using where" is listed for the
table "t1" and that there is no "Using where" for the table "t2". This
is due to the optimzer is "moving" the t2.pk >= 2 from t2 to t1 and
replacing it with

   `test`.`t1`.`col_int_key` >= 2

If I create a view on top of "t2" that should be identical to "t2":

  CREATE VIEW v_t2 AS SELECT * FROM t2;

and replaces t2 in the above query with this view:

  SELECT v_t2.pk, v_t2.col_int_key FROM v_t2, t1 where v_t2.pk = t1.col_int_key and v_t2.pk >= 2;

the explain/query plan is changed to:

id      select_type     table   type    possible_keys   key     key_len ref     rows    Extra
1       SIMPLE  t1      ALL     NULL    NULL    NULL    NULL    8       
1       SIMPLE  t2      eq_ref  PRIMARY PRIMARY 4       test.t1.col_int_key     1       Using index condition

In this case (with a view) the optimizer does not move the WHERE
condition from t2 to t1. I think it should also be able to do this in
this case. This leads to a more costly execution of the join.

How to repeat:
Test case:

CREATE TABLE t1 (
  col_int_key INTEGER NOT NULL
) ENGINE=InnoDB;

INSERT INTO t1 VALUES (1),(2),(3),(5),(6),(8),(9);

CREATE TABLE t2 (
  pk INTEGER NOT NULL,
  col_int_key INTEGER NOT NULL,
  PRIMARY KEY (pk)
) ENGINE=InnoDB;

INSERT INTO t2 VALUES (5,5),(8,8),(9,3),(1,4),(3,5),(2,6),(6,8);

CREATE VIEW v_t2 AS SELECT * FROM t2;

EXPLAIN
SELECT t2.pk, t2.col_int_key FROM t2, t1 where t2.pk = t1.col_int_key and t2.pk >= 2;

EXPLAIN
SELECT v_t2.pk, v_t2.col_int_key FROM v_t2, t1 where v_t2.pk = t1.col_int_key and v_t2.pk >= 2;

DROP VIEW v_t2;
DROP TABLE t1, t2;

Suggested fix:
The WHERE condition on the JOIN column should be moved to the first table of the JOIN.
[24 Jan 2011 14:16] Olav Sandstå
This problem was discovered when working on Bug#59347. Note that until that issues is fixed the two queries in the "how to reproduce" section will produce different results. To avoid different results just add:

set optimizer_switch='index_condition_pushdown=off';

to the test case.
[24 Jan 2011 14:19] Valeriy Kravchuk
Verified also with current mysql-5.5:

macbook-pro:5.5 openxs$ bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 66
Server version: 5.5.10-debug Source distribution

Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE TABLE t1 (
    ->   col_int_key INTEGER NOT NULL
    -> ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.06 sec)

mysql> INSERT INTO t1 VALUES (1),(2),(3),(5),(6),(8),(9);
Query OK, 7 rows affected (0.00 sec)
Records: 7  Duplicates: 0  Warnings: 0

mysql> CREATE TABLE t2 (
    ->   pk INTEGER NOT NULL,
    ->   col_int_key INTEGER NOT NULL,
    ->   PRIMARY KEY (pk)
    -> ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.10 sec)

mysql> INSERT INTO t2 VALUES (5,5),(8,8),(9,3),(1,4),(3,5),(2,6),(6,8);
Query OK, 7 rows affected (0.00 sec)
Records: 7  Duplicates: 0  Warnings: 0

mysql> CREATE VIEW v_t2 AS SELECT * FROM t2;
Query OK, 0 rows affected (0.09 sec)

mysql> EXPLAIN
    -> SELECT t2.pk, t2.col_int_key FROM t2, t1 where t2.pk = t1.col_int_key and t2.pk >= 2;
+----+-------------+-------+--------+---------------+---------+---------+---------------------+------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                 | rows | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+---------------------+------+-------------+
|  1 | SIMPLE      | t1    | ALL    | NULL          | NULL    | NULL    | NULL                |    7 | Using where |
|  1 | SIMPLE      | t2    | eq_ref | PRIMARY       | PRIMARY | 4       | test.t1.col_int_key |    1 |             |
+----+-------------+-------+--------+---------------+---------+---------+---------------------+------+-------------+
2 rows in set (0.06 sec)

mysql> EXPLAIN
    -> SELECT v_t2.pk, v_t2.col_int_key FROM v_t2, t1 where v_t2.pk = t1.col_int_key and v_t2.pk
    -> >= 2;
+----+-------------+-------+--------+---------------+---------+---------+---------------------+------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                 | rows | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+---------------------+------+-------------+
|  1 | SIMPLE      | t1    | ALL    | NULL          | NULL    | NULL    | NULL                |    7 |             |
|  1 | SIMPLE      | t2    | eq_ref | PRIMARY       | PRIMARY | 4       | test.t1.col_int_key |    1 | Using where |
+----+-------------+-------+--------+---------------+---------+---------+---------------------+------+-------------+
2 rows in set (0.00 sec)
[24 Jan 2011 14:27] Valeriy Kravchuk
So, for now this is not only about performance, but also about wrong results...
[24 Jan 2011 14:38] Valeriy Kravchuk
On 5.1 plans look the same in both cases:

macbook-pro:5.1 openxs$ bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.1.56-debug Source distribution

Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
This software comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to modify and redistribute it under the GPL v2 license

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> drop table t1, t2;
ERROR 1051 (42S02): Unknown table 't2'
mysql> CREATE TABLE t1 (
    ->   col_int_key INTEGER NOT NULL
    -> ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.12 sec)

mysql> 
mysql> INSERT INTO t1 VALUES (1),(2),(3),(5),(6),(8),(9);
Query OK, 7 rows affected (0.01 sec)
Records: 7  Duplicates: 0  Warnings: 0

mysql> 
mysql> CREATE TABLE t2 (
    ->   pk INTEGER NOT NULL,
    ->   col_int_key INTEGER NOT NULL,
    ->   PRIMARY KEY (pk)
    -> ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.14 sec)

mysql> 
mysql> INSERT INTO t2 VALUES (5,5),(8,8),(9,3),(1,4),(3,5),(2,6),(6,8);
Query OK, 7 rows affected (0.01 sec)
Records: 7  Duplicates: 0  Warnings: 0

mysql> 
mysql> CREATE VIEW v_t2 AS SELECT * FROM t2;
Query OK, 0 rows affected (0.20 sec)

mysql> 
mysql> EXPLAIN
    -> SELECT t2.pk, t2.col_int_key FROM t2, t1 where t2.pk = t1.col_int_key and t2.pk >= 2;
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra                          |
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------------+
|  1 | SIMPLE      | t2    | range | PRIMARY       | PRIMARY | 4       | NULL |    3 | Using where                    |
|  1 | SIMPLE      | t1    | ALL   | NULL          | NULL    | NULL    | NULL |    7 | Using where; Using join buffer |
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------------+
2 rows in set (0.09 sec)

mysql> 
mysql> EXPLAIN
    -> SELECT v_t2.pk, v_t2.col_int_key FROM v_t2, t1 where v_t2.pk = t1.col_int_key and v_t2.pk
    -> >= 2;
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra                          |
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------------+
|  1 | SIMPLE      | t2    | range | PRIMARY       | PRIMARY | 4       | NULL |    3 | Using where                    |
|  1 | SIMPLE      | t1    | ALL   | NULL          | NULL    | NULL    | NULL |    7 | Using where; Using join buffer |
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------------+
2 rows in set (0.00 sec)
[27 Jan 2011 15:51] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/129789

3559 Olav Sandstaa	2011-01-27
      Fix for Bug#59696 Optimizer fails to move WHERE condition on JOIN column 
                        when joining with a view
      
      When optimizing a JOIN of two tables the optimizer tries to move WHERE
      conditions on the join column(s) to the first table in the JOIN in
      order to reduce the number of records it has to read. If the second
      operand of the JOIN is a view that is identical to the underlying
      table the optimizer should be able to perform the same optimization.
      In this case it fails to move the WHERE condition on the join column
      to the first table.
      
      The cause for this is that when the second operand is a view then the
      WHERE clauses is stored using Item_ref object that references the real
      item objects. The existing code fails to evaluate these since it only
      evaluates the Item_ref object and fails to look at the real
      objects. As a result it fails to detect that the WHERE clause on the
      second join operand can be substituted by a corresponding WHERE clause
      on the first operand.
      
      There are two places where we fail to handle Item_ref correctly and that
      causes the optimizer to fail to do this optimization:
      
      1. In optimize_cond() we call build_equal_items() which again calls
      build_equal_items_for_cond() where we in Item_func::compile() fail to
      evaluate the Item::equal_fields_propagator() for the condition on the
      view operand. As a result the field used in the condition for the view
      is not considered as an equal field to the join column of the first
      table when this later is evaluated in make_join_statistics().
      
      2. Later in JOIN::optimize() we call substitute_for_best_equal_field()
      which uses Item_func::transform() to select the best condition of the
      tables. Item_func::transform() fails to evaluate the different
      conditions that are equal due to these being stored in a Item_ref
      object.
      
      The fix for these problems is to implement compile() and transform()
      methods for the Item_ref class that will do the proper evaluation of
      the compile() and transform on the objects the Item_ref object has the
      reference to. This will ensure that when Item_func::compile() and
      Item_func::transform() evaluates Item_ref objects both the Item_ref
      object and the object it references get evaluated.
     @ mysql-test/r/join.result
        Test case for Bug#59696 Optimizer fails to move WHERE condition
        on JOIN column when joining with a view.
     @ mysql-test/r/subquery_sj_none.result
        Change in explain/query plan due to the fix for Bug#59696 "Optimizer
        fails to move WHERE condition on JOIN column when joining with a view"
        
        The change in query plan for the following query:
        
          select * from t2 where (b,a) in (select a,b from t1 where a=3);
        
        where the table t2 is defined as:
        
          create table t2 as select * from t1;
        
        is caused by that when the optimizer optimizes the subquery it will
        try to optimize the conditions (in optimize_cond()). One of the things
        it does it to call build_equal_items_for_cond() for the following
        condition:
        
          (<cache>(`test`.`t2`.`b`) = `test`.`t1`.`a`)
        
        When this is "compiled" in build_equal_items_for_cond() both the
        <cache>(`test`.`t2`.`b`) and `test`.`t1`.`a` are stored in Item_ref
        objects. So with the existing code the real items does not get
        "compiled" and are not considered for alternatives. So the result of
        the "compilation" is:
        
          (<cache>(`test`.`t2`.`b`) = `test`.`t1`.`a`)
        
        With the patch applied both of the real items get evaluated and the
        code is able to find that `test`.`t1`.`a` can be replaced by the
        constant "3". So the result is from the "compilation" of this condition is
        
          <cache>(`test`.`t2`.`b`) = 3
        
        Which causes the execution plan for the subquery to change as it has.
     @ mysql-test/t/join.test
        Test case for Bug#59696 Optimizer fails to move WHERE condition
        on JOIN column when joining with a view.
     @ sql/item.cc
        Implement transform() and compile() methods for the Item_ref class.
        These methods will then do transform/compile for both the
        Item_ref object and on the object it has a reference to.
     @ sql/item.h
        Implement transform() and compile() methods for the Item_ref class.
[31 Jan 2011 17:53] Igor Babaev
If you add more rows to t2:
  INSERT INTO t2 VALUES (15,5),(18,8),(19,3),(11,4),(13,5),(12,6),(16,8);
and run
  ANALYZE TABLE t1,t2
then you'll see the same problem for 5.1.56 with your test case query.
[9 Feb 2011 12:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/130859

3625 Olav Sandstaa	2011-02-09
      Fix for Bug#59696 Optimizer fails to move WHERE condition on JOIN column 
                        when joining with a view
      
      When optimizing a JOIN of two tables the optimizer tries to move WHERE
      conditions on the join column(s) to the first table in the JOIN in
      order to reduce the number of records it has to read. If the second
      operand of the JOIN is a view that is identical to the underlying
      table the optimizer should be able to perform the same optimization.
      In this case it fails to move the WHERE condition on the join column
      to the first table.
      
      The cause for this is that when the second operand is a view then
      the conditions contain Item_ref objects that reference the real item
      field objects. The existing code fails to evaluate these since it
      only evaluates the Item_ref object and fails to look at the real
      objects. As a result it fails to detect that the equals predicate on
      the second join operand can be substituted by a corresponding equals
      predicate on the first operand.
      
      There are two places where we fail to handle Item_ref correctly and that
      causes the optimizer to fail to do this optimization:
      
      1. In optimize_cond() we call build_equal_items() which again calls
      build_equal_items_for_cond() where we in Item_func::compile() fail to
      evaluate the Item::equal_fields_propagator() for the condition on the
      view operand. As a result the field used in the condition for the view
      is not considered as an equal field to the join column of the first
      table when this later is evaluated in make_join_statistics().
      
      2. Later in JOIN::optimize() we call substitute_for_best_equal_field()
      which uses Item_func::transform() to select the best condition of the
      tables. Item_func::transform() fails to evaluate the different
      conditions that are equal due to these being stored in a Item_ref
      object.
      
      The fix for these problems is to implement compile() and transform()
      functions for the Item_ref class that compile/transform both the
      Item_ref object itself and the item object referenced by the 'ref'
      pointer. So when Item_func::compile() and Item_func::transform() 
      evaluate Item_ref object this fix ensures that both the Item_ref 
      object and the object it references are evaluated.
     @ mysql-test/r/join.result
        Test case for Bug#59696 Optimizer fails to move WHERE condition
        on JOIN column when joining with a view.
     @ mysql-test/r/subquery_sj_none.result
        Change in explain/query plan due to the fix for Bug#59696 "Optimizer
        fails to move WHERE condition on JOIN column when joining with a view"
        
        The change in query plan for the following query:
        
          select * from t2 where (b,a) in (select a,b from t1 where a=3);
        
        where the table t2 is defined as:
        
          create table t2 as select * from t1;
        
        is caused by that when the optimizer optimizes the subquery it will
        try to optimize the conditions (in optimize_cond()). One of the things
        it does it to call build_equal_items_for_cond() for the following
        condition:
        
          (<cache>(`test`.`t2`.`b`) = `test`.`t1`.`a`)
        
        When this is "compiled" in build_equal_items_for_cond() both the
        <cache>(`test`.`t2`.`b`) and `test`.`t1`.`a` are stored in Item_ref
        objects. So with the existing code the real items does not get
        "compiled" and are not considered for alternatives. So the result of
        the "compilation" is:
        
          (<cache>(`test`.`t2`.`b`) = `test`.`t1`.`a`)
        
        With the patch applied both of the real items get evaluated and the
        code is able to find that `test`.`t1`.`a` can be replaced by the
        constant "3". So the result from the "compilation" of this condition is
        
          <cache>(`test`.`t2`.`b`) = 3
        
        Which causes the execution plan for the subquery to change as it has.
     @ mysql-test/t/join.test
        Test case for Bug#59696 Optimizer fails to move WHERE condition
        on JOIN column when joining with a view.
     @ sql/item.cc
        Implement transform() and compile() member functions for the Item_ref
        class. These functionss will then do transform/compile for both the
        Item_ref object and the object it has a reference to.
     @ sql/item.h
        Implement transform() and compile() member functions for the
        Item_ref class.
[30 Sep 2011 18:22] Paul DuBois
Noted in 5.6.4 changelog.

The optimizer sometimes failed to associate a WHERE clause with the
first table in a join when it was possible to do so, resulting in a
less efficient query.