Bug #59696 | Optimizer fails to move WHERE condition on JOIN column when joining with a view | ||
---|---|---|---|
Submitted: | 24 Jan 2011 14:13 | Modified: | 30 Sep 2011 18:22 |
Reporter: | Olav Sandstå | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S5 (Performance) |
Version: | 5.1.56,5.5.10, 5.6.2 | OS: | Any |
Assigned to: | Olav Sandstå | CPU Architecture: | Any |
[24 Jan 2011 14:13]
Olav Sandstå
[24 Jan 2011 14:16]
Olav Sandstå
This problem was discovered when working on Bug#59347. Note that until that issues is fixed the two queries in the "how to reproduce" section will produce different results. To avoid different results just add: set optimizer_switch='index_condition_pushdown=off'; to the test case.
[24 Jan 2011 14:19]
Valeriy Kravchuk
Verified also with current mysql-5.5: macbook-pro:5.5 openxs$ bin/mysql -uroot test Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 66 Server version: 5.5.10-debug Source distribution Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> CREATE TABLE t1 ( -> col_int_key INTEGER NOT NULL -> ) ENGINE=InnoDB; Query OK, 0 rows affected (0.06 sec) mysql> INSERT INTO t1 VALUES (1),(2),(3),(5),(6),(8),(9); Query OK, 7 rows affected (0.00 sec) Records: 7 Duplicates: 0 Warnings: 0 mysql> CREATE TABLE t2 ( -> pk INTEGER NOT NULL, -> col_int_key INTEGER NOT NULL, -> PRIMARY KEY (pk) -> ) ENGINE=InnoDB; Query OK, 0 rows affected (0.10 sec) mysql> INSERT INTO t2 VALUES (5,5),(8,8),(9,3),(1,4),(3,5),(2,6),(6,8); Query OK, 7 rows affected (0.00 sec) Records: 7 Duplicates: 0 Warnings: 0 mysql> CREATE VIEW v_t2 AS SELECT * FROM t2; Query OK, 0 rows affected (0.09 sec) mysql> EXPLAIN -> SELECT t2.pk, t2.col_int_key FROM t2, t1 where t2.pk = t1.col_int_key and t2.pk >= 2; +----+-------------+-------+--------+---------------+---------+---------+---------------------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+---------------+---------+---------+---------------------+------+-------------+ | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 7 | Using where | | 1 | SIMPLE | t2 | eq_ref | PRIMARY | PRIMARY | 4 | test.t1.col_int_key | 1 | | +----+-------------+-------+--------+---------------+---------+---------+---------------------+------+-------------+ 2 rows in set (0.06 sec) mysql> EXPLAIN -> SELECT v_t2.pk, v_t2.col_int_key FROM v_t2, t1 where v_t2.pk = t1.col_int_key and v_t2.pk -> >= 2; +----+-------------+-------+--------+---------------+---------+---------+---------------------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+---------------+---------+---------+---------------------+------+-------------+ | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 7 | | | 1 | SIMPLE | t2 | eq_ref | PRIMARY | PRIMARY | 4 | test.t1.col_int_key | 1 | Using where | +----+-------------+-------+--------+---------------+---------+---------+---------------------+------+-------------+ 2 rows in set (0.00 sec)
[24 Jan 2011 14:27]
Valeriy Kravchuk
So, for now this is not only about performance, but also about wrong results...
[24 Jan 2011 14:38]
Valeriy Kravchuk
On 5.1 plans look the same in both cases: macbook-pro:5.1 openxs$ bin/mysql -uroot test Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 1 Server version: 5.1.56-debug Source distribution Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved. This software comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to modify and redistribute it under the GPL v2 license Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> drop table t1, t2; ERROR 1051 (42S02): Unknown table 't2' mysql> CREATE TABLE t1 ( -> col_int_key INTEGER NOT NULL -> ) ENGINE=InnoDB; Query OK, 0 rows affected (0.12 sec) mysql> mysql> INSERT INTO t1 VALUES (1),(2),(3),(5),(6),(8),(9); Query OK, 7 rows affected (0.01 sec) Records: 7 Duplicates: 0 Warnings: 0 mysql> mysql> CREATE TABLE t2 ( -> pk INTEGER NOT NULL, -> col_int_key INTEGER NOT NULL, -> PRIMARY KEY (pk) -> ) ENGINE=InnoDB; Query OK, 0 rows affected (0.14 sec) mysql> mysql> INSERT INTO t2 VALUES (5,5),(8,8),(9,3),(1,4),(3,5),(2,6),(6,8); Query OK, 7 rows affected (0.01 sec) Records: 7 Duplicates: 0 Warnings: 0 mysql> mysql> CREATE VIEW v_t2 AS SELECT * FROM t2; Query OK, 0 rows affected (0.20 sec) mysql> mysql> EXPLAIN -> SELECT t2.pk, t2.col_int_key FROM t2, t1 where t2.pk = t1.col_int_key and t2.pk >= 2; +----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------------+ | 1 | SIMPLE | t2 | range | PRIMARY | PRIMARY | 4 | NULL | 3 | Using where | | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 7 | Using where; Using join buffer | +----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------------+ 2 rows in set (0.09 sec) mysql> mysql> EXPLAIN -> SELECT v_t2.pk, v_t2.col_int_key FROM v_t2, t1 where v_t2.pk = t1.col_int_key and v_t2.pk -> >= 2; +----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------------+ | 1 | SIMPLE | t2 | range | PRIMARY | PRIMARY | 4 | NULL | 3 | Using where | | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 7 | Using where; Using join buffer | +----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------------+ 2 rows in set (0.00 sec)
[27 Jan 2011 15:51]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/129789 3559 Olav Sandstaa 2011-01-27 Fix for Bug#59696 Optimizer fails to move WHERE condition on JOIN column when joining with a view When optimizing a JOIN of two tables the optimizer tries to move WHERE conditions on the join column(s) to the first table in the JOIN in order to reduce the number of records it has to read. If the second operand of the JOIN is a view that is identical to the underlying table the optimizer should be able to perform the same optimization. In this case it fails to move the WHERE condition on the join column to the first table. The cause for this is that when the second operand is a view then the WHERE clauses is stored using Item_ref object that references the real item objects. The existing code fails to evaluate these since it only evaluates the Item_ref object and fails to look at the real objects. As a result it fails to detect that the WHERE clause on the second join operand can be substituted by a corresponding WHERE clause on the first operand. There are two places where we fail to handle Item_ref correctly and that causes the optimizer to fail to do this optimization: 1. In optimize_cond() we call build_equal_items() which again calls build_equal_items_for_cond() where we in Item_func::compile() fail to evaluate the Item::equal_fields_propagator() for the condition on the view operand. As a result the field used in the condition for the view is not considered as an equal field to the join column of the first table when this later is evaluated in make_join_statistics(). 2. Later in JOIN::optimize() we call substitute_for_best_equal_field() which uses Item_func::transform() to select the best condition of the tables. Item_func::transform() fails to evaluate the different conditions that are equal due to these being stored in a Item_ref object. The fix for these problems is to implement compile() and transform() methods for the Item_ref class that will do the proper evaluation of the compile() and transform on the objects the Item_ref object has the reference to. This will ensure that when Item_func::compile() and Item_func::transform() evaluates Item_ref objects both the Item_ref object and the object it references get evaluated. @ mysql-test/r/join.result Test case for Bug#59696 Optimizer fails to move WHERE condition on JOIN column when joining with a view. @ mysql-test/r/subquery_sj_none.result Change in explain/query plan due to the fix for Bug#59696 "Optimizer fails to move WHERE condition on JOIN column when joining with a view" The change in query plan for the following query: select * from t2 where (b,a) in (select a,b from t1 where a=3); where the table t2 is defined as: create table t2 as select * from t1; is caused by that when the optimizer optimizes the subquery it will try to optimize the conditions (in optimize_cond()). One of the things it does it to call build_equal_items_for_cond() for the following condition: (<cache>(`test`.`t2`.`b`) = `test`.`t1`.`a`) When this is "compiled" in build_equal_items_for_cond() both the <cache>(`test`.`t2`.`b`) and `test`.`t1`.`a` are stored in Item_ref objects. So with the existing code the real items does not get "compiled" and are not considered for alternatives. So the result of the "compilation" is: (<cache>(`test`.`t2`.`b`) = `test`.`t1`.`a`) With the patch applied both of the real items get evaluated and the code is able to find that `test`.`t1`.`a` can be replaced by the constant "3". So the result is from the "compilation" of this condition is <cache>(`test`.`t2`.`b`) = 3 Which causes the execution plan for the subquery to change as it has. @ mysql-test/t/join.test Test case for Bug#59696 Optimizer fails to move WHERE condition on JOIN column when joining with a view. @ sql/item.cc Implement transform() and compile() methods for the Item_ref class. These methods will then do transform/compile for both the Item_ref object and on the object it has a reference to. @ sql/item.h Implement transform() and compile() methods for the Item_ref class.
[31 Jan 2011 17:53]
Igor Babaev
If you add more rows to t2: INSERT INTO t2 VALUES (15,5),(18,8),(19,3),(11,4),(13,5),(12,6),(16,8); and run ANALYZE TABLE t1,t2 then you'll see the same problem for 5.1.56 with your test case query.
[9 Feb 2011 12:31]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/130859 3625 Olav Sandstaa 2011-02-09 Fix for Bug#59696 Optimizer fails to move WHERE condition on JOIN column when joining with a view When optimizing a JOIN of two tables the optimizer tries to move WHERE conditions on the join column(s) to the first table in the JOIN in order to reduce the number of records it has to read. If the second operand of the JOIN is a view that is identical to the underlying table the optimizer should be able to perform the same optimization. In this case it fails to move the WHERE condition on the join column to the first table. The cause for this is that when the second operand is a view then the conditions contain Item_ref objects that reference the real item field objects. The existing code fails to evaluate these since it only evaluates the Item_ref object and fails to look at the real objects. As a result it fails to detect that the equals predicate on the second join operand can be substituted by a corresponding equals predicate on the first operand. There are two places where we fail to handle Item_ref correctly and that causes the optimizer to fail to do this optimization: 1. In optimize_cond() we call build_equal_items() which again calls build_equal_items_for_cond() where we in Item_func::compile() fail to evaluate the Item::equal_fields_propagator() for the condition on the view operand. As a result the field used in the condition for the view is not considered as an equal field to the join column of the first table when this later is evaluated in make_join_statistics(). 2. Later in JOIN::optimize() we call substitute_for_best_equal_field() which uses Item_func::transform() to select the best condition of the tables. Item_func::transform() fails to evaluate the different conditions that are equal due to these being stored in a Item_ref object. The fix for these problems is to implement compile() and transform() functions for the Item_ref class that compile/transform both the Item_ref object itself and the item object referenced by the 'ref' pointer. So when Item_func::compile() and Item_func::transform() evaluate Item_ref object this fix ensures that both the Item_ref object and the object it references are evaluated. @ mysql-test/r/join.result Test case for Bug#59696 Optimizer fails to move WHERE condition on JOIN column when joining with a view. @ mysql-test/r/subquery_sj_none.result Change in explain/query plan due to the fix for Bug#59696 "Optimizer fails to move WHERE condition on JOIN column when joining with a view" The change in query plan for the following query: select * from t2 where (b,a) in (select a,b from t1 where a=3); where the table t2 is defined as: create table t2 as select * from t1; is caused by that when the optimizer optimizes the subquery it will try to optimize the conditions (in optimize_cond()). One of the things it does it to call build_equal_items_for_cond() for the following condition: (<cache>(`test`.`t2`.`b`) = `test`.`t1`.`a`) When this is "compiled" in build_equal_items_for_cond() both the <cache>(`test`.`t2`.`b`) and `test`.`t1`.`a` are stored in Item_ref objects. So with the existing code the real items does not get "compiled" and are not considered for alternatives. So the result of the "compilation" is: (<cache>(`test`.`t2`.`b`) = `test`.`t1`.`a`) With the patch applied both of the real items get evaluated and the code is able to find that `test`.`t1`.`a` can be replaced by the constant "3". So the result from the "compilation" of this condition is <cache>(`test`.`t2`.`b`) = 3 Which causes the execution plan for the subquery to change as it has. @ mysql-test/t/join.test Test case for Bug#59696 Optimizer fails to move WHERE condition on JOIN column when joining with a view. @ sql/item.cc Implement transform() and compile() member functions for the Item_ref class. These functionss will then do transform/compile for both the Item_ref object and the object it has a reference to. @ sql/item.h Implement transform() and compile() member functions for the Item_ref class.
[30 Sep 2011 18:22]
Paul DuBois
Noted in 5.6.4 changelog. The optimizer sometimes failed to associate a WHERE clause with the first table in a join when it was possible to do so, resulting in a less efficient query.