Bug #50019 | Wrong result for IN-subquery with materialization | ||
---|---|---|---|
Submitted: | 1 Jan 2010 12:39 | Modified: | 16 Jun 2010 11:22 |
Reporter: | Øystein Grøvlen | Email Updates: | |
Status: | Duplicate | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 6.0 | OS: | Any |
Assigned to: | Assigned Account | CPU Architecture: | Any |
Tags: | materialization, optimizer_switch, semijoin, subquery |
[1 Jan 2010 12:39]
Øystein Grøvlen
[1 Jan 2010 13:29]
Valeriy Kravchuk
Verified just as described with recent 6.0.14 from bzr: 77-52-7-73:6.0-codebase openxs$ bin/mysql -uroot test Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 1 Server version: 6.0.14-alpha-debug Source distribution Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> create table t1(i int); Query OK, 0 rows affected (0.05 sec) mysql> insert into t1 values (1), (2), (3), (4), (5), (6), (7), (8), (9), -> (10); Query OK, 10 rows affected (0.01 sec) Records: 10 Duplicates: 0 Warnings: 0 mysql> create table t2(i int); Query OK, 0 rows affected (0.08 sec) mysql> insert into t2 values (1), (2), (3), (4), (5), (6), (7), (8), (9), -> (10); Query OK, 10 rows affected (0.00 sec) Records: 10 Duplicates: 0 Warnings: 0 mysql> create table t3(i int); Query OK, 0 rows affected (0.09 sec) mysql> insert into t3 values (1), (2), (3), (4), (5), (6), (7), (8), (9), -> (10); Query OK, 10 rows affected (0.01 sec) Records: 10 Duplicates: 0 Warnings: 0 mysql> select * from t1 where t1.i in (select t2.i from t2 join t3 where t2.i -> + t3.i = 5); +------+ | i | +------+ | 1 | | 2 | | 3 | | 4 | | 5 | | 6 | | 7 | | 8 | | 9 | | 10 | +------+ 10 rows in set (0.00 sec) mysql> set session optimizer_switch='materialization=off'; Query OK, 0 rows affected (0.00 sec) mysql> select * from t1 where t1.i in (select t2.i from t2 join t3 where t2.i -> + t3.i = 5); +------+ | i | +------+ | 1 | | 2 | | 3 | | 4 | +------+ 4 rows in set (0.00 sec)
[7 Mar 2010 16:22]
Sergey Petrunya
Analysis -------- # Disable join buffering to make things simpler MariaDB [j2]> set join_cache_level=0; Query OK, 0 rows affected (0.00 sec) MariaDB [j2]> explain select * from t1 where t1.i in (select t2.i from t2 join t3 where t2.i + t3.i = 5); +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+ | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 10 | | | 1 | PRIMARY | t2 | ALL | NULL | NULL | NULL | NULL | 10 | Start materialize | | 1 | PRIMARY | t3 | ALL | NULL | NULL | NULL | NULL | 10 | Using where; End materialize | +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+ 3 rows in set (0.00 sec) ... (gdb) wher #0 Item_func_plus::int_op (this=0xaf766b0) at item_func.cc:1124 #1 0x081ebfaf in Item_func_numhybrid::val_int (this=0xaf766b0) at item_func.cc:850 #2 0x081fafe8 in Arg_comparator::compare_int_signed (this=0xaf7680c) at item_cmpfunc.cc:1258 #3 0x0820a617 in Arg_comparator::compare (this=0xaf7680c) at item_cmpfunc.h:71 #4 0x081ff499 in Item_func_eq::val_int (this=0xaf76790) at item_cmpfunc.cc:1719 #5 0x08315658 in evaluate_join_record (join=0xafa4b78, join_tab=0xafaf5d8, error=0) at sql_select.cc:12736 #6 0x08315bdf in sub_select (join=0xafa4b78, join_tab=0xafaf5d8, end_of_records=false) at sql_select.cc:12652 #7 0x083158eb in evaluate_join_record (join=0xafa4b78, join_tab=0xafaf428, error=0) at sql_select.cc:12832 #8 0x08315bdf in sub_select (join=0xafa4b78, join_tab=0xafaf428, end_of_records=false) at sql_select.cc:12652 #9 0x08315efc in sub_select_sjm (join=0xafa4b78, join_tab=0xafaf428, end_of_records=false) at sql_select.cc:12341 #10 0x083158eb in evaluate_join_record (join=0xafa4b78, join_tab=0xafaf278, error=0) at sql_select.cc:12832 #11 0x08315bdf in sub_select (join=0xafa4b78, join_tab=0xafaf278, end_of_records=false) at sql_select.cc:12652 #12 0x083250d0 in do_select (join=0xafa4b78, fields=0xa6cdf10, table=0x0, procedure=0x0) at sql_select.cc:12202 #13 0x08336789 in JOIN::exec (this=0xafa4b78) at sql_select.cc:2124 #14 0x08331088 in mysql_select (thd=0xa6cc590, rref_pointer_array=0xa6cdf80, tables=0xaf75740, wild_num=1, fields=@0xa6cdf10, conds=0xaf76888, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147764736, result=0xaf76a00, unit=0xa6cdbf4, select_lex=0xa6cde7c) at sql_select.cc:2319 ... (gdb) p *args[0]->field->table_name $152 = 0xaf11fb0 "t1" (gdb) p args[0]->field->field_name $153 = 0xaf81189 "i" # We're accessing "t1.i" while doing materialization (of t2 JOIN t3 ON ..) # This is wrong.
[7 Mar 2010 16:28]
Sergey Petrunya
It could be equality propagation. Indeed: (gdb) list 4760 const_item->cmp_context != cmp_context) 4761 return this; 4762 return const_item; 4763 } 4764 Item_field *subst= item_equal->get_first(); 4765 if (subst && field->table != subst->field->table && !field->eq(subst->field)) 4766 return subst; 4767 } 4768 return this; 4769 } (gdb) wher #0 Item_field::replace_equal_field (this=0xaf76570, arg=0x0) at item.cc:4765 #1 0x081be383 in Item::transform (this=0xaf76570, transformer=&virtual Item::replace_equal_field(unsigned char*), arg=0x0) at item.cc:549 #2 0x081f5e88 in Item_func::transform (this=0xaf766b0, transformer=&virtual table offset 332, argument=0x0) at item_func.cc:300 #3 0x081f5e88 in Item_func::transform (this=0xaf76790, transformer=&virtual table offset 332, argument=0x0) at item_func.cc:300 #4 0x08323266 in substitute_for_best_equal_field (cond=0xaf76790, cond_equal=0xaf771dc, table_join_idx=0xaf77388) at sql_select.cc:9124 #5 0x08323025 in substitute_for_best_equal_field (cond=0xaf77148, cond_equal=0xaf771dc, table_join_idx=0xaf77388) at sql_select.cc:9087 #6 0x0832e3ce in JOIN::optimize (this=0xafa4b78) at sql_select.cc:946 #7 0x08330ffe in mysql_select (thd=0xa6cc590, rref_pointer_array=0xa6cdf80, tables=0xaf75740, wild_num=1, fields=@0xa6cdf10, conds=0xaf76888, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147764736, result=0xaf76a00, unit=0xa6cdbf4, select_lex=0xa6cde7c) at sql_select.cc:2305 (gdb) p this->field_name $156 = 0xaf1fb41 "i" (gdb) p this->table_name $157 = 0xada3020 "t2" (gdb) (gdb) p subst->field_name $158 = 0xaf81189 "i" (gdb) p subst->table_name $159 = 0xaf11fb0 "t1" It should not do such replaces. In fact, the issue of equality propagation with Materialization was addressed in eliminate_item_equal(), see comment starting with: item_field might refer to a table that is within a semi-join The bug is that that logic is applied only in eliminate_item_equal(). eliminate_item_equal takes care of equal-field-substitution for equalities, while Item_field::replace_equal_field() takes care of equal-field-substitution elsewhere (this is why one needs to write "t2.i + t3.i =5" to observe this bug, simple equalities work correctly, while complex expressions don't).
[9 Mar 2010 10:31]
Sergey Petrunya
See also BUG#45174.
[13 Mar 2010 20:39]
Sergey Petrunya
http://askmonty.org/wiki/EqualityPropagationAndEqualityPropagationAndSemiJoinMaterializati... has a write-up about the nature of the problem and how it should be fixed.
[16 Jun 2010 11:22]
Jørgen Løland
Should be fixed as part of BUG#45174; comments for how to do that sent on email to commit list. Closing as duplicate.
[29 Sep 2010 14:35]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/119437 3253 Roy Lyseng 2010-09-29 Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. When a subquery is subject to a semijoin optimization, it's tables are merged to the outer query and later are treated as regular tables. This allows a bunch of optimizations to be applied, equality propagation is among them. Equality propagation is done after query execution plan is chosen. It substitutes fields from tables being retrieved later for fields from tables being retrieved earlier. However, it can't be applied as is to any semijoin table. The semijoin materialization strategy differs from other semijoin strategies that the data from materialized semijoin tables isn't used directly but saved to a temporary table first. The materialization isn't isolated in a separate step, it is done inline within the nested loop execution. When it comes to fetching rows from the first table in the block of materialized semijoin tables, sub_select() function is called to materialize the result of the subquery and save it in the materialized table. Later, data from the materialized table is used as they were regular table rows. Due to this we can't substitute fields that belong to the semi-join for fields from outer query and vice versa. Example: suppose we have a join order: ot1 ot2 SJ-Mat(it1 it2 it3) ot3 and equality ot2.col = it1.col = it2.col If we're looking for best substitute for 'it2.col', we should pick it1.col and not ot2.col. For a field that is not in a materialized semijoin we must pick a field that's not embedded in any materialized semijoin. Example: suppose we have a join order: SJ-Mat(it1 it2) ot1 ot2 and equality ot2.col = ot1.col = it2.col If we're looking for best substitute for 'ot2.col', we should pick ot1.col and not it2.col, because when we run a join between ot1 and ot2, execution of SJ-Mat(...) has already finished and we can't rely on the value of it*.*. Now we have added another Item_equal::get_first function that accepts as a parameter a field being substituted and checks whether it belongs to a materialized semijoin. The field to substitute will be from the same materialized semijoin nest (if supplied field is within such nest), or outside any materialized semijoin nest (if supplied field is outside such nest). The new checks rely on the first_sj_inner_tab and first_sj_inner_tab fields of the join-tab. These fields are therefore set as soon as possible after the join strategy is fixed. mysql-test/include/subquery_sj.inc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. Added new tests. mysql-test/r/subquery_mat_all.result mysql-test/r/subquery_sj_all.result mysql-test/r/subquery_sj_all_jcl6.result mysql-test/r/subquery_sj_all_jcl7.result mysql-test/r/subquery_sj_dupsweed.result mysql-test/r/subquery_sj_dupsweed_jcl6.result mysql-test/r/subquery_sj_dupsweed_jcl7.result mysql-test/r/subquery_sj_firstmatch.result mysql-test/r/subquery_sj_firstmatch_jcl6.result mysql-test/r/subquery_sj_firstmatch_jcl7.result mysql-test/r/subquery_sj_loosescan.result mysql-test/r/subquery_sj_loosescan_jcl6.result mysql-test/r/subquery_sj_loosescan_jcl7.result mysql-test/r/subquery_sj_mat.result mysql-test/r/subquery_sj_mat_jcl6.result mysql-test/r/subquery_sj_mat_jcl7.result mysql-test/r/subquery_sj_mat_nosj.result mysql-test/r/subquery_sj_none.result mysql-test/r/subquery_sj_none_jcl6.result mysql-test/r/subquery_sj_none_jcl7.result Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. Results for two new tests added. Some tests using semijoin materialization show that where clause has moved from the outer query into the materialized inner query. This is caused by the changed call to get_first() in eliminate_item_equal(). Ex: select * from ot where a in(select b from it where b>0); The clause "b>0" is now evaluated on the inner query materialization. Performance-wise this is never worse when using MaterializeScan and usually better for MaterializeLookup. For the latter strategy, the best possible solution is probably to evaluate the clause in both queries, this can be subject for a later feature development. sql/item.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Calling new get_first() function instead of old. sql/item_cmpfunc.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. New function Item_equal::get_first that accepts as parameter a field being substituted. sql/item_cmpfunc.h Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. New function Item_equal::get_first that accepts as parameter a field being substituted. sql/sql_select.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Setting fields first_sj_inner_tab and last_sj_inner_tab moved from setup_semijoin_dups_elimination() to get_best_combination(), so they are set as early as possible after join order optimization.
[7 Oct 2010 15:20]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/120276 3260 Roy Lyseng 2010-10-07 Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. Bug#52068: Optimizer generates invalid semijoin materialization plan When a subquery is subject to a semijoin optimization, it's tables are merged to the outer query and later are treated as regular tables. This allows a bunch of optimizations to be applied, equality propagation is among them. Equality propagation is done after query execution plan is chosen. It substitutes fields from tables being retrieved later for fields from tables being retrieved earlier. However, it can't be applied as is to any semijoin table. The semijoin materialization strategy differs from other semijoin strategies that the data from materialized semijoin tables isn't used directly but saved to a temporary table first. The materialization isn't isolated in a separate step, it is done inline within the nested loop execution. When it comes to fetching rows from the first table in the block of materialized semijoin tables, sub_select() function is called to materialize the result of the subquery and save it in the materialized table. Later, data from the materialized table is used as they were regular table rows. Due to this we can't substitute fields that belong to the semi-join for fields from outer query and vice versa. Example: suppose we have a join order: ot1 ot2 SJ-Mat(it1 it2 it3) ot3 and equality ot2.col = it1.col = it2.col If we're looking for best substitute for 'it2.col', we should pick it1.col and not ot2.col. For a field that is not in a materialized semijoin we can use any field, even those that are embedded in a materialized semijoin. This is because such fields are "copied back" to their original join-tab structures when the materialized temporary table is being read. Now we have added another Item_equal::get_first() function that accepts as a parameter a field being substituted and checks whether it belongs to a materialized semijoin. The field to substitute will be from the same materialized semijoin nest (if supplied field is within such nest), otherwise it will be the first field in the multiple equality. The new checks rely on the first_sj_inner_tab and first_sj_inner_tab fields of the join-tab. These fields are therefore set as soon as possible after the join strategy is fixed. Also fixed problem appearing in Bug#52068: When MaterializeScan semijoin strategy was used and there were one or more outer dependent tables before the semijoin tables, the scan over the materialized table was not properly reset for each row of the prefix outer tables. Also fixed problems with pushdown of SJM-aware predicates during make_join_select(): wrong predicates were sometimes generated, make_cond_after_sjm() was called at the wrong position, and make_cond_after_sjm() was never actually considering the pushed-down SJM predicates. mysql-test/include/subquery_sj.inc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. Bug#52068: Optimizer generates invalid semijoin materialization plan Added new tests. mysql-test/r/subquery_mat_all.result mysql-test/r/subquery_sj_all.result mysql-test/r/subquery_sj_all_jcl6.result mysql-test/r/subquery_sj_all_jcl7.result mysql-test/r/subquery_sj_dupsweed.result mysql-test/r/subquery_sj_dupsweed_jcl6.result mysql-test/r/subquery_sj_dupsweed_jcl7.result mysql-test/r/subquery_sj_firstmatch.result mysql-test/r/subquery_sj_firstmatch_jcl6.result mysql-test/r/subquery_sj_firstmatch_jcl7.result mysql-test/r/subquery_sj_loosescan.result mysql-test/r/subquery_sj_loosescan_jcl6.result mysql-test/r/subquery_sj_loosescan_jcl7.result mysql-test/r/subquery_sj_mat.result mysql-test/r/subquery_sj_mat_jcl6.result mysql-test/r/subquery_sj_mat_jcl7.result mysql-test/r/subquery_sj_mat_nosj.result mysql-test/r/subquery_sj_none.result mysql-test/r/subquery_sj_none_jcl6.result mysql-test/r/subquery_sj_none_jcl7.result Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. Bug#52068: Optimizer generates invalid semijoin materialization plan Results for three new tests added. Some tests using semijoin materialization show that where clause has moved from the outer query into the materialized inner query. This is caused by the changed call to get_first() in eliminate_item_equal(). Ex: select * from ot where a in(select b from it where b>0); The clause "b>0" is now evaluated on the inner query materialization. Performance-wise this is never worse when using MaterializeScan and usually better for MaterializeLookup. For the latter strategy, the best possible solution is probably to evaluate the clause in both queries, this can be subject for a later feature development. Another test that applies the same condition to both the outer and the inner query is added, to show the plan for such types of queries. Earlier wrong semijoin materialization test results are corrected. sql/item.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Calling new get_first() function instead of old. sql/item_cmpfunc.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. New function Item_equal::get_first() that accepts as argument a field being substituted. sql/item_cmpfunc.h Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. New function Item_equal::get_first() that accepts as argument a field being substituted. sql/sql_select.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. Bug#52068: Optimizer generates invalid semijoin materialization plan Setting fields first_sj_inner_tab and last_sj_inner_tab moved from setup_semijoin_dups_elimination() to get_best_combination(), so they are set as early as possible after join order optimization. In make_join_select(), the test that determined when to pushdown SJM-specific predicates was wrong, in addition to approving the comments. The logic of eliminate_item_equal() has been simplified and adjusted so that it generates equalities that are useful also when the semijoin materialization strategy is being used. Some simplification was possible by taking advantage of the new Item_equal::get_first() function. In sub_select_sjm(), moved code that initializes the scan over the materialized table so that it is now performed for each scan of table, instead of only for the first scan. In make_cond_for_table_from_pred(), a number of comments has been added, and TAB characters are replaced by spaces. In make_cond_after_sjm(), make sure that it handles equalities generated for semijoin materialization (with marker=3). Otherwise, removed marker optimizations for this function, as it will only be called once per materialized semijoin nest in a query. Added comments and removed TAB characters.
[22 Oct 2010 7:59]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/121634 3265 Roy Lyseng 2010-10-22 Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. Bug#52068: Optimizer generates invalid semijoin materialization plan When a subquery is subject to a semijoin optimization, its tables are merged to the outer query and later are treated as regular tables. This allows a bunch of optimizations to be applied, equality propagation is among them. Equality propagation is done after query execution plan is chosen. It substitutes fields from tables being retrieved later for fields from tables being retrieved earlier. However, it can't be applied as is to any semijoin table. The semijoin materialization strategy differs from other semijoin strategies that the data from materialized semijoin tables isn't used directly but saved to a temporary table first. The materialization isn't isolated in a separate step, it is done inline within the nested loop execution. When it comes to fetching rows from the first table in the block of materialized semijoin tables, sub_select() function is called to materialize the result of the subquery and save it in the materialized table. Later, data from the materialized table is used as if they were regular table rows. Due to this we can't substitute fields that belong to the semi-join for fields from outer query and vice versa. Example: suppose we have a join order: ot1 ot2 SJ-Mat(it1 it2 it3) ot3 and equality ot2.col = it1.col = it2.col If we're looking for best substitute for 'it2.col', we should pick it1.col and not ot2.col. For a field that is not in a materialized semijoin we can use any field, even those that are embedded in a materialized semijoin. This is because such fields are "copied back" to their original join-tab structures when the materialized temporary table is being read. Now we have added another Item_equal::get_first() function that accepts as a parameter a field being substituted and checks whether it belongs to a materialized semijoin. The field to substitute will be from the same materialized semijoin nest (if supplied field is within such nest), otherwise it will be the first field in the multiple equality. The new checks rely on the first_sj_inner_tab and first_sj_inner_tab fields of the join-tab. These fields are therefore set as soon as possible after the join strategy is fixed. Also fixed problem appearing in Bug#52068: When MaterializeScan semijoin strategy was used and there were one or more outer dependent tables before the semijoin tables, the scan over the materialized table was not properly reset for each row of the prefix outer tables. Also fixed problems with pushdown of SJM-aware predicates during make_join_select(): wrong predicates were sometimes generated, make_cond_after_sjm() was called at the wrong position, and make_cond_after_sjm() was never actually considering the pushed-down SJM predicates. mysql-test/include/subquery_sj.inc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. Bug#52068: Optimizer generates invalid semijoin materialization plan Added new tests. mysql-test/r/subquery_mat_all.result mysql-test/r/subquery_sj_all.result mysql-test/r/subquery_sj_all_jcl6.result mysql-test/r/subquery_sj_all_jcl7.result mysql-test/r/subquery_sj_dupsweed.result mysql-test/r/subquery_sj_dupsweed_jcl6.result mysql-test/r/subquery_sj_dupsweed_jcl7.result mysql-test/r/subquery_sj_firstmatch.result mysql-test/r/subquery_sj_firstmatch_jcl6.result mysql-test/r/subquery_sj_firstmatch_jcl7.result mysql-test/r/subquery_sj_loosescan.result mysql-test/r/subquery_sj_loosescan_jcl6.result mysql-test/r/subquery_sj_loosescan_jcl7.result mysql-test/r/subquery_sj_mat.result mysql-test/r/subquery_sj_mat_jcl6.result mysql-test/r/subquery_sj_mat_jcl7.result mysql-test/r/subquery_sj_mat_nosj.result mysql-test/r/subquery_sj_none.result mysql-test/r/subquery_sj_none_jcl6.result mysql-test/r/subquery_sj_none_jcl7.result Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. Bug#52068: Optimizer generates invalid semijoin materialization plan Results for three new tests added. Some tests using semijoin materialization show that where clause has moved from the outer query into the materialized inner query. This is caused by the changed call to get_first() in eliminate_item_equal(). Ex: select * from ot where a in(select b from it where b>0); The clause "b>0" is now evaluated on the inner query materialization. Performance-wise this is never worse when using MaterializeScan and usually better for MaterializeLookup. For the latter strategy, the best possible solution is probably to evaluate the clause in both queries, this can be subject for a later feature development. Another test that applies the same condition to both the outer and the inner query is added, to show the plan for such types of queries. Earlier wrong semijoin materialization test results are corrected. sql/item.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Calling new get_first() function instead of old. sql/item_cmpfunc.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. New function Item_equal::get_first() that accepts as argument a field being substituted. sql/item_cmpfunc.h Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. New function Item_equal::get_first() that accepts as argument a field being substituted. sql/sql_select.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. Bug#52068: Optimizer generates invalid semijoin materialization plan Setting fields first_sj_inner_tab and last_sj_inner_tab moved from setup_semijoin_dups_elimination() to get_best_combination(), so they are set as early as possible after join order optimization. In make_join_select(), the test that determined when to pushdown SJM-specific predicates was wrong, in addition to improving the comments. The logic of eliminate_item_equal() has been simplified and adjusted so that it generates equalities that are useful also when the semijoin materialization strategy is being used. Some simplification was possible by taking advantage of the new Item_equal::get_first() function. In sub_select_sjm(), moved code that initializes the scan over the materialized table so that it is now performed for each scan of table, instead of only for the first scan. In make_cond_for_table_from_pred(), a number of comments has been added, and TAB characters are replaced by spaces. In make_cond_after_sjm(), make sure that it handles equalities generated for semijoin materialization (with marker=3). Added comments and removed TAB characters.
[25 Oct 2010 9:29]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/121779 3269 Roy Lyseng 2010-10-25 Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. When a subquery is subject to a semijoin transformation, its tables are merged to the outer query and are later treated as regular tables. One possible optimization to apply after semijoin transformation is equality propagation. Equality propagation is done after query execution plan is chosen. It substitutes fields from tables being retrieved later for fields from tables being retrieved earlier. However, equality propagation can't be applied as is when using the semijoin materialization strategy. In this case, data from materialized semijoin tables are saved to a temporary table first before being involved in the join. This has two consequences for equality propagation: 1. Equalities that exist within the materialized tables must be processed within the materialization operation, and 2. Equalities between the materialized table and the outer query must be processed like regular equalities. Example: suppose we have a join order: ot1 ot2 SJ-Mat(it1 it2 it3) ot3 and equality ot2.col = it1.col = it2.col If we're looking for best substitute for it2.col, we should pick it1.col and not ot2.col, because the equality it1.col=it2.col can be processed during the materialization. For a field that is not in a materialized semijoin we can use any field, even those that are embedded in a materialized semijoin. This is because such fields are "copied back" to their original join-tab structures when the materialized temporary table is being read. Now we have added a bew function Item_equal::get_subst_item() that accepts as a parameter a field being substituted and checks whether it belongs to a materialized semijoin. The field to substitute will be from the same materialized semijoin nest (if supplied field is within such nest), otherwise it will be the first field in the multiple equality. The new checks rely on the first_sj_inner_tab and first_sj_inner_tab fields of the join-tab. These fields are therefore set as soon as possible after the join strategy is fixed (before they were only used by the DuplicateWeedout strategy, and were not needed until later). Also fixed problems with pushdown of SJM-aware predicates during make_join_select(): - Wrong predicates were sometimes generated, - make_cond_after_sjm() was called at the wrong position - make_cond_after_sjm() was never actually considering the pushed-down SJM predicates. mysql-test/include/subquery_sj.inc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. mysql-test/r/subquery_mat_all.result mysql-test/r/subquery_sj_all.result mysql-test/r/subquery_sj_all_jcl6.result mysql-test/r/subquery_sj_all_jcl7.result mysql-test/r/subquery_sj_dupsweed.result mysql-test/r/subquery_sj_dupsweed_jcl6.result mysql-test/r/subquery_sj_dupsweed_jcl7.result mysql-test/r/subquery_sj_firstmatch.result mysql-test/r/subquery_sj_firstmatch_jcl6.result mysql-test/r/subquery_sj_firstmatch_jcl7.result mysql-test/r/subquery_sj_loosescan.result mysql-test/r/subquery_sj_loosescan_jcl6.result mysql-test/r/subquery_sj_loosescan_jcl7.result mysql-test/r/subquery_sj_mat.result mysql-test/r/subquery_sj_mat_jcl6.result mysql-test/r/subquery_sj_mat_jcl7.result mysql-test/r/subquery_sj_mat_nosj.result mysql-test/r/subquery_sj_none.result mysql-test/r/subquery_sj_none_jcl6.result mysql-test/r/subquery_sj_none_jcl7.result Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. Results for three new tests added. Some tests using semijoin materialization show that where clause has moved from the outer query into the materialized inner query. This is caused by the changed call to get_subst_item() in eliminate_item_equal(). Ex: select * from ot where a in(select b from it where b>0); The clause "b>0" is now evaluated on the inner query materialization. Performance-wise this is never worse than before when using MaterializeScan and usually better than before for MaterializeLookup. (For MaterializeLookup, it is worse when it is more efficient to move the clause to the outer query. The best possible solution for this case is probably to evaluate the clause in both queries, this can be subject for a later feature development.) Another test that applies the same condition to both the outer and the inner query is added, to show the plan for such types of queries. Earlier wrong semijoin materialization test results are corrected. sql/item.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Calling new get_subst_item() function instead of get_first(). sql/item_cmpfunc.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. New function Item_equal::get_subst_item() that accepts as argument a field being substituted. sql/item_cmpfunc.h Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. New function Item_equal::get_subst_item() that accepts as argument a field being substituted. sql/sql_select.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. Setting fields first_sj_inner_tab and last_sj_inner_tab moved from setup_semijoin_dups_elimination() to get_best_combination(), so they are set as early as possible after join order optimization. In make_join_select(), the test that determined when to pushdown SJM-specific predicates was wrong, in addition to improving the comments. The logic of eliminate_item_equal() has been simplified and adjusted so that it generates equalities that are useful also when the semijoin materialization strategy is being used. Some simplification was possible by taking advantage of the new Item_equal::get_subst_item() function. In make_cond_for_table_from_pred(), a number of comments has been added, and TAB characters are replaced by spaces. In make_cond_after_sjm(), make sure that it handles equalities generated for semijoin materialization (with marker=3). Added comments and removed TAB characters.
[26 Oct 2010 10:45]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/121911 3269 Roy Lyseng 2010-10-26 on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. When a subquery is subject to a semijoin transformation, its tables are merged to the outer query and are later treated as regular tables. One possible optimization to apply after semijoin transformation is equality propagation. Equality propagation is done after query execution plan is chosen. It substitutes fields from tables being retrieved later for fields from tables being retrieved earlier. However, equality propagation can't be applied as is when using the semijoin materialization strategy. In this case, data from materialized semijoin tables are saved to a temporary table first before being involved in the join. This has two consequences for equality propagation: 1. Equalities that exist within the materialized tables must be processed within the materialization operation, and 2. Equalities between the materialized table and the outer query must be processed like regular equalities. Example: suppose we have a join order: ot1 ot2 SJ-Mat(it1 it2 it3) ot3 and equality ot2.col = it1.col = it2.col If we're looking for best substitute for it2.col, we should pick it1.col and not ot2.col, because the equality it1.col=it2.col can be processed during the materialization. For a field that is not in a materialized semijoin we can use any field, even those that are embedded in a materialized semijoin. This is because such fields are "copied back" to their original join-tab structures when the materialized temporary table is being read. Now we have added a new function Item_equal::get_subst_item() that accepts as a parameter a field being substituted and checks whether it belongs to a materialized semijoin. The field to substitute will be from the same materialized semijoin nest (if supplied field is within such nest), otherwise it will be the first field in the multiple equality. The new checks rely on the first_sj_inner_tab and first_sj_inner_tab fields of the join-tab. These fields are therefore set as soon as possible after the join strategy is fixed (before they were only used by the DuplicateWeedout strategy, and were not needed until later). Also fixed problems with pushdown of SJM-aware predicates during make_join_select(): - Wrong predicates were sometimes generated, - make_cond_after_sjm() was called at the wrong position in the join sequence. - make_cond_after_sjm() was never actually considering the pushed-down SJM predicates. mysql-test/include/subquery_sj.inc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. mysql-test/r/subquery_mat_all.result mysql-test/r/subquery_sj_all.result mysql-test/r/subquery_sj_all_jcl6.result mysql-test/r/subquery_sj_all_jcl7.result mysql-test/r/subquery_sj_dupsweed.result mysql-test/r/subquery_sj_dupsweed_jcl6.result mysql-test/r/subquery_sj_dupsweed_jcl7.result mysql-test/r/subquery_sj_firstmatch.result mysql-test/r/subquery_sj_firstmatch_jcl6.result mysql-test/r/subquery_sj_firstmatch_jcl7.result mysql-test/r/subquery_sj_loosescan.result mysql-test/r/subquery_sj_loosescan_jcl6.result mysql-test/r/subquery_sj_loosescan_jcl7.result mysql-test/r/subquery_sj_mat.result mysql-test/r/subquery_sj_mat_jcl6.result mysql-test/r/subquery_sj_mat_jcl7.result mysql-test/r/subquery_sj_mat_nosj.result mysql-test/r/subquery_sj_none.result mysql-test/r/subquery_sj_none_jcl6.result mysql-test/r/subquery_sj_none_jcl7.result Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. Results for three new tests added. Some tests using semijoin materialization show that where clause has moved from the outer query into the materialized inner query. This is caused by the changed call to get_subst_item() in eliminate_item_equal(). Ex: select * from ot where a in(select b from it where b>0); The clause "b>0" is now evaluated on the inner query materialization. Performance-wise this is never worse than before when using MaterializeScan and usually better than before for MaterializeLookup. (For MaterializeLookup, it is worse when it is more efficient to move the clause to the outer query. The best possible solution for this case is probably to evaluate the clause in both queries, this can be subject for a later feature development.) Another test that applies the same condition to both the outer and the inner query is added, to show the plan for such types of queries. Earlier wrong semijoin materialization test results are corrected. sql/item.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Calling new get_subst_item() function instead of get_first(). sql/item_cmpfunc.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. New function Item_equal::get_subst_item() that accepts as argument a field being substituted. sql/item_cmpfunc.h Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. New function Item_equal::get_subst_item() that accepts as argument a field being substituted. sql/sql_select.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. Setting fields first_sj_inner_tab and last_sj_inner_tab moved from setup_semijoin_dups_elimination() to get_best_combination(), so they are set as early as possible after join order optimization. In make_join_select(), the test that determined when to pushdown SJM-specific predicates was wrong, in addition to improving the comments. The logic of eliminate_item_equal() has been simplified and adjusted so that it generates equalities that are useful also when the semijoin materialization strategy is being used. Some simplification was possible by taking advantage of the new Item_equal::get_subst_item() function. In make_cond_for_table_from_pred(), a number of comments has been added, and TAB characters are replaced by spaces. In make_cond_after_sjm(), make sure that it handles equalities generated for semijoin materialization (with marker=3). Added comments and removed TAB characters.
[13 Nov 2010 16:27]
Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (merge vers: 5.6.99-m5) (pib:21)
[13 Nov 2010 16:42]
Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101113160336-atmtmfb3mzm4pz4i) (version source revid:alexander.nozdrin@oracle.com-20101113152540-gxro4g0v29l27f5x) (pib:21)