Bug #45174 | XOR in subqueries produces differing results in 5.1 and 5.4 | ||
---|---|---|---|
Submitted: | 28 May 2009 22:51 | Modified: | 22 Nov 2010 1:32 |
Reporter: | Patrick Crews | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 6.0 | OS: | Any |
Assigned to: | Roy Lyseng | CPU Architecture: | Any |
Tags: | materialization, optimizer_switch, outerjoin, semijoin, subquery, xor |
[28 May 2009 22:51]
Patrick Crews
[29 May 2009 0:29]
Patrick Crews
Forgot to add: Simpler version of the query that still fails: # 18:28:02 Simplified query: SELECT `date_nokey` FROM C WHERE `varchar_nokey` IN ( SELECT `varchar_key` FROM CC WHERE `int_nokey` XOR `varchar_key` ) Use the switch --validator=ResultsetComparatorSimplify to pare down failing queries to minimal form
[17 Jun 2009 8:20]
Patrick Crews
RQG-generated .test file: /*!40101 SET @saved_cs_client = @@character_set_client */; /*!40101 SET character_set_client = utf8 */; CREATE TABLE `CC` ( `int_nokey` int(11) NOT NULL, `time_key` time NOT NULL, `varchar_key` varchar(1) NOT NULL, `varchar_nokey` varchar(1) NOT NULL, KEY `time_key` (`time_key`), KEY `varchar_key` (`varchar_key`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1; /*!40101 SET character_set_client = @saved_cs_client */; INSERT INTO `CC` VALUES (0,'00:00:00','q','q'),(5,'19:02:44','m','m'),(7,'00:00:00','j','j'),(1,'23:49:29','z','z'),(8,'02:51:11','a','a'),(2,'00:00:00','',''),(1,'00:00:00','e','e'),(8,'00:00:00','t','t'),(5,'00:00:00','q','q'),(4,'19:51:30','b','b'),(5,'00:00:00','w','w'),(3,'06:20:13','m','m'),(0,'14:35:22','x','x'),(8,'02:15:41','',''),(0,'07:21:20','w','w'),(4,'00:00:00','x','x'),(0,'00:00:00','e','e'),(0,'04:57:18','e','e'),(2,'00:00:00','p','p'),(0,'20:55:32','x','x'); /*!40101 SET @saved_cs_client = @@character_set_client */; /*!40101 SET character_set_client = utf8 */; CREATE TABLE `B` ( `int_nokey` int(11) NOT NULL, `time_key` time NOT NULL, `varchar_key` varchar(1) NOT NULL, `varchar_nokey` varchar(1) NOT NULL, KEY `time_key` (`time_key`), KEY `varchar_key` (`varchar_key`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1; /*!40101 SET character_set_client = @saved_cs_client */; INSERT INTO `B` VALUES (5,'15:13:38','f','f'),(0,'00:05:48','x','x'); /* OPTIMIZER SETTINGS: */ SET SESSION optimizer_switch = 'firstmatch=off,index_merge=off,index_merge_union=off,index_merge_sort_union=off,index_merge_intersection=off,loosescan=off,materialization=off,semijoin=off'; SET SESSION optimizer_use_mrr = 'disable'; SET SESSION engine_condition_pushdown = '1'; SET SESSION join_cache_level = '1'; SET GLOBAL optimizer_switch = 'firstmatch=off,index_merge=off,index_merge_union=off,index_merge_sort_union=off,index_merge_intersection=off,loosescan=off,materialization=off,semijoin=off'; SET GLOBAL optimizer_use_mrr = 'disable'; SET GLOBAL engine_condition_pushdown = '1'; SET GLOBAL join_cache_level = '1'; /* ORIGINAL QUERY: SELECT OUTR . `time_key` AS X FROM B AS OUTR WHERE ( OUTR . `varchar_nokey` , OUTR . `varchar_key` ) IN ( SELECT DISTINCT INNR . `varchar_key` AS X , INNR . `varchar_nokey` AS Y FROM CC AS INNR2 LEFT JOIN CC AS INNR ON ( INNR2 . `varchar_nokey` < INNR . `varchar_nokey` ) WHERE INNR . `int_key` = INNR . `int_nokey` XOR OUTR . `varchar_key` < 's' ORDER BY INNR . `varchar_key` ) AND OUTR . `int_key` IS NOT NULL ORDER BY OUTR . `int_key` , OUTR . `pk`; ORIGINAL DIFF: --- /var/folders/Pt/PtJd7NDTGgyOk3+iDGXrQk+++TI/-Tmp-///randgen19289-server0.dump 2009-06-17 09:09:21.000000000 +0100 +++ /var/folders/Pt/PtJd7NDTGgyOk3+iDGXrQk+++TI/-Tmp-///randgen19289-server1.dump 2009-06-17 09:09:21.000000000 +0100 @@ -1 +0,0 @@ -00:05:48 SIMPLIFIED QUERY: SELECT `time_key` FROM B WHERE ( `varchar_nokey` , `varchar_key` ) IN ( SELECT `varchar_key` , `varchar_nokey` FROM CC WHERE `int_nokey` XOR 's' ) ; SIMPLIFIED DIFF: --- /var/folders/Pt/PtJd7NDTGgyOk3+iDGXrQk+++TI/-Tmp-///randgen19289-server0.dump 2009-06-17 09:09:21.000000000 +0100 +++ /var/folders/Pt/PtJd7NDTGgyOk3+iDGXrQk+++TI/-Tmp-///randgen19289-server1.dump 2009-06-17 09:09:21.000000000 +0100 @@ -1 +0,0 @@ -00:05:48 */
[12 Jul 2009 11:24]
Sergey Petrunya
Discussion/ initial investigation result: The problem seems to be in IndexConditionPushdown, not subquery optimizations.
[13 Jul 2009 16:03]
Evgeny Potemkin
By some reason the reported test case doesn't work for me. Here is the test which works. 5.4 returns 4 rows, while 5.1 result is empty. DROP TABLE IF EXISTS CC, C; CREATE TABLE `CC` ( `pk` int(11) NOT NULL AUTO_INCREMENT, `datetime_key` datetime NOT NULL, `varchar_key` varchar(1) NOT NULL, `varchar_nokey` varchar(1) NOT NULL, PRIMARY KEY (`pk`), KEY `datetime_key` (`datetime_key`), KEY `varchar_key` (`varchar_key`) ); INSERT INTO `CC` VALUES (11,'2002-09-23 00:00:00','m','m'),(12,'0000-00-00 00:00:00','j','j'),(13,'2006-06-07 00:00:00','z','z'),(14,'2000-09-16 12:15:34','a','a'),(15,'2007-08-05 15:47:52','',''),(16,'0000-00-00 00:00:00','e','e'),(17,'2005-12-02 19:34:26','t','t'),(19,'0000-00-00 00:00:00','b','b'),(20,'2007-12-28 00:00:00','w','w'),(21,'2004-08-02 11:48:43','m','m'),(23,'2004-04-19 12:18:43','',''),(24,'2009-04-27 00:00:00','w','w'),(26,'0000-00-00 00:00:00','e','e'),(27,'2002-03-22 11:48:37','e','e'),(28,'0000-00-00 00:00:00','p','p'); CREATE TABLE `C` ( `pk` int(11) NOT NULL AUTO_INCREMENT, `datetime_key` datetime NOT NULL, `varchar_key` varchar(1) NOT NULL, `varchar_nokey` varchar(1) NOT NULL, PRIMARY KEY (`pk`), KEY `datetime_key` (`datetime_key`), KEY `varchar_key` (`varchar_key`) ); INSERT INTO `C` VALUES (2,'2002-02-09 07:38:13','v','v'),(4,'0000-00-00 00:00:00','u','u'),(5,'2009-07-28 03:43:30','n','n'),(6,'2009-08-04 00:00:00','l','l'),(7,'0000-00-00 00:00:00','h','h'),(8,'0000-00-00 00:00:00','u','u'),(9,'2005-08-02 17:16:54','n','n'),(10,'2002-12-21 00:00:00','j','j'),(11,'2005-08-15 12:37:35','k','k'),(12,'0000-00-00 00:00:00','e','e'),(13,'2006-03-10 00:00:00','i','i'),(14,'2005-05-16 11:02:36','u','u'),(15,'2008-11-02 00:00:00','n','n'),(16,'2006-03-15 00:00:00','b','b'),(17,'0000-00-00 00:00:00','x','x'),(18,'0000-00-00 00:00:00','',''),(19,'2008-12-17 20:15:40','q','q'),(20,'0000-00-00 00:00:00','u','u'); SELECT varchar_nokey, datetime_key FROM C WHERE ( `varchar_nokey` , `varchar_nokey` ) IN ( SELECT `varchar_key` , `varchar_nokey` FROM CC WHERE `varchar_nokey` < 'n' XOR `pk` ) ; DROP TABLE CC, C;
[14 Jul 2009 22:17]
Patrick Crews
The values of optimizer_switch, optimizer_use_mrr, and engine_condition_pushdown did not affect the presence of this bug in my testing.
[17 Jul 2009 17:06]
Sergey Petrunya
Patrick, in my testing the value of @@optimizer_switch did have an effect: mysql> select version(); +-------------------+ | version() | +-------------------+ | 5.4.4-alpha-debug | +-------------------+ 1 row in set (0.00 sec) mysql> SELECT varchar_nokey FROM C WHERE ( `varchar_nokey` , `varchar_nokey` ) IN ( SELECT `varchar_key` , `varchar_nokey` FROM CC WHERE `varchar_nokey` < 'n' XOR `pk` ); +---------------+ | varchar_nokey | +---------------+ | j | | e | | b | | | +---------------+ 4 rows in set (0.07 sec) mysql> set optimizer_switch='semijoin=off'; Query OK, 0 rows affected (0.00 sec) mysql> SELECT varchar_nokey FROM C WHERE ( `varchar_nokey` , `varchar_nokey` ) IN ( SELECT `varchar_key` , `varchar_nokey` FROM CC WHERE `varchar_nokey` < 'n' XOR `pk` ); Empty set (0.09 sec) mysql> set optimizer_switch='semijoin=off,materialization=off'; Query OK, 0 rows affected (0.00 sec) mysql> SELECT varchar_nokey FROM C WHERE ( `varchar_nokey` , `varchar_nokey` ) IN ( SELECT `varchar_key` , `varchar_nokey` FROM CC WHERE `varchar_nokey` < 'n' XOR `pk` ); Empty set (0.05 sec)
[17 Jul 2009 17:08]
Sergey Petrunya
mysql> set optimizer_switch='default,materialization=off'; Query OK, 0 rows affected (0.00 sec) mysql> explain SELECT varchar_nokey FROM C WHERE ( `varchar_nokey` , `varchar_nokey` ) IN ( SELECT `varchar_key` , `varchar_nokey` FROM CC WHERE `varchar_nokey` < 'n' XOR `pk` ); +----+-------------+-------+------+---------------+-------------+---------+----------------------------+------+---------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+-------------+---------+----------------------------+------+---------------------------------------------------+ | 1 | PRIMARY | C | ALL | NULL | NULL | NULL | NULL | 18 | | | 1 | PRIMARY | CC | ref | varchar_key | varchar_key | 3 | bug45174r2.C.varchar_nokey | 2 | Using index condition; Using where; FirstMatch(C) | +----+-------------+-------+------+---------------+-------------+---------+----------------------------+------+---------------------------------------------------+ 2 rows in set (0.01 sec) mysql> SELECT varchar_nokey FROM C WHERE ( `varchar_nokey` , `varchar_nokey` ) IN ( SELECT `varchar_key` , `varchar_nokey` FROM CC WHERE `varchar_nokey` < 'n' XOR `pk` ); Empty set (0.03 sec) mysql> set engine_condition_pushdown=off; Query OK, 0 rows affected (0.01 sec) mysql> SELECT varchar_nokey FROM C WHERE ( `varchar_nokey` , `varchar_nokey` ) IN ( SELECT `varchar_key` , `varchar_nokey` FROM CC WHERE `varchar_nokey` < 'n' XOR `pk` ); Empty set (0.02 sec)
[17 Jul 2009 17:54]
Sergey Petrunya
The above makes one to conclude that the problem shows up only when semi-join and materialization are used.
[29 Jul 2009 8:05]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/79501 2814 Evgeny Potemkin 2009-07-29 Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Equality propagation is done after query execution plan is chosen. It substitutes fields from tables being retrieved later for fields from tables being retrieved earlier. Materialized semi-joins are exception to this rule. For field which belongs to a table within a materialized semi-join, we can only pick fields from the same semi-join. Example: suppose we have a join order: ot1 ot2 SJ-Mat(it1 it2 it3) ot3 and equality ot2.col = it1.col = it2.col If we're looking for best substitute for 'it2.col', we should pick it1.col and not ot2.col. For a field that is not in a materialized semi-join we must pick a field that's not embedded in a materialized semi-join. Example: suppose we have a join order: SJ-Mat(it1 it2) ot1 ot2 and equality ot2.col = ot1.col = it2.col If we're looking for best substitute for 'ot2.col', we should pick ot1.col and not it2.col, because when we run a join between ot1 and ot2 execution of SJ-Mat(...) has already finished and we can't rely on the value of it*.*. Now the Item_equal::get_first function accepts as a parameter a field being substituted and checks whether it belongs to a materialized semi-join. Depending on the check result a field to substitute for or NULL is returned. The is_sj_materialization_strategy method is added to the JOIN_TAB class to check whether JOIN_TAB belongs to a materialized semi-join. @ mysql-test/r/subselect3.result A test case result adjusted after fixing bug#45174. @ mysql-test/r/subselect3_jcl6.result A test case result adjusted after fixing bug#45174. @ mysql-test/r/subselect_sj.result A test case added for the bug#45174. @ mysql-test/r/subselect_sj_jcl6.result A test case added for the bug#45174. @ mysql-test/t/subselect_sj.test A test case added for the bug#45174. @ sql/item.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Now the Item_equal::get_first function accepts as a parameter a field being substituted. @ sql/item_cmpfunc.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Now the Item_equal::get_first function accepts a field being substituted and checks whether it belongs to a materialized semi-join. Depending on the check result a field to substitute for or NULL is returned. @ sql/item_cmpfunc.h Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Now the Item_equal::get_first function accepts as a parameter a field being substituted. @ sql/sql_select.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. The is_sj_materialization_strategy method is added to the JOIN_TAB class to check whether JOIN_TAB belongs to a materialized semi-join. @ sql/sql_select.h Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. The is_sj_materialization_strategy method is added to the JOIN_TAB class to check whether JOIN_TAB belongs to a materialized semi-join.
[13 Oct 2009 9:39]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/86680 2814 Evgeny Potemkin 2009-10-13 Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. When a subquery is a subject to a semi-join optimization its tables are merged to the upper query and later they treated as usual tables. This allows a bunch of optimizations to be applied, equality propagation is among them. Equality propagation is done after query execution plan is chosen. It substitutes fields from tables being retrieved later for fields from tables being retrieved earlier. However it can't be applied as is to any semi-join table. The semi-join materialization strategy differs from other semi-join strategies that the data from materialized semi-join tables isn't used directly but saved to a temporary table first. The materialization isn't isolated is a separate step, it is done inline within the nested loop execution. When it comes to fetch rows from the first table in the block of materialized semi-join tables they are isolated and the sub_select function is called to materialize result and save it in the semi-join result table. Materialization is done once and later data from the semi-join result table is used. Due to this we can't substitute fields that belong to the semi-join for fields from outer query and vice versa. Example: suppose we have a join order: ot1 ot2 SJ-Mat(it1 it2 it3) ot3 and equality ot2.col = it1.col = it2.col If we're looking for best substitute for 'it2.col', we should pick it1.col and not ot2.col. For a field that is not in a materialized semi-join we must pick a field that's not embedded in a materialized semi-join. Example: suppose we have a join order: SJ-Mat(it1 it2) ot1 ot2 and equality ot2.col = ot1.col = it2.col If we're looking for best substitute for 'ot2.col', we should pick ot1.col and not it2.col, because when we run a join between ot1 and ot2 execution of SJ-Mat(...) has already finished and we can't rely on the value of it*.*. Now the Item_equal::get_first function accepts as a parameter a field being substituted and checks whether it belongs to a materialized semi-join. Depending on the check result a field to substitute for or NULL is returned. The sj_strategy field is added to the st_join_table structure. It's a copy of the POSITION::sj_strategy field and is used to easy checks. @ mysql-test/r/subselect_sj.result A test case added for the bug#45174. @ mysql-test/r/subselect_sj_jcl6.result A test case added for the bug#45174. @ mysql-test/t/subselect_sj.test A test case added for the bug#45174. @ sql/item.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Now the Item_equal::get_first function accepts as a parameter a field being substituted. @ sql/item_cmpfunc.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Now the Item_equal::get_first function accepts a field being substituted and checks whether it belongs to a materialized semi-join. Depending on the check result a field to substitute for or NULL is returned. @ sql/item_cmpfunc.h Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Now the Item_equal::get_first function accepts as a parameter a field being substituted. @ sql/sql_select.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. The is_sj_materialization_strategy method is added to the JOIN_TAB class to check whether JOIN_TAB belongs to a materialized semi-join. @ sql/sql_select.h Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. The sj_strategy field is added to the st_join_table structure. It's a copy of the POSITION::sj_strategy field and is used to easy checks.
[24 Oct 2009 14:30]
Guilhem Bichot
sent comments by mail
[11 Dec 2009 13:04]
Guilhem Bichot
Timour's approval at http://lists.mysql.com/commits/85548. Evgeny says he has adressed Timour's requests for clearer comments.
[9 Mar 2010 10:31]
Sergey Petrunya
See also BUG#50019
[29 Sep 2010 14:34]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/119437 3253 Roy Lyseng 2010-09-29 Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. When a subquery is subject to a semijoin optimization, it's tables are merged to the outer query and later are treated as regular tables. This allows a bunch of optimizations to be applied, equality propagation is among them. Equality propagation is done after query execution plan is chosen. It substitutes fields from tables being retrieved later for fields from tables being retrieved earlier. However, it can't be applied as is to any semijoin table. The semijoin materialization strategy differs from other semijoin strategies that the data from materialized semijoin tables isn't used directly but saved to a temporary table first. The materialization isn't isolated in a separate step, it is done inline within the nested loop execution. When it comes to fetching rows from the first table in the block of materialized semijoin tables, sub_select() function is called to materialize the result of the subquery and save it in the materialized table. Later, data from the materialized table is used as they were regular table rows. Due to this we can't substitute fields that belong to the semi-join for fields from outer query and vice versa. Example: suppose we have a join order: ot1 ot2 SJ-Mat(it1 it2 it3) ot3 and equality ot2.col = it1.col = it2.col If we're looking for best substitute for 'it2.col', we should pick it1.col and not ot2.col. For a field that is not in a materialized semijoin we must pick a field that's not embedded in any materialized semijoin. Example: suppose we have a join order: SJ-Mat(it1 it2) ot1 ot2 and equality ot2.col = ot1.col = it2.col If we're looking for best substitute for 'ot2.col', we should pick ot1.col and not it2.col, because when we run a join between ot1 and ot2, execution of SJ-Mat(...) has already finished and we can't rely on the value of it*.*. Now we have added another Item_equal::get_first function that accepts as a parameter a field being substituted and checks whether it belongs to a materialized semijoin. The field to substitute will be from the same materialized semijoin nest (if supplied field is within such nest), or outside any materialized semijoin nest (if supplied field is outside such nest). The new checks rely on the first_sj_inner_tab and first_sj_inner_tab fields of the join-tab. These fields are therefore set as soon as possible after the join strategy is fixed. mysql-test/include/subquery_sj.inc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. Added new tests. mysql-test/r/subquery_mat_all.result mysql-test/r/subquery_sj_all.result mysql-test/r/subquery_sj_all_jcl6.result mysql-test/r/subquery_sj_all_jcl7.result mysql-test/r/subquery_sj_dupsweed.result mysql-test/r/subquery_sj_dupsweed_jcl6.result mysql-test/r/subquery_sj_dupsweed_jcl7.result mysql-test/r/subquery_sj_firstmatch.result mysql-test/r/subquery_sj_firstmatch_jcl6.result mysql-test/r/subquery_sj_firstmatch_jcl7.result mysql-test/r/subquery_sj_loosescan.result mysql-test/r/subquery_sj_loosescan_jcl6.result mysql-test/r/subquery_sj_loosescan_jcl7.result mysql-test/r/subquery_sj_mat.result mysql-test/r/subquery_sj_mat_jcl6.result mysql-test/r/subquery_sj_mat_jcl7.result mysql-test/r/subquery_sj_mat_nosj.result mysql-test/r/subquery_sj_none.result mysql-test/r/subquery_sj_none_jcl6.result mysql-test/r/subquery_sj_none_jcl7.result Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. Results for two new tests added. Some tests using semijoin materialization show that where clause has moved from the outer query into the materialized inner query. This is caused by the changed call to get_first() in eliminate_item_equal(). Ex: select * from ot where a in(select b from it where b>0); The clause "b>0" is now evaluated on the inner query materialization. Performance-wise this is never worse when using MaterializeScan and usually better for MaterializeLookup. For the latter strategy, the best possible solution is probably to evaluate the clause in both queries, this can be subject for a later feature development. sql/item.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Calling new get_first() function instead of old. sql/item_cmpfunc.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. New function Item_equal::get_first that accepts as parameter a field being substituted. sql/item_cmpfunc.h Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. New function Item_equal::get_first that accepts as parameter a field being substituted. sql/sql_select.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Setting fields first_sj_inner_tab and last_sj_inner_tab moved from setup_semijoin_dups_elimination() to get_best_combination(), so they are set as early as possible after join order optimization.
[7 Oct 2010 15:20]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/120276 3260 Roy Lyseng 2010-10-07 Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. Bug#52068: Optimizer generates invalid semijoin materialization plan When a subquery is subject to a semijoin optimization, it's tables are merged to the outer query and later are treated as regular tables. This allows a bunch of optimizations to be applied, equality propagation is among them. Equality propagation is done after query execution plan is chosen. It substitutes fields from tables being retrieved later for fields from tables being retrieved earlier. However, it can't be applied as is to any semijoin table. The semijoin materialization strategy differs from other semijoin strategies that the data from materialized semijoin tables isn't used directly but saved to a temporary table first. The materialization isn't isolated in a separate step, it is done inline within the nested loop execution. When it comes to fetching rows from the first table in the block of materialized semijoin tables, sub_select() function is called to materialize the result of the subquery and save it in the materialized table. Later, data from the materialized table is used as they were regular table rows. Due to this we can't substitute fields that belong to the semi-join for fields from outer query and vice versa. Example: suppose we have a join order: ot1 ot2 SJ-Mat(it1 it2 it3) ot3 and equality ot2.col = it1.col = it2.col If we're looking for best substitute for 'it2.col', we should pick it1.col and not ot2.col. For a field that is not in a materialized semijoin we can use any field, even those that are embedded in a materialized semijoin. This is because such fields are "copied back" to their original join-tab structures when the materialized temporary table is being read. Now we have added another Item_equal::get_first() function that accepts as a parameter a field being substituted and checks whether it belongs to a materialized semijoin. The field to substitute will be from the same materialized semijoin nest (if supplied field is within such nest), otherwise it will be the first field in the multiple equality. The new checks rely on the first_sj_inner_tab and first_sj_inner_tab fields of the join-tab. These fields are therefore set as soon as possible after the join strategy is fixed. Also fixed problem appearing in Bug#52068: When MaterializeScan semijoin strategy was used and there were one or more outer dependent tables before the semijoin tables, the scan over the materialized table was not properly reset for each row of the prefix outer tables. Also fixed problems with pushdown of SJM-aware predicates during make_join_select(): wrong predicates were sometimes generated, make_cond_after_sjm() was called at the wrong position, and make_cond_after_sjm() was never actually considering the pushed-down SJM predicates. mysql-test/include/subquery_sj.inc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. Bug#52068: Optimizer generates invalid semijoin materialization plan Added new tests. mysql-test/r/subquery_mat_all.result mysql-test/r/subquery_sj_all.result mysql-test/r/subquery_sj_all_jcl6.result mysql-test/r/subquery_sj_all_jcl7.result mysql-test/r/subquery_sj_dupsweed.result mysql-test/r/subquery_sj_dupsweed_jcl6.result mysql-test/r/subquery_sj_dupsweed_jcl7.result mysql-test/r/subquery_sj_firstmatch.result mysql-test/r/subquery_sj_firstmatch_jcl6.result mysql-test/r/subquery_sj_firstmatch_jcl7.result mysql-test/r/subquery_sj_loosescan.result mysql-test/r/subquery_sj_loosescan_jcl6.result mysql-test/r/subquery_sj_loosescan_jcl7.result mysql-test/r/subquery_sj_mat.result mysql-test/r/subquery_sj_mat_jcl6.result mysql-test/r/subquery_sj_mat_jcl7.result mysql-test/r/subquery_sj_mat_nosj.result mysql-test/r/subquery_sj_none.result mysql-test/r/subquery_sj_none_jcl6.result mysql-test/r/subquery_sj_none_jcl7.result Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. Bug#52068: Optimizer generates invalid semijoin materialization plan Results for three new tests added. Some tests using semijoin materialization show that where clause has moved from the outer query into the materialized inner query. This is caused by the changed call to get_first() in eliminate_item_equal(). Ex: select * from ot where a in(select b from it where b>0); The clause "b>0" is now evaluated on the inner query materialization. Performance-wise this is never worse when using MaterializeScan and usually better for MaterializeLookup. For the latter strategy, the best possible solution is probably to evaluate the clause in both queries, this can be subject for a later feature development. Another test that applies the same condition to both the outer and the inner query is added, to show the plan for such types of queries. Earlier wrong semijoin materialization test results are corrected. sql/item.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Calling new get_first() function instead of old. sql/item_cmpfunc.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. New function Item_equal::get_first() that accepts as argument a field being substituted. sql/item_cmpfunc.h Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. New function Item_equal::get_first() that accepts as argument a field being substituted. sql/sql_select.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. Bug#52068: Optimizer generates invalid semijoin materialization plan Setting fields first_sj_inner_tab and last_sj_inner_tab moved from setup_semijoin_dups_elimination() to get_best_combination(), so they are set as early as possible after join order optimization. In make_join_select(), the test that determined when to pushdown SJM-specific predicates was wrong, in addition to approving the comments. The logic of eliminate_item_equal() has been simplified and adjusted so that it generates equalities that are useful also when the semijoin materialization strategy is being used. Some simplification was possible by taking advantage of the new Item_equal::get_first() function. In sub_select_sjm(), moved code that initializes the scan over the materialized table so that it is now performed for each scan of table, instead of only for the first scan. In make_cond_for_table_from_pred(), a number of comments has been added, and TAB characters are replaced by spaces. In make_cond_after_sjm(), make sure that it handles equalities generated for semijoin materialization (with marker=3). Otherwise, removed marker optimizations for this function, as it will only be called once per materialized semijoin nest in a query. Added comments and removed TAB characters.
[13 Oct 2010 11:23]
Øystein Grøvlen
Bug#57421 is another duplicate of this report.
[13 Oct 2010 11:25]
Øystein Grøvlen
Agreed to replace Evgeny as reviewer for this.
[13 Oct 2010 14:13]
SaiKumar V
Including one more testcase for this report when semijoin is on , verified with Oystein Grovlen that the fix for 45174 will fix the failure. Bug found in:- mysql-next-mr-opt-backporting revno: 3259 ---------------------------------------------------------------- drop table C; drop table CC; CREATE TABLE C ( pk int NOT NULL , col_int_key int DEFAULT NULL, col_datetime_nokey datetime DEFAULT NULL, col_varchar_key varchar(1) DEFAULT NULL, col_varchar_nokey varchar(1) DEFAULT NULL, PRIMARY KEY (pk) ) ; INSERT INTO C VALUES (1,2,'2004-10-11 18:13:16','w','w'); INSERT INTO C VALUES (2,9,NULL,'m','m'); INSERT INTO C VALUES (3,3,'1900-01-01 00:00:00','m','m'); INSERT INTO C VALUES (4,9,'2009-07-25 00:00:00','k','k'); INSERT INTO C VALUES (5,NULL,NULL,'r','r'); INSERT INTO C VALUES (6,9,'2008-07-27 00:00:00','t','t'); INSERT INTO C VALUES (7,3,'2002-11-13 16:37:31','j','j'); INSERT INTO C VALUES (8,8,'1900-01-01 00:00:00','u','u'); INSERT INTO C VALUES (9,8,'2003-12-10 00:00:00','h','h'); INSERT INTO C VALUES (10,53,'2001-12-21 22:38:22','o','o'); INSERT INTO C VALUES (11,0,'2008-12-13 23:16:44',NULL,NULL); INSERT INTO C VALUES (12,5,'2005-08-15 12:39:41','k','k'); INSERT INTO C VALUES (13,166,NULL,'e','e'); INSERT INTO C VALUES (14,3,'2006-09-11 12:06:14','n','n'); INSERT INTO C VALUES (15,0,'2007-12-15 12:39:34','t','t'); INSERT INTO C VALUES (16,1,'2005-08-09 00:00:00','c','c'); INSERT INTO C VALUES (17,9,'2001-09-02 22:50:02','m','m'); INSERT INTO C VALUES (18,5,'2005-12-16 22:58:11','y','y'); INSERT INTO C VALUES (19,6,'2007-04-19 00:19:53','f','f'); INSERT INTO C VALUES (20,2,'1900-01-01 00:00:00','d','d'); CREATE TABLE CC ( pk int NOT NULL , col_int_key int DEFAULT NULL, col_datetime_nokey datetime DEFAULT NULL, col_varchar_key varchar(1) DEFAULT NULL, col_varchar_nokey varchar(1) DEFAULT NULL, PRIMARY KEY (pk) ) ; INSERT INTO CC VALUES (10,8,'2002-02-26 06:14:37','v','v'); INSERT INTO CC VALUES (11,9,'1900-01-01 00:00:00','r','r'); INSERT INTO CC VALUES (12,9,'2006-12-03 09:37:26','a','a'); INSERT INTO CC VALUES (13,186,'2008-05-26 12:27:10','m','m'); INSERT INTO CC VALUES (14,NULL,'2004-12-14 16:37:30','y','y'); INSERT INTO CC VALUES (15,2,'2003-02-11 21:19:41','j','j'); INSERT INTO CC VALUES (16,3,'2009-10-18 02:27:49','d','d'); INSERT INTO CC VALUES (17,0,'2000-09-26 07:45:57','z','z'); INSERT INTO CC VALUES (18,133,NULL,'e','e'); INSERT INTO CC VALUES (19,1,'2005-11-10 12:40:29','h','h'); INSERT INTO CC VALUES (20,8,'2009-04-25 00:00:00','b','b'); INSERT INTO CC VALUES (21,5,'2002-11-27 00:00:00','s','s'); INSERT INTO CC VALUES (22,5,'2004-01-26 20:32:32','e','e'); INSERT INTO CC VALUES (23,8,'2007-10-26 11:41:40','j','j'); INSERT INTO CC VALUES (24,6,'2005-10-07 00:00:00','e','e'); INSERT INTO CC VALUES (25,51,'2000-07-15 05:00:34','f','f'); INSERT INTO CC VALUES (26,4,'2000-04-03 16:33:32','v','v'); INSERT INTO CC VALUES (27,7,NULL,'x','x'); INSERT INTO CC VALUES (28,6,'2001-04-25 01:26:12','m','m'); INSERT INTO CC VALUES (29,4,'2000-12-27 00:00:00','c','c'); SELECT pk FROM C WHERE col_varchar_key IN ( SELECT INNR .col_varchar_nokey FROM CC INNR2 LEFT JOIN CC INNR ON INNR2 .col_varchar_nokey > INNR .col_varchar_nokey WHERE INNR .col_varchar_nokey < INNR .col_varchar_key ) ORDER BY col_datetime_nokey ; +----+ | pk | +----+ | 18 | +----+ 1 row in set (0.00 sec)
[22 Oct 2010 7:59]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/121634 3265 Roy Lyseng 2010-10-22 Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. Bug#52068: Optimizer generates invalid semijoin materialization plan When a subquery is subject to a semijoin optimization, its tables are merged to the outer query and later are treated as regular tables. This allows a bunch of optimizations to be applied, equality propagation is among them. Equality propagation is done after query execution plan is chosen. It substitutes fields from tables being retrieved later for fields from tables being retrieved earlier. However, it can't be applied as is to any semijoin table. The semijoin materialization strategy differs from other semijoin strategies that the data from materialized semijoin tables isn't used directly but saved to a temporary table first. The materialization isn't isolated in a separate step, it is done inline within the nested loop execution. When it comes to fetching rows from the first table in the block of materialized semijoin tables, sub_select() function is called to materialize the result of the subquery and save it in the materialized table. Later, data from the materialized table is used as if they were regular table rows. Due to this we can't substitute fields that belong to the semi-join for fields from outer query and vice versa. Example: suppose we have a join order: ot1 ot2 SJ-Mat(it1 it2 it3) ot3 and equality ot2.col = it1.col = it2.col If we're looking for best substitute for 'it2.col', we should pick it1.col and not ot2.col. For a field that is not in a materialized semijoin we can use any field, even those that are embedded in a materialized semijoin. This is because such fields are "copied back" to their original join-tab structures when the materialized temporary table is being read. Now we have added another Item_equal::get_first() function that accepts as a parameter a field being substituted and checks whether it belongs to a materialized semijoin. The field to substitute will be from the same materialized semijoin nest (if supplied field is within such nest), otherwise it will be the first field in the multiple equality. The new checks rely on the first_sj_inner_tab and first_sj_inner_tab fields of the join-tab. These fields are therefore set as soon as possible after the join strategy is fixed. Also fixed problem appearing in Bug#52068: When MaterializeScan semijoin strategy was used and there were one or more outer dependent tables before the semijoin tables, the scan over the materialized table was not properly reset for each row of the prefix outer tables. Also fixed problems with pushdown of SJM-aware predicates during make_join_select(): wrong predicates were sometimes generated, make_cond_after_sjm() was called at the wrong position, and make_cond_after_sjm() was never actually considering the pushed-down SJM predicates. mysql-test/include/subquery_sj.inc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. Bug#52068: Optimizer generates invalid semijoin materialization plan Added new tests. mysql-test/r/subquery_mat_all.result mysql-test/r/subquery_sj_all.result mysql-test/r/subquery_sj_all_jcl6.result mysql-test/r/subquery_sj_all_jcl7.result mysql-test/r/subquery_sj_dupsweed.result mysql-test/r/subquery_sj_dupsweed_jcl6.result mysql-test/r/subquery_sj_dupsweed_jcl7.result mysql-test/r/subquery_sj_firstmatch.result mysql-test/r/subquery_sj_firstmatch_jcl6.result mysql-test/r/subquery_sj_firstmatch_jcl7.result mysql-test/r/subquery_sj_loosescan.result mysql-test/r/subquery_sj_loosescan_jcl6.result mysql-test/r/subquery_sj_loosescan_jcl7.result mysql-test/r/subquery_sj_mat.result mysql-test/r/subquery_sj_mat_jcl6.result mysql-test/r/subquery_sj_mat_jcl7.result mysql-test/r/subquery_sj_mat_nosj.result mysql-test/r/subquery_sj_none.result mysql-test/r/subquery_sj_none_jcl6.result mysql-test/r/subquery_sj_none_jcl7.result Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. Bug#52068: Optimizer generates invalid semijoin materialization plan Results for three new tests added. Some tests using semijoin materialization show that where clause has moved from the outer query into the materialized inner query. This is caused by the changed call to get_first() in eliminate_item_equal(). Ex: select * from ot where a in(select b from it where b>0); The clause "b>0" is now evaluated on the inner query materialization. Performance-wise this is never worse when using MaterializeScan and usually better for MaterializeLookup. For the latter strategy, the best possible solution is probably to evaluate the clause in both queries, this can be subject for a later feature development. Another test that applies the same condition to both the outer and the inner query is added, to show the plan for such types of queries. Earlier wrong semijoin materialization test results are corrected. sql/item.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Calling new get_first() function instead of old. sql/item_cmpfunc.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. New function Item_equal::get_first() that accepts as argument a field being substituted. sql/item_cmpfunc.h Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. New function Item_equal::get_first() that accepts as argument a field being substituted. sql/sql_select.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. Bug#52068: Optimizer generates invalid semijoin materialization plan Setting fields first_sj_inner_tab and last_sj_inner_tab moved from setup_semijoin_dups_elimination() to get_best_combination(), so they are set as early as possible after join order optimization. In make_join_select(), the test that determined when to pushdown SJM-specific predicates was wrong, in addition to improving the comments. The logic of eliminate_item_equal() has been simplified and adjusted so that it generates equalities that are useful also when the semijoin materialization strategy is being used. Some simplification was possible by taking advantage of the new Item_equal::get_first() function. In sub_select_sjm(), moved code that initializes the scan over the materialized table so that it is now performed for each scan of table, instead of only for the first scan. In make_cond_for_table_from_pred(), a number of comments has been added, and TAB characters are replaced by spaces. In make_cond_after_sjm(), make sure that it handles equalities generated for semijoin materialization (with marker=3). Added comments and removed TAB characters.
[22 Oct 2010 9:33]
Guilhem Bichot
approved with last few comments sent by mail
[25 Oct 2010 9:29]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/121779 3269 Roy Lyseng 2010-10-25 Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. When a subquery is subject to a semijoin transformation, its tables are merged to the outer query and are later treated as regular tables. One possible optimization to apply after semijoin transformation is equality propagation. Equality propagation is done after query execution plan is chosen. It substitutes fields from tables being retrieved later for fields from tables being retrieved earlier. However, equality propagation can't be applied as is when using the semijoin materialization strategy. In this case, data from materialized semijoin tables are saved to a temporary table first before being involved in the join. This has two consequences for equality propagation: 1. Equalities that exist within the materialized tables must be processed within the materialization operation, and 2. Equalities between the materialized table and the outer query must be processed like regular equalities. Example: suppose we have a join order: ot1 ot2 SJ-Mat(it1 it2 it3) ot3 and equality ot2.col = it1.col = it2.col If we're looking for best substitute for it2.col, we should pick it1.col and not ot2.col, because the equality it1.col=it2.col can be processed during the materialization. For a field that is not in a materialized semijoin we can use any field, even those that are embedded in a materialized semijoin. This is because such fields are "copied back" to their original join-tab structures when the materialized temporary table is being read. Now we have added a bew function Item_equal::get_subst_item() that accepts as a parameter a field being substituted and checks whether it belongs to a materialized semijoin. The field to substitute will be from the same materialized semijoin nest (if supplied field is within such nest), otherwise it will be the first field in the multiple equality. The new checks rely on the first_sj_inner_tab and first_sj_inner_tab fields of the join-tab. These fields are therefore set as soon as possible after the join strategy is fixed (before they were only used by the DuplicateWeedout strategy, and were not needed until later). Also fixed problems with pushdown of SJM-aware predicates during make_join_select(): - Wrong predicates were sometimes generated, - make_cond_after_sjm() was called at the wrong position - make_cond_after_sjm() was never actually considering the pushed-down SJM predicates. mysql-test/include/subquery_sj.inc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. mysql-test/r/subquery_mat_all.result mysql-test/r/subquery_sj_all.result mysql-test/r/subquery_sj_all_jcl6.result mysql-test/r/subquery_sj_all_jcl7.result mysql-test/r/subquery_sj_dupsweed.result mysql-test/r/subquery_sj_dupsweed_jcl6.result mysql-test/r/subquery_sj_dupsweed_jcl7.result mysql-test/r/subquery_sj_firstmatch.result mysql-test/r/subquery_sj_firstmatch_jcl6.result mysql-test/r/subquery_sj_firstmatch_jcl7.result mysql-test/r/subquery_sj_loosescan.result mysql-test/r/subquery_sj_loosescan_jcl6.result mysql-test/r/subquery_sj_loosescan_jcl7.result mysql-test/r/subquery_sj_mat.result mysql-test/r/subquery_sj_mat_jcl6.result mysql-test/r/subquery_sj_mat_jcl7.result mysql-test/r/subquery_sj_mat_nosj.result mysql-test/r/subquery_sj_none.result mysql-test/r/subquery_sj_none_jcl6.result mysql-test/r/subquery_sj_none_jcl7.result Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. Results for three new tests added. Some tests using semijoin materialization show that where clause has moved from the outer query into the materialized inner query. This is caused by the changed call to get_subst_item() in eliminate_item_equal(). Ex: select * from ot where a in(select b from it where b>0); The clause "b>0" is now evaluated on the inner query materialization. Performance-wise this is never worse than before when using MaterializeScan and usually better than before for MaterializeLookup. (For MaterializeLookup, it is worse when it is more efficient to move the clause to the outer query. The best possible solution for this case is probably to evaluate the clause in both queries, this can be subject for a later feature development.) Another test that applies the same condition to both the outer and the inner query is added, to show the plan for such types of queries. Earlier wrong semijoin materialization test results are corrected. sql/item.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Calling new get_subst_item() function instead of get_first(). sql/item_cmpfunc.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. New function Item_equal::get_subst_item() that accepts as argument a field being substituted. sql/item_cmpfunc.h Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. New function Item_equal::get_subst_item() that accepts as argument a field being substituted. sql/sql_select.cc Bug#45174: Incorrectly applied equality propagation caused wrong result on a query with a materialized semi-join. Bug#50019: Wrong result for IN-query with materialization. Setting fields first_sj_inner_tab and last_sj_inner_tab moved from setup_semijoin_dups_elimination() to get_best_combination(), so they are set as early as possible after join order optimization. In make_join_select(), the test that determined when to pushdown SJM-specific predicates was wrong, in addition to improving the comments. The logic of eliminate_item_equal() has been simplified and adjusted so that it generates equalities that are useful also when the semijoin materialization strategy is being used. Some simplification was possible by taking advantage of the new Item_equal::get_subst_item() function. In make_cond_for_table_from_pred(), a number of comments has been added, and TAB characters are replaced by spaces. In make_cond_after_sjm(), make sure that it handles equalities generated for semijoin materialization (with marker=3). Added comments and removed TAB characters.
[13 Nov 2010 16:13]
Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (merge vers: 5.6.99-m5) (pib:21)
[13 Nov 2010 16:35]
Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101113160336-atmtmfb3mzm4pz4i) (version source revid:alexander.nozdrin@oracle.com-20101113152540-gxro4g0v29l27f5x) (pib:21)
[22 Nov 2010 1:32]
Paul DuBois
Bug is not in any released 5.6.x version. No changelog entry needed.