Bug #46556 | Returning incorrect, empty results for some IN subqueries w/ semijoin=on | ||
---|---|---|---|
Submitted: | 4 Aug 2009 20:59 | Modified: | 23 Nov 2010 2:55 |
Reporter: | Patrick Crews | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 6.0 | OS: | Any |
Assigned to: | Øystein Grøvlen | CPU Architecture: | Any |
Tags: | firstmatch, optimizer_switch, semijoin, subquery |
[4 Aug 2009 20:59]
Patrick Crews
[12 Nov 2009 12:44]
Tor Didriksen
The query below, originally from http://bugs.mysql.com/bug.php?id=46550 Azalea returning duplicate results for some IN subqueries w/ semijoin=on may be a similar problem? # Begin test case for query 1 --disable_warnings DROP TABLE IF EXISTS CC, B; --enable_warnings CREATE TABLE `CC` ( `int_nokey` int(11) DEFAULT NULL, `int_key` int(11) DEFAULT NULL, `varchar_key` varchar(1) DEFAULT NULL, `varchar_nokey` varchar(1) DEFAULT NULL, KEY `int_key` (`int_key`), KEY `varchar_key` (`varchar_key`,`int_key`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1; INSERT INTO `CC` VALUES (3,1,'m','m'),(196,40,'h','h'),(4,1,'r','r'),(0,1,'h','h'),(4,9,'x','x'),(3,NULL,'q','q'), (3,NULL,'k','k'),(NULL,7,'l','l'),(35,182,'k','k'),(246,202,'a','a'),(1,7,'x','x'),(9,6,'j ','j'),(8,119,'z','z'),(5,4,'d','d'),(2,5,'h','h'),(4,1,'u','u'),(8,3,'q','q'),(3,7,'a','a '),(7,3,'e','e'),(2,6,'l','l'); CREATE TABLE `B` ( `int_nokey` int(11) DEFAULT NULL, `int_key` int(11) DEFAULT NULL, `varchar_key` varchar(1) DEFAULT NULL, `varchar_nokey` varchar(1) DEFAULT NULL, KEY `int_key` (`int_key`), KEY `varchar_key` (`varchar_key`,`int_key`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1; INSERT INTO `B` VALUES (NULL,7,NULL,NULL),(9,4,'x','x'); SELECT DISTINCT MIN( `varchar_key`) AS field1 , `int_nokey` AS field2 , `int_key` AS field3 FROM CC AS table1 WHERE (table1 . `varchar_nokey` IN ( SELECT subquery_t1 . `varchar_key` FROM B AS subquery_t1 CROSS JOIN B AS subquery_t2 ON ( subquery_t2 . `int_key` >= subquery_t1 . `int_key` ) ) ) GROUP BY field2, field3; # Diff: # --- /var/folders/Pt/PtJd7NDTGgyOk3+iDGXrQk+++TI/-Tmp-///randgen69966-1249399469-server0.dump 2 009-08-04 11:24:29.000000000 -0400 # +++ /var/folders/Pt/PtJd7NDTGgyOk3+iDGXrQk+++TI/-Tmp-///randgen69966-1249399469-server1.dump 2 009-08-04 11:24:29.000000000 -0400 # @@ -1,2 +0,0 @@ # -x 1 7 # -x 4 9 DROP TABLE CC, B; # End of test case for query 1
[24 Nov 2009 8:18]
Øystein Grøvlen
Query plan for query 0 above: EXPLAIN EXTENDED SELECT vkey FROM C WHERE pk IN (SELECT t1.pk FROM C t1 JOIN C t2 ON t2.vkey = t1.vnokey); id select_type table type possible_keys key key_len ref rows filtered Extra 1 PRIMARY C ALL PRIMARY NULL NULL NULL 20 100.00 1 PRIMARY t2 index vkey vkey 4 NULL 20 100.00 Using index; FirstMatch(C) 1 PRIMARY t1 eq_ref PRIMARY PRIMARY 4 test.C.pk 1 100.00 Using where Warnings: Note 1003 select `test`.`C`.`vkey` AS `vkey` from `test`.`C` `t1` semi join (`test`.`C` `t2`) join `test`.`C` where ((`test`.`t1`.`vnokey` = `test`.`t2`.`vkey`) and (`test`.`t1`.`pk` = `test`.`C`.`pk`))
[24 Nov 2009 10:22]
Øystein Grøvlen
If first match is off, the result is correct. Otherwise, result seem to be very dependent on the data. If I for test case 0 above, remove rows with VARCHAR fields that are NULL, I get two 'd' rows as the result. This is the first two rows when sorted alphabetically. This is probably because the varchar_key index is used to scan t2. If I remove the 'd' rows too, I get the two 'e' rows. If table contains 3 or less rows another plan is selected and the result will be correct. For 4 rows, only rows with the "smallest" letter are returned. The query plan for for rows are not quite the same as above. Join buffer is used on t1 instead of primary-key lookup. For 5 rows and above, the exact same query plan is used.
[24 Nov 2009 11:25]
Øystein Grøvlen
It seems the scan on t2 stops when the first row does match anything in t1 for the current primary key of C. This happens because the return_tab of t2 refers to C. I suspect that it is t1 which should refer to C, and that it is the reordering of t1 and t2 that creates trouble.
[24 Nov 2009 14:56]
Øystein Grøvlen
Table t1 will be pulled out in this case because of the primary-key correlation. In that case, I would assume that t1 becomes an outer correlated table to t2. If so, the chosen join sequence should not be a candidate for FirstMatch since it requires that all outer correlated tables comes before the inner tables.
[25 Nov 2009 8:51]
Øystein Grøvlen
advance_sj_state consider a table to be an outer correlated table if it is contained in sj_nest->nested_join->sj_corr_tables or sj_nest->nested_join->sj_depends_on. Neither of these sets contains t1 in this case. Maybe these needs to be updated when a table is pulled out of sj_nest?
[26 Nov 2009 11:04]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/91773 3734 oystein.grovlen@sun.com 2009-11-26 Bug#46556 Returning incorrect, empty results for some IN subqueries w/ semijoin=on Problem: When a table is pulled out of a semijoin, it is not considered whether this makes it a correlated outer table. If it really is a correlated outer table, FirstMatch strategy requires that it comes before the semijoin-nest in the join sequence. Since this information is not recorded, this restriction is not enforced. Solution: If a table is pulled out of semijoin nest due to eq_ref, check whether it is used in the on_expr for the semijoin nest. If so, add it to the set of tables that are regarded as correlated to the semijoin nest. @ mysql-test/r/subselect_sj.result Result for test case for Bug#46566 @ mysql-test/r/subselect_sj2.result Changed query plan due to the fix for Bug#46566. Changed some data in that test case to show that the previous query plan was wrong. @ mysql-test/r/subselect_sj2_jcl6.result Changed query plan due to the fix for Bug#46566. Changed some data to that test case to show that the previous query plan was wrong. Due to Bug#49129, the query will now return wrong result when setting join_cache_level=6 @ mysql-test/r/subselect_sj_jcl6.result Result for test case for Bug#46566. @ mysql-test/t/subselect_sj.test Added test case for Bug#46556. @ mysql-test/t/subselect_sj2.test The fix for Bug#46566 changes the query plan for a query. Changed some data in that test case to show that the previous query plan wass wrong. Due to Bug#49129, the query will now return wrong result when setting join_cache_level=6 @ sql/sql_select.cc If a table is pulled out of semijoin nest due to eq_ref, include it in sj_nest->nested_join->sj_corr_tables if it is used in sj_nest->sj_on_expr. This makes sure it is considered an outer correlated table to this semijoin nest.
[1 Dec 2009 11:46]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/92258 3744 oystein.grovlen@sun.com 2009-12-01 Bug#46556 Returning incorrect, empty results for some IN subqueries w/ semijoin=on Problem: When a table is pulled out of a semijoin, it is not considered whether this makes it a correlated outer table. If it really is a correlated outer table, FirstMatch strategy requires that it comes before the semijoin-nest in the join sequence. Since this information is not recorded, this restriction is not enforced. Solution: If a table is pulled out of semijoin nest due to eq_ref, add it to the set of tables that are regarded as correlated to the semijoin nest. @ mysql-test/r/subselect_sj.result Result for test case for Bug#46566 @ mysql-test/r/subselect_sj2.result Changed query plan due to the fix for Bug#46566. Changed some data in that test case to show that the previous query plan was wrong. @ mysql-test/r/subselect_sj2_jcl6.result Changed query plan due to the fix for Bug#46566. Changed some data to that test case to show that the previous query plan was wrong. Due to Bug#49129, the query will now return wrong result when setting join_cache_level=6 @ mysql-test/r/subselect_sj_jcl6.result Result for test case for Bug#46566. @ mysql-test/t/subselect_sj.test Added test case for Bug#46556. @ mysql-test/t/subselect_sj2.test The fix for Bug#46566 changes the query plan for a query. Changed some data in that test case to show that the previous query plan wass wrong. Due to Bug#49129, the query will now return wrong result when setting join_cache_level=6 @ sql/sql_select.cc If a table is pulled out of semijoin nest due to eq_ref, include it in sj_nest->nested_join->sj_corr_tables and sj_nest->nested_join->sj_depends_on. (It has been verified that this equivalent to what will happen if the table is pulled out manually by rewriting the query.) This makes sure the table is considered an outer correlated table to this semijoin nest.
[11 Dec 2009 14:09]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/93698 3768 oystein.grovlen@sun.com 2009-12-11 Bug#46556 Returning incorrect, empty results for some IN subqueries w/ semijoin=on Problem: When a table is pulled out of a semijoin, it is not considered whether this makes it a correlated outer table. If it really is a correlated outer table, FirstMatch strategy requires that it comes before the semijoin-nest in the join sequence. Since this information is not recorded, this restriction is not enforced. Solution: If a table is pulled out of semijoin nest due to eq_ref, add it to the set of tables that are regarded as correlated to the semijoin nest. @ mysql-test/r/subselect_sj.result Result for test case for Bug#46566 @ mysql-test/r/subselect_sj2.result Changed query plan due to the fix for Bug#46556. Changed some data in that test case to show that the previous query plan was wrong. @ mysql-test/r/subselect_sj2_jcl6.result Changed query plan due to the fix for Bug#46556. Changed some data to that test case to show that the previous query plan was wrong. Due to Bug#49129, the query will now return wrong result when setting join_cache_level=6 @ mysql-test/r/subselect_sj_jcl6.result Result for test case for Bug#46556. @ mysql-test/t/subselect_sj.test Added test case for Bug#46556. @ mysql-test/t/subselect_sj2.test The fix for Bug#46556 changes the query plan for a query. Changed some data in that test case to show that the previous query plan wass wrong. Due to Bug#49129, the query will now return wrong result when setting join_cache_level=6 @ sql/sql_select.cc If a table is pulled out of semijoin nest due to eq_ref, include it in sj_nest->nested_join->sj_corr_tables and sj_nest->nested_join->sj_depends_on. (It has been verified that this equivalent to what will happen if the table is pulled out manually by rewriting the query.) This makes sure the table is considered an outer correlated table to this semijoin nest.
[11 Dec 2009 14:26]
Øystein Grøvlen
Patch pushed as revid:oystein.grovlen@sun.com-20091211140923-q0uu1djdd5de4ce0
[21 Dec 2009 15:39]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091221153807-80nxoli1tw1z9bxn) (version source revid:oystein.grovlen@sun.com-20091211140923-q0uu1djdd5de4ce0) (merge vers: 6.0.14-alpha) (pib:15)
[8 Jan 2010 21:11]
Paul DuBois
Noted in 6.0.14 changelog. With semijoin optimization enabled, some queries using IN subqueries incorrectly returned empty results.
[7 Apr 2010 9:23]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/105134 3066 oystein.grovlen@sun.com 2010-04-07 Bug#46556 Returning incorrect, empty results for some IN subqueries w/ semijoin=on (Backporting of revid:oystein.grovlen@sun.com-20091211140923-q0uu1djdd5de4ce0) Problem: When a table is pulled out of a semijoin, it is not considered whether this makes it a correlated outer table. If it really is a correlated outer table, FirstMatch strategy requires that it comes before the semijoin-nest in the join sequence. Since this information is not recorded, this restriction is not enforced. Solution: If a table is pulled out of semijoin nest due to eq_ref, add it to the set of tables that are regarded as correlated to the semijoin nest. @ mysql-test/r/subselect_sj.result Result for test case for Bug#46566 @ mysql-test/r/subselect_sj2.result Changed query plan due to the fix for Bug#46556. Changed some data in that test case to show that the previous query plan was wrong. @ mysql-test/t/subselect_sj.test Added test case for Bug#46556. @ mysql-test/t/subselect_sj2.test The fix for Bug#46556 changes the query plan for a query. Changed some data in that test case to show that the previous query plan wass wrong. @ sql/sql_select.cc If a table is pulled out of semijoin nest due to eq_ref, include it in sj_nest->nested_join->sj_corr_tables and sj_nest->nested_join->sj_depends_on. (It has been verified that this equivalent to what will happen if the table is pulled out manually by rewriting the query.) This makes sure the table is considered an outer correlated table to this semijoin nest.
[16 Aug 2010 6:34]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100816062819-bluwgdq8q4xysmlg) (version source revid:alik@sun.com-20100816062612-enatdwnv809iw3s9) (pib:20)
[13 Nov 2010 16:26]
Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[23 Nov 2010 2:55]
Paul DuBois
Bug does not appear in any released 5.6.x version. No 5.6.1 changelog entry needed.