Bug #49129 Wrong result with IN-subquery with join_cache_level=6 and firstmatch=off
Submitted: 26 Nov 2009 10:06 Modified: 23 Nov 2010 3:04
Reporter: Øystein Grøvlen Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:6.0.14-bzr OS:Any
Assigned to: Jørgen Løland CPU Architecture:Any
Tags: Contribution, join_cache_level, optimizer_switch, semijoin, subquery

[26 Nov 2009 10:06] Øystein Grøvlen
Description:
If FirstMatch is turned off, the following query may give wrong result when join_cache_level=6:

SELECT * FROM t0 WHERE t0.a IN 
  (SELECT t1.a FROM t1, t2 WHERE t2.a=t0.a AND t1.b=t2.b);

(With FirstMatch on, result is currently also wrong, but that is a result of Bug#46556)

The problem goes away if semijoin switch is turned off.

How to repeat:
SET SESSION optimizer_switch = 'firstmatch=off,index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,loosescan=on,materialization=on,semijoin=on';

CREATE TABLE t0 (a INT);
INSERT INTO t0 VALUES (0),(1),(2),(3),(4);

CREATE TABLE t1 (a INT, b INT, KEY(a));
INSERT INTO t1 SELECT a, a from t0;

CREATE TABLE t2 (a INT, b INT, PRIMARY KEY(a));
INSERT INTO t2 SELECT * FROM t1;

UPDATE t1 SET a=3, b=11 WHERE a=4;
UPDATE t2 SET b=11 WHERE a=3;

# This result (0, 1, 2) is wrong, but will be fixed by Bug#46556
SELECT * FROM t0 WHERE t0.a IN 
  (SELECT t1.a FROM t1, t2 WHERE t2.a=t0.a AND t1.b=t2.b);

SET join_cache_level = 6;

# This result (0) is even more wrong ;-)
SELECT * FROM t0 WHERE t0.a IN 
  (SELECT t1.a FROM t1, t2 WHERE t2.a=t0.a AND t1.b=t2.b);

SET SESSION optimizer_switch = 'semijoin=off';

# This result (0, 1, 2, 3) is correct
SELECT * FROM t0 WHERE t0.a IN 
  (SELECT t1.a FROM t1, t2 WHERE t2.a=t0.a AND t1.b=t2.b);

DROP TABLE t0, t1, t2;
[26 Nov 2009 10:11] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/91762

3725 oystein.grovlen@sun.com	2009-11-26
      Add test case for Bug#49129 to optimizer_unfixed_bugs test suite
     @ mysql-test/suite/optimizer_unfixed_bugs/r/bug49129.result
        Result for test case for Bug#49129
     @ mysql-test/suite/optimizer_unfixed_bugs/t/bug49129.test
        Test case for Bug#49219
[26 Nov 2009 10:12] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/91763

3736 oystein.grovlen@sun.com	2009-11-26
      Add test case for Bug#49129 to optimizer_unfixed_bugs test suite
     @ mysql-test/suite/optimizer_unfixed_bugs/r/bug49129.result
        Result for test case for Bug#49129
     @ mysql-test/suite/optimizer_unfixed_bugs/t/bug49129.test
        Test case for Bug#49219
[26 Nov 2009 10:15] Øystein Grøvlen
Test bug49129 has been added to the optimizer_unfixed bug test suite.
This test should be removed when the bug has been fixed.
[26 Nov 2009 11:53] Øystein Grøvlen
The patch that was attached to this bug report is not a fix.  It is just a test case that was added to a special test suite, optimizer_unfixed_bugs, that 
contains test cases for bugs that are not yet fixed.
[26 Nov 2009 13:19] Roy Lyseng
Problem may be related to duplicate weedout handling.
At least, when I modified the following code in do_sj_dups_weedout():

if (sjtbl->is_confluent)
[
  if (sjtbl->have_confluent_row)
    DBUG_RETURN(1);
...

and replaced DBUG_RETURN(1) with DBUG_RETURN(0), the query result became correct. No guarantee that this is the root cause, just a hunch...
[26 Nov 2009 15:28] Valeriy Kravchuk
Verified just as described with recent mysql-6.0-codebase from bzr on Mac OS X:

77-52-7-73:6.0-codebase openxs$ bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 6.0.14-alpha-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> SET SESSION optimizer_switch =
    -> 'firstmatch=off,index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,loosescan=on,materialization=on,semijoin=on';
Query OK, 0 rows affected (0.00 sec)

mysql> 
mysql> CREATE TABLE t0 (a INT);
Query OK, 0 rows affected (0.06 sec)

mysql> INSERT INTO t0 VALUES (0),(1),(2),(3),(4);
Query OK, 5 rows affected (0.00 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> 
mysql> CREATE TABLE t1 (a INT, b INT, KEY(a));
Query OK, 0 rows affected (0.11 sec)

mysql> INSERT INTO t1 SELECT a, a from t0;
Query OK, 5 rows affected (0.00 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> 
mysql> CREATE TABLE t2 (a INT, b INT, PRIMARY KEY(a));
Query OK, 0 rows affected (0.07 sec)

mysql> INSERT INTO t2 SELECT * FROM t1;
Query OK, 5 rows affected (0.01 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> 
mysql> UPDATE t1 SET a=3, b=11 WHERE a=4;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> UPDATE t2 SET b=11 WHERE a=3;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> 
mysql> # This result (0, 1, 2) is wrong, but will be fixed by Bug#46556
mysql> SELECT * FROM t0 WHERE t0.a IN 
    ->   (SELECT t1.a FROM t1, t2 WHERE t2.a=t0.a AND t1.b=t2.b);
+------+
| a    |
+------+
|    0 |
|    1 |
|    2 |
+------+
3 rows in set (0.00 sec)

mysql> 
mysql> SET join_cache_level = 6;
Query OK, 0 rows affected (0.00 sec)

mysql> 
mysql> # This result (0) is even more wrong ;-)
mysql> SELECT * FROM t0 WHERE t0.a IN 
    ->   (SELECT t1.a FROM t1, t2 WHERE t2.a=t0.a AND t1.b=t2.b);
+------+
| a    |
+------+
|    0 |
+------+
1 row in set (0.00 sec)

mysql> 
mysql> SET SESSION optimizer_switch = 'semijoin=off';
Query OK, 0 rows affected (0.00 sec)

mysql> 
mysql> # This result (0, 1, 2, 3) is correct
mysql> SELECT * FROM t0 WHERE t0.a IN 
    ->   (SELECT t1.a FROM t1, t2 WHERE t2.a=t0.a AND t1.b=t2.b);
+------+
| a    |
+------+
|    0 |
|    1 |
|    2 |
|    3 |
+------+
4 rows in set (0.01 sec)
[11 Dec 2009 6:00] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091211055901-yp18b3c7xuhl87rf) (version source revid:alik@sun.com-20091211055401-43rjwq7gjed6ds83) (merge vers: 6.0.14-alpha) (pib:13)
[11 Feb 2010 15:35] Guilhem Bichot
Goes away with optimizer_join_cache_level<=4.
BUG#51016 was marked as duplicate of this one.
[23 Feb 2010 20:25] Sergey Petrunya
@[26 Nov 2009 14:19] Roy Lyseng :

This change will essentially turn semi-join processing into inner join
processing. Here's a counterexample: if one runs this:

insert into t1 select * from t1;

SELECT * FROM t0 WHERE t0.a IN (SELECT t1.a FROM t1, t2 WHERE t2.a=t0.a AND t1.b=t2.b);

they'll get 8 rows, which clearly wrong because table t0 has only 5 rows.
[23 Feb 2010 20:32] Sergey Petrunya
EXPLAIN for the "# This result (0) is even more wrong" case:

MySQL [j1a]> SET join_cache_level = 6;
Query OK, 0 rows affected (0.00 sec)

MySQL [j1a]> explain SELECT * FROM t0 WHERE t0.a IN    (SELECT t1.a FROM t1, t2 WHERE t2.a=t0.a AND t1.b=t2.b)\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: t0
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: t1
         type: ref
possible_keys: a
          key: a
      key_len: 5
          ref: j1a.t0.a
         rows: 1
        Extra: Start temporary; Using join buffer
*************************** 3. row ***************************
           id: 1
  select_type: PRIMARY
        table: t2
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: j1a.t0.a
         rows: 1
        Extra: Using where; End temporary; Using join buffer
3 rows in set (0.02 sec)
[23 Feb 2010 20:59] Sergey Petrunya
My understanding of the problem:

The picked execution strategy is wrong
--------------------------------------
We shouldn't have sjtbl->is_confluent==TRUE for this example. As comments say:

  /* 
    is_confluent==TRUE means this is a special case where the temptable record
    has zero length (and presence of a unique key means that the temptable can
    have either 0 or 1 records). 
    In this case we don't create the physical temptable but instead record
    its state in SJ_TMP_TABLE::have_confluent_record.
  */
  bool is_confluent;

The problem is that the query from the example is not a case where 
is_confluent==TRUE. According to EXPLAIN, the join order is 

t0    | Using where                                   |
t1    | Start temporary; Using join buffer            |
t2    | Using where; End temporary; Using join buffer |

and "Start/End temporary" show that DuplicateElimination will remove duplicates 
in the range of [t1,t2].  This range indeed has is_confluent==TRUE. 

However, as specified in DuplicateElimination specs in WorkLog (and maybe 
something in the code as well), use of join buffering extends the start of 
duplicate-generating range up to the first non-constant table, in this 
example to table t0.

When table t0 is in duplicate-generating range, then its rowid should be put 
into DuplicateElimination's temporary table, i.e. the temporary table record
won't have zero length, from which it follows that is_confluent==FALSE.

Why did it pick the wrong strategy
----------------------------------
If we look at DuplicateElimination's setup code we'll see that it has a wrong
idea about which tables will/won't use join buffering:

Breakpoint 1, setup_semijoin_dups_elimination (join=0xabf0748, options=4, no_jbuf_after=3) at sql_select.cc:1207
(gdb) p join->best_positions[0].use_join_buffer
  $1 = true
(gdb) p join->best_positions[1].use_join_buffer
  $2 = false
(gdb) p join->best_positions[2].use_join_buffer
  $3 = false

That information comes from the join optimizer. Join optimizer has only
"approximate" data about whether join buffering will be used. The final
decision is made on per-table basis in check_join_cache_usage(), which 
is called from make_join_readinfo()'s main loop.

Proposed solution
-----------------
We can't just move setup_semijoin_dups_elimination() call to be after
make_join_readinfo()'s main loop, because check_join_cache_usage() needs 
to know whether DuplicateElimination is used for the table it is considering.

The solution is to check make_join_readinfo()'s and
setup_semijoin_dups_elimination()'s loops together (this may be not as trivial
as it sounds).
[9 Mar 2010 15:52] Sergey Petrunya
FYI: In MariaDB, we've decided to take a simpler approach:

in setup_semijoin_dups_elimination(), when we see that we're going to use DuplicateElimination, we check if there is any possible chance that join buffering will be used for any of the tables inside DuplicateElimination's duplicate-generating range, and if that is the case, we put all rowids into the temptable.  This makes temptable records slightly bigger, but hopefully that's acceptable.

Our fix is here:
https://lists.launchpad.net/maria-developers/msg02419.html
it can't be readily applied to MySQL codebase but one can get the idea.
[9 Mar 2010 15:53] Sergey Petrunya
Please let me know if you need MySQL's version of fix under SCA.
[4 May 2010 9:01] Roy Lyseng
Never actually worked much on this, so unassigning.
[19 Jun 2010 15:04] Guilhem Bichot
this bug is visible in the diff between 
subquery_sj_none.result and subquery_sj_all_jcl[67].result
(join_cache_level=6 and 7 are affected). BUG#49129 is or will soon be mentioned in the latter files near the problem.
[5 Jul 2010 11:20] Guilhem Bichot
contributed patch at http://lists.mysql.com/internals/37988
[12 Aug 2010 19:12] Guilhem Bichot
See BUG#55955 for another query which hits this bug, and hits even more bugs.
[13 Aug 2010 12:50] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/115670

3228 Jorgen Loland	2010-08-13
      Bug#49129 - Wrong result with IN-subquery with join_cache_level=6 
                  and firstmatch=off
      
      Patch based on contribution from Sergey Petrunia.
      
      Consider the query:
      
      SELECT * FROM t0 WHERE t0.a IN (
         SELECT t1.a FROM t1, t2 WHERE t2.a=t0.a AND t1.b=t2.b);
      
      With join cache level 6, this query only returns the first tuple
      from t0 that has a match in the subquery. Consider the relevant part
      of EXPLAIN:
      
      t0    | Using where                                   |
      t1    | Start temporary; Using join buffer            |
      t2    | Using where; End temporary; Using join buffer |
      
      When the optimizer decides to use join buffers, temporary tables
      created for duplicate weedout should extend to the first table
      after const tables. I.e., Start temporary should be printed for
      t0 above.
      
      The reason for the bug is that setup_semijoin_dups_elimination()
      is called before the final decision is made in
      check_join_cache_usage() on whether or not to use join buffering.
      In this case, use_join_buffer==false for t1 and t2 during
      setup_semijoin_dups_elimination(), and the range of tables to
      buffer is therefore not extended to t0.
      
      Since check_join_cache_usage() needs to know if duplicate weedout
      is used, so moving setup_semijoin_dups_elimination() from before
      check_join_cache_usage() to after it is not possible. 
      
      The temporary fix of this patch is to use a rough estimate of
      whether join buffering will be used in
      setup_semijoin_dups_elimination(). This rough test covers more 
      cases than actually end up with join buffering, and in these cases 
      we now extend the temporary table to store rowids for more tables 
      than strictly required, i.e., the first table up to the start of 
      the semijoin. A proper (but much more costly to do) fix would be to 
      merge the loops of setup_semijoin_dups_elimination() and 
      make_join_readinfo() (which calls check_join_cache_usage()).
     @ mysql-test/include/subquery_sj.inc
        Added test for BUG#49129
     @ mysql-test/r/subquery_sj_all.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_all_jcl6.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_all_jcl7.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_dupsweed.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_dupsweed_jcl6.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_dupsweed_jcl7.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_firstmatch.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_firstmatch_jcl6.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_firstmatch_jcl7.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_loosescan.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_loosescan_jcl6.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_loosescan_jcl7.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_mat.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_mat_jcl6.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_mat_jcl7.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_mat_nosj.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_none.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_none_jcl6.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_none_jcl7.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ sql/sql_select.cc
        setup_semijoin_dups_elimination() relied on
        best_position[i].use_join_buffer when checking whether temporary
        tables for duplicate weedout should be extended to the first
        table of the execution plan. However, use_join_buffer is not
        reliable at this stage, so setup_semijoin_dups_elimination()
        needs to use a rough estimate instead.
[18 Aug 2010 12:06] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/116092

3227 Jorgen Loland	2010-08-18
      Bug#49129 - Wrong result with IN-subquery with join_cache_level=6 
                  and firstmatch=off
      
      Patch based on contribution from Sergey Petrunia.
      
      Consider the query:
      
      SELECT * FROM t0 WHERE t0.a IN (
         SELECT t1.a FROM t1, t2 WHERE t2.a=t0.a AND t1.b=t2.b);
      
      With join cache level 6, this query only returns the first tuple
      from t0 that has a match in the subquery. Consider the relevant part
      of EXPLAIN:
      
      t0    | Using where                                   |
      t1    | Start temporary; Using join buffer            |
      t2    | Using where; End temporary; Using join buffer |
      
      When the optimizer decides to use join buffers, temporary tables
      created for duplicate weedout should extend to the first table
      after const tables. I.e., Start temporary should be printed for
      t0 above.
      
      The reason for the bug is that setup_semijoin_dups_elimination()
      is called before the final decision is made in
      check_join_cache_usage() on whether or not to use join buffering.
      In this case, use_join_buffer==false for t1 and t2 during
      setup_semijoin_dups_elimination(), and the range of tables to
      buffer is therefore not extended to t0.
      
      Since check_join_cache_usage() needs to know if duplicate weedout
      is used, so moving setup_semijoin_dups_elimination() from before
      check_join_cache_usage() to after it is not possible. 
      
      The temporary fix of this patch is to use a rough estimate of
      whether join buffering will be used in
      setup_semijoin_dups_elimination(). This rough test covers more 
      cases than actually end up with join buffering, and in these cases 
      we now extend the temporary table to store rowids for more tables 
      than strictly required, i.e., the first table up to the start of 
      the semijoin. A proper (but much more costly to do) fix would be to 
      merge the loops of setup_semijoin_dups_elimination() and 
      make_join_readinfo() (which calls check_join_cache_usage()).
     @ mysql-test/include/subquery_sj.inc
        Added test for BUG#49129
     @ mysql-test/r/subquery_sj_all.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_all_jcl6.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_all_jcl7.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_dupsweed.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_dupsweed_jcl6.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_dupsweed_jcl7.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_firstmatch.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_firstmatch_jcl6.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_firstmatch_jcl7.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_loosescan.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_loosescan_jcl6.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_loosescan_jcl7.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_mat.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_mat_jcl6.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_mat_jcl7.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_mat_nosj.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_none.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_none_jcl6.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_none_jcl7.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ sql/sql_select.cc
        setup_semijoin_dups_elimination() relied on
        best_position[i].use_join_buffer when checking whether temporary
        tables for duplicate weedout should be extended to the first
        table of the execution plan. However, use_join_buffer is not
        reliable at this stage, so setup_semijoin_dups_elimination()
        needs to use a rough estimate instead.
     @ sql/sql_select.h
        Made st_join_table::use_quick an enum
     @ sql/sql_test.cc
        Made st_join_table::use_quick an enum
[23 Aug 2010 12:58] Jørgen Løland
Roy found that this crashes *without* the patch:
------------------------------------------------
set optimizer_switch='firstmatch=off';
set optimizer_join_cache_level=6;

CREATE TABLE t0 (a INT);
CREATE TABLE t1 (a INT, b INT, KEY(a));
CREATE TABLE t2 (a INT, b INT, PRIMARY KEY(a));
INSERT INTO t0 VALUES (0),(1),(2),(3),(4);
INSERT INTO t1 VALUES (0,0),(1,1),(2,2),(3,3),(3,11);
INSERT INTO t2 VALUES (0,0),(1,1),(2,2),(3,11),(4,4);

SELECT * FROM t0 WHERE t0.a IN 
(SELECT t1.a FROM t1, t2 WHERE t2.a=t0.a AND t1.b=t2.b);
-----------------------------

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff1243710 (LWP 23087)]
0x00000000008456f4 in JOIN_CACHE_BKA::init (this=0x1b0e148)
    at /mysql-next-mr-opt-backporting/sql/sql_join_cache.cc:615
615             if (copy->field->table == tab->table &&
(gdb) p copy->field
$6 = (struct Field *) 0x0

This is the same location in sql_join_cache.cc that was patched above:

-        if (copy->field->table == tab->table &&
+        /*
+          (1) - when we store rowids for DuplicateWeedout, they have
+                copy->field==NULL
+        */
+        if (copy->field &&  // (1)
+            copy->field->table == tab->table &&

Will investigate if this is OK.
[24 Aug 2010 14:02] Jørgen Løland
Explain of the crashing query:
------------------------------

EXPLAIN 
SELECT * FROM t0 WHERE t0.a IN 
(SELECT t1.a FROM t1, t2 WHERE t2.a=t0.a AND t1.b=t2.b);
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	PRIMARY	t0	ALL	NULL	NULL	NULL	NULL	5	Using where; Start temporary
1	PRIMARY	t1	ALL	a	NULL	NULL	NULL	5	Using where; Using join buffer (BNL, incremental buffers)
1	PRIMARY	t2	eq_ref	PRIMARY	PRIMARY	4	test.t0.a	1	Using where; End temporary; Using join buffer (BKA, incremental buffers)

Reason for the crash:
---------------------
It happens because setup_semijoin_dups_elimination() correctly extends the range of buffered tables to t0 (see explain above) when handling SJ_OPT_DUPS_WEEDOUT (see Sergey's comment above). The rowid of t0 has to be included in the buffer, and a field is created for this in JOIN_CACHE::create_remaining_fields():

    /* SemiJoinDuplicateElimination: allocate space for rowid if needed */
    if (tab->keep_current_rowid)
    {
      (...)
      copy->field= 0;
      (...)
    }

The "field" used for rowids have copy->field==NULL. So this comment in the patch covers this case:

       /*
          (1) - When we store rowids for DuplicateWeedout, they have
                copy->field==NULL
        */
        if (copy->field &&  // (1)
            copy->field->table == tab->table &&
            bitmap_is_set(key_read_set, copy->field->field_index))
        {

Conclusion: The patch is correct and fixes Roy's query as well as the originally reported one. (Note: they differ in optimizer switches only)
[27 Aug 2010 12:35] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/117010

3232 Jorgen Loland	2010-08-27
      Bug#49129 - Wrong result with IN-subquery with join_cache_level=6 
                  and firstmatch=off
      
      Patch based on contribution from Sergey Petrunia.
      
      Consider the query:
      
      SELECT * FROM t0 WHERE t0.a IN (
         SELECT t1.a FROM t1, t2 WHERE t2.a=t0.a AND t1.b=t2.b);
      
      With join cache level 6, this query only returns the first tuple
      from t0 that has a match in the subquery. Consider the relevant part
      of EXPLAIN:
      
      t0    | Using where                                   |
      t1    | Start temporary; Using join buffer            |
      t2    | Using where; End temporary; Using join buffer |
      
      When the optimizer decides to use join buffers, temporary tables
      created for duplicate weedout should extend to the first table
      after const tables. I.e., Start temporary should be printed for
      t0 above.
      
      The reason for the bug is that setup_semijoin_dups_elimination()
      is called before the final decision is made in
      check_join_cache_usage() on whether or not to use join buffering.
      In this case, use_join_buffer==false for t1 and t2 during
      setup_semijoin_dups_elimination(), and the range of tables to
      buffer is therefore not extended to t0.
      
      Since check_join_cache_usage() needs to know if duplicate weedout
      is used, so moving setup_semijoin_dups_elimination() from before
      check_join_cache_usage() to after it is not possible. 
      
      The temporary fix of this patch is to use a rough estimate of
      whether join buffering will be used in
      setup_semijoin_dups_elimination(). This rough test covers more 
      cases than actually end up with join buffering, and in these cases 
      we now extend the temporary table to store rowids for more tables 
      than strictly required, i.e., the first table up to the start of 
      the semijoin. A proper (but much more costly to do) fix would be to 
      merge the loops of setup_semijoin_dups_elimination() and 
      make_join_readinfo() (which calls check_join_cache_usage()).
     @ mysql-test/include/subquery_sj.inc
        Added test for BUG#49129
     @ mysql-test/r/subquery_sj_all.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_all_jcl6.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_all_jcl7.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_dupsweed.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_dupsweed_jcl6.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_dupsweed_jcl7.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_firstmatch.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_firstmatch_jcl6.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_firstmatch_jcl7.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_loosescan.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_loosescan_jcl6.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_loosescan_jcl7.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_mat.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_mat_jcl6.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_mat_jcl7.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_mat_nosj.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_none.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_none_jcl6.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_none_jcl7.result
        Added test for BUG#49129 and recorded new explain output for queries using duplicate weedout.
     @ sql/sql_select.cc
        setup_semijoin_dups_elimination() relied on
        best_position[i].use_join_buffer when checking whether temporary
        tables for duplicate weedout should be extended to the first
        table of the execution plan. However, use_join_buffer is not
        reliable at this stage, so setup_semijoin_dups_elimination()
        needs to use a rough estimate instead.
     @ sql/sql_select.h
        Made st_join_table::use_quick an enum
     @ sql/sql_test.cc
        Made st_join_table::use_quick an enum
[2 Sep 2010 7:35] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/117395

3234 Jorgen Loland	2010-09-02
      Bug#49129 - Wrong result with IN-subquery with join_cache_level=6 
                  and firstmatch=off
      
      Patch based on contribution from Sergey Petrunia.
      
      Consider the query:
      
      SELECT * FROM t0 WHERE t0.a IN (
         SELECT t1.a FROM t1, t2 WHERE t2.a=t0.a AND t1.b=t2.b);
      
      With join cache level 6, this query only returns the first tuple
      from t0 that has a match in the subquery. Consider the relevant part
      of EXPLAIN:
      
      t0    | Using where                                   |
      t1    | Start temporary; Using join buffer            |
      t2    | Using where; End temporary; Using join buffer |
      
      When the optimizer decides to use join buffers, temporary tables
      created for duplicate weedout should extend to the first table
      after const tables. I.e., Start temporary should be printed for
      t0 above.
      
      The reason for the bug is that setup_semijoin_dups_elimination()
      is called before the final decision is made in
      check_join_cache_usage() on whether or not to use join buffering.
      In this case, use_join_buffer==false for t1 and t2 during
      setup_semijoin_dups_elimination(), and the range of tables to
      buffer is therefore not extended to t0.
      
      Since check_join_cache_usage() needs to know if duplicate weedout
      is used, so moving setup_semijoin_dups_elimination() from before
      check_join_cache_usage() to after it is not possible. 
      
      The temporary fix of this patch is to use a rough estimate of
      whether join buffering will be used in
      setup_semijoin_dups_elimination(). This rough test covers more 
      cases than actually end up with join buffering, and in these cases 
      we now extend the temporary table to store rowids for more tables 
      than strictly required, i.e., the first non-const table up to the 
      start of the semijoin. A proper (but much more costly to do) fix 
      would be to merge the loops of setup_semijoin_dups_elimination() 
      and make_join_readinfo() (which calls check_join_cache_usage()).
     @ mysql-test/r/subquery_sj_all.result
        Recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_all_jcl6.result
        Recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_all_jcl7.result
        Recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_dupsweed.result
        Recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_dupsweed_jcl6.result
        Recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_dupsweed_jcl7.result
        Recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_firstmatch.result
        Recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_firstmatch_jcl6.result
        Recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_firstmatch_jcl7.result
        Recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_loosescan.result
        Recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_loosescan_jcl6.result
        Recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_loosescan_jcl7.result
        Recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_mat.result
        Recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_mat_jcl6.result
        Recorded new explain output for queries using duplicate weedout.
     @ mysql-test/r/subquery_sj_mat_jcl7.result
        Recorded new explain output for queries using duplicate weedout.
     @ sql/sql_select.cc
        setup_semijoin_dups_elimination() relied on
        best_position[i].use_join_buffer when checking whether temporary
        tables for duplicate weedout should be extended to the first
        table of the execution plan. However, use_join_buffer is not
        reliable at this stage, so setup_semijoin_dups_elimination()
        needs to use a rough estimate instead.
     @ sql/sql_select.h
        Made st_join_table::use_quick an enum
     @ sql/sql_test.cc
        Made st_join_table::use_quick an enum
[2 Sep 2010 9:55] Jørgen Løland
Pushed to mysql-next-mr-opt-backporting
[15 Sep 2010 8:53] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/118281

3243 Jorgen Loland	2010-09-15
      BUG#49129 - Followup patch to remove compile warning 
      about comparing signed and unsigned int
[21 Sep 2010 12:39] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/118714

3245 Jorgen Loland	2010-09-21
      BUG#49129 - followup patch. Fix warning about comparing signed 
      and unsigned int.
[2 Oct 2010 18:13] Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101002181053-6iotvl26uurcoryp) (version source revid:alexander.nozdrin@oracle.com-20101002180917-h0n62akupm3z20nt) (pib:21)
[13 Nov 2010 16:26] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[23 Nov 2010 3:04] Paul DuBois
Bug does not appear in any released 5.6.x version. No 5.6.1 changelog entry needed.