Bug #51457 Firstmatch semijoin strategy gives wrong results for certain query plans
Submitted: 24 Feb 2010 14:24 Modified: 23 Nov 2010 3:20
Reporter: Guilhem Bichot Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:6.0-codebase-bugfixing OS:Linux
Assigned to: Roy Lyseng CPU Architecture:Any
Tags: firstmatch, optimizer_switch, semijoin, subquery

[24 Feb 2010 14:24] Guilhem Bichot
Description:
Bug verifier, this is a bit technical. You may want to trust me that the bug exists and set it to "verified" without reproducing it.
I have marc.alff@sun.com-20100224072543-kqm9qvcsuv5jgocy .
One of the strategies to handle semijoin is named "Firstmatch", and according to comments in sql_select.cc:setup_semijoin_dups_elimination(), it applies to such range of tables:

    FirstMatch strategy
    ~~~~~~~~~~~~~~~~~~~

      (ot|nt)*  [ it ((it|nt)* it) ]  (nt)*
      +------+  +==================+  +---+
        (1)             (2)          (3)

      (1) - Prefix of outer and non-correlated tables
      (2) - The handled range, which may contain only inner and
            non-correlated tables.
      (3) - The suffix of outer non-correlated tables.

Example:
SELECT * from ta_ot,tc_nt where ta_ot.a=tc_nt.a and ta_ot.a in (select tb_it1.a from tb_it1, td_it2);
- ta_ot is the outer table "ot" (it's not in the subselect, and its column is used as the left operand of IN(subselect)).
- tb_it1 and td_it2 are the inner tables "it" (they are in the subselect)
- tc_nt is the non-correlated table "nt" (not in the subselect and not in the left operand of IN).
When the query execution plan chosen by the Optimizer is, in order: (ta_ot, tb_it1, tc_nt, td_it2), we fit the applicability criterion above, with (1) being ta_ot, (2) being (tb_it1,tc_nt,td_it2), and (3) being empty.
The bug is that when such plan is chosen, firstmatch gives wrong results.
I was not able to make the optimizer choose such plan, but if I force it to choose such plan, I see the bad results. It will happen one day that such plan is chosen by the optimizer, due to indexes, other bugfixes, certain data distribution in the customer's database, so this bug needs to be fixed now.
Note that the fix for BUG#50361 does not help.

How to repeat:
1) patch the server (attached file); this will expose debug-specific keywords which can be used to force a certain query plan and semijoin strategy to be used; this will also give you the testcase named "itntit.test".

2) compile the server, run test "itntit", observe SELECT results, they show that
- jumping back to "ot" from "it2" is wrong (misses a row)
- jumping back to "nt" from "it2" is wrong too if we change the testcase a bit (adds rows)

Suggested fix:
It seems that the only correct way is that we jump to "nt" from "it2" *and* jump to "ot" from "it1".
Make sure that a firstmatch range has only inner tables and no non-correlated tables, so in our case ot,it1,nt,it2: a range (it1) jumping back to the table at its left (ot), and another range (it2) jumping back to the table at its left (nt).
We can either produce short ranges in the beginning of the process, or at end (keep a single range with "nt" as it is now, and then split it on "nt" boundaries).
[24 Feb 2010 14:25] Guilhem Bichot
patch with debug-specific keywords, and a test file

Attachment: patch.txt (text/plain), 6.67 KiB.

[24 Feb 2010 17:07] Guilhem Bichot
Here is a reasoning why it2 jumping to ot or nt, cannot work.
When we are executing the join with plan ot,it1,nt,it2:
- we read a record from ot
- we read a record from it1 until we find a match for ta_ot.a=tb_it1.a
- we read a record from nt until we find a match for ta_ot.a=tc_nt.a
- we read a record from it2 until we find a match for ta_ot.a=tb_it2.a
- if it2 has "firstmatch and jump to ot", we don't read more it2 records, and jump back to the next record of ot. So the record of ot has been joined with at most one record from nt; that is wrong, because there is a join, not semijoin, between ot and nt. It may miss records this way.
- if instead it2 has "firstmatch and jump to it", we don't read more it2 records, and jump back to the next record of nt, then to it2, back to nt, etc, until we have exhausted nt, so far this is correct, then we get back to it1, which doesn't have any "firstmatch and jump back to ot" mark, so we read the next record of it1, so if we find a match in it1 again, we go to nt (from the start of nt again), then to it2 (from the start of it2 again), assuming we find matches, we will output N records if it1 has N records matching ot.a, which is wrong for a semijoin.
- on the other hand, having it2 jump back to nt *and* it1 jump back to ot, would not have any of those problems.
[3 Mar 2010 6:06] Sveta Smirnova
Thank you for the report.

Verified as described.
[30 Jul 2010 11:55] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/114722

3222 Roy Lyseng	2010-07-30
      Bug#51457: Firstmatch semijoin strategy gives wrong results for
                 certain query plans
      
      When the optimizer selects a semijoin FirstMatch plan that has
      non-correlated tables interspersed with inner tables, the "jumps"
      are placed so that execution might become incorrect.
      Example: A query with an outer correlated table ot, a non-correlated
      table nt and two inner tables it1 and it2 are used in a query and the
      optimizer chooses the FirstMatch join order ot - it1 - nt - it2.
      The optimizer assigns a "jump" from table it2 to ot. This strategy will
      omit duplicates resulting from table nt.
      
      The fix assures that the jump from the last table in a consecutive
      range of inner table goes to the most immediately preceeding outer
      table. In this case, there are two consecutive ranges of inner tables:
      {it1} and {it2}. In the first range, the last inner table is it1, the
      jump should be to table ot. In the second range, the jump from it2
      should be to nt.
      
      It is currently possible to generate such query plans, hence the
      bug fix includes some test cases. However, such plans are probably
      generated due to a problem with the join optimizer cost model.
      When this problem is fixed, it may be impossible to generate
      such FirstMatch plans, unless one can assign different costs
      to different tables, or another measure is used to force a specific
      join order.
      
      mysql-test/r/subselect_innodb.result
        Test results for bug#51457.
      
      mysql-test/t/subselect_innodb.test
        Test case for bug#51457. Notice the above comment for test validity.
        The test goes into subselect_innodb because it is required to use
        tables with one row that are not hit by const table optimization.
      
      sql/sql_select.cc
        In setup_semijoin_dups_elimination(), in the case for FIRST_MATCH,
        loop over the set of inner and non-correlated outer tables and assign
        jump target to the last table in each consecutive range of inner
        tables. The jump target should be the most immediately preceeding
        outer table.
[3 Aug 2010 7:42] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/114897

3222 Roy Lyseng	2010-08-03
      Bug#51457: Firstmatch semijoin strategy gives wrong results for
                 certain query plans
      
      When the optimizer selects a semijoin FirstMatch plan that has
      non-correlated tables interspersed with inner tables, the "jumps"
      are placed so that execution might become incorrect.
      Example: A query with an outer correlated table ot, a non-correlated
      table nt and two inner tables it1 and it2 are used in a query and the
      optimizer chooses the FirstMatch join order ot - it1 - nt - it2.
      The optimizer assigns a "jump" from table it2 to ot. This strategy will
      omit duplicates resulting from table nt.
      
      The fix assures that the jump from the last table in a consecutive
      range of inner table goes to the most immediately preceeding outer
      table (regardless of whether the outer table is correlated or not).
      In this case, there are two consecutive ranges of inner tables:
      {it1} and {it2}. In the first range, the last inner table is it1, the
      jump should be to table ot. In the second range, the jump from it2
      should be to nt.
      
      It is currently possible to generate such query plans, hence the
      bug fix includes some test cases. However, such plans are probably
      generated due to a problem with the join optimizer cost model.
      When this problem is fixed, it may be impossible to generate
      such FirstMatch plans, unless one can assign different costs
      to different tables, or another measure is used to force a specific
      join order.
      
      mysql-test/r/subselect_innodb.result
        Test results for bug#51457.
      
      mysql-test/t/subselect_innodb.test
        Test case for bug#51457. Notice the above comment for test validity.
        The test goes into subselect_innodb because it is required to use
        tables with one row that are not hit by const table optimization.
      
      sql/sql_select.cc
        In setup_semijoin_dups_elimination(), in the case for FIRST_MATCH,
        loop over the set of inner and non-correlated outer tables and assign
        jump target to the last table in each consecutive range of inner
        tables. The jump target should be the most immediately preceeding
        outer table.
[4 Aug 2010 7:07] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/114976

3222 Roy Lyseng	2010-08-04
      Bug#51457: Firstmatch semijoin strategy gives wrong results for
                 certain query plans
      
      When the optimizer selects a semijoin FirstMatch plan that has
      non-correlated tables interspersed with inner tables, the "jumps"
      are placed so that execution might become incorrect.
      Example: A query with an outer correlated table ot, a non-correlated
      table nt and two inner tables it1 and it2 are used in a query and the
      optimizer chooses the FirstMatch join order ot - it1 - nt - it2.
      The optimizer assigns a "jump" from table it2 to ot. This strategy will
      omit duplicates resulting from table nt.
      
      The fix assures that the jump from the last table in a consecutive
      range of inner table goes to the most immediately preceeding outer
      table (regardless of whether the outer table is correlated or not).
      In this case, there are two consecutive ranges of inner tables:
      {it1} and {it2}. In the first range, the last inner table is it1, the
      jump should be to table ot. In the second range, the jump from it2
      should be to nt.
      
      It is currently possible to generate such query plans, hence the
      bug fix includes some test cases. However, such plans are probably
      generated due to a problem with the join optimizer cost model.
      When this problem is fixed, it may be impossible to generate
      such FirstMatch plans, unless one can assign different costs
      to different tables, or another measure is used to force a specific
      join order.
      
      mysql-test/r/subselect_innodb.result
        Test results for bug#51457.
      
      mysql-test/t/subselect_innodb.test
        Test case for bug#51457. Notice the above comment for test validity.
        The test goes into subselect_innodb because it is required to use
        tables with one row that are not hit by const table optimization.
      
      sql/sql_select.cc
        In setup_semijoin_dups_elimination(), in the case for FIRST_MATCH,
        loop over the set of inner and non-correlated outer tables and assign
        jump target to the last table in each consecutive range of inner
        tables. The jump target should be the most immediately preceeding
        outer table.
[2 Oct 2010 18:15] Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101002181053-6iotvl26uurcoryp) (version source revid:alexander.nozdrin@oracle.com-20101002180917-h0n62akupm3z20nt) (pib:21)
[13 Nov 2010 16:23] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[23 Nov 2010 3:20] Paul DuBois
Bug does not appear in any released 5.6.x version. No 5.6.1 changelog entry needed.