MySQL Bugs: #53305: Duplicate weedout + join buffer (join cache level=7,8) loses rows

Bug #53305	Duplicate weedout + join buffer (join cache level=7,8) loses rows
Submitted:	30 Apr 2010 9:23	Modified:	23 Nov 2010 3:29
Reporter:	Guilhem Bichot	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	6.0-codebase-bugfixing	OS:	Any
Assigned to:	Guilhem Bichot	CPU Architecture:	Any
Tags:	join_cache_level, optimizer_switch, semijoin

Description:
I have revision-id:guilhem@mysql.com-20100428201121-3997h871x2hu1eee
I build with ndb:
HAVE_CMAKE=no BUILD/compile-pentium64-valgrind-max
(if it doesn't include ndb, take the configure line and add --with-ndbcluster).
ndb_read_multi_range.test fails:

@@ -710,16 +710,6 @@
 where t1.uid in (select t2.uid from t2, t1 where t1.uid=1 and t2.uid=t1.fid)
 and t2.uid=t1.fid;
 name
-A
-A
-B
-B
-C
-D
-E
-F
 G
-H
-I
 set optimizer_join_cache_level=default;
 drop table t1,t2;

It goes away with
  set optimizer_switch="semijoin=off";
It is also seen in pushbuild2.
The failure is introduced by the removal of floor() from best_access_path() in guilhem@mysql.com-20100428194200-tqlpkfre7tzjye0e .

How to repeat:
run this:

-- source include/have_ndb.inc
-- source include/not_embedded.inc

create table t1 (
  uid int, fid int, index(uid)
) engine=ndb;
insert into t1 values
  (1,1), (1,2), (1,3), (1,4),
  (2,5), (2,6), (2,7), (2,8),
  (3,1), (3,2), (3,9);

create table t2 (
  uid int primary key, name varchar(128), index(name)
) engine=ndb;
insert into t2 values 
  (1, "A"), (2, "B"), (3, "C"), (4, "D"), (5, "E"),
  (6, "F"), (7, "G"), (8, "H"), (9, "I");

create table t3 (
  uid int, fid int, index(uid)
) engine=ndb;
insert into t3 values
  (1,1), (1,2), (1,3),(1,4),
  (2,5), (2,6), (2,7), (2,8),
  (3,1), (3,2), (3,9);

create table t4 (
  uid int primary key, name varchar(128), index(name)
) engine=ndb;
insert into t4 values 
  (1, "A"), (2, "B"), (3, "C"), (4, "D"), (5, "E"),
  (6, "F"), (7, "G"), (8, "H"), (9, "I");

select t4.uid from t4, t3 where t3.uid=1 and t4.uid=t3.fid;

set optimizer_join_cache_level=7;

explain select name from t2, t1 
  where t1.uid in (select t4.uid from t4, t3 where t3.uid=1 and t4.uid=t3.fid)
        and t2.uid=t1.fid;

--sorted_result
select name from t2, t1 
  where t1.uid in (select t4.uid from t4, t3 where t3.uid=1 and t4.uid=t3.fid)
        and t2.uid=t1.fid;

--sorted_result
select name from t2, t1 
  where t1.uid in (1,3,2,3,4)
        and t2.uid=t1.fid;

set optimizer_join_cache_level=default;

drop table t1,t2,t3,t4;

The wrong results happen only for NDB.

Suggested fix:
The "how-to-repeat" results even before the removal of floor() look suspicious:
EXPLAIN shows no semijoin strategy, and adding one row to a table shows a bug (BUG#53298); it may be that the chosen plan is inconsistent.

Removing floor() only changes certain costs. Debugging shows that indeed costs change slightly, but the chosen order of tables is the same with and without floor(). With floor(), no semijoin strategy is shown in EXPLAIN. Without floor(), Duplicate Weedout is shown. It's surprising that a same plan can be executed with no semijoin strategy choice or with Duplicate Weedout.

It is suggested to first fix BUG#53298 before debugging the current one. It may be that the current one is just a consequence of something inconsistent with the query plan (which would be the same cause of the two bugs). BUG#53298 should be attacked first, because it occurs with MyISAM too, so will be more frequent, and is easier to reproduce (test takes less time).

The proposed patch for BUG#31480 doesn't help.

The removal of floor() only changed some statistics, which led the optimizer to pick a different query plan.
Before this removal, the test was in the conditions of BUG#53298 (which was an incorrect query plan, == optimization stage bug). By chance this incorrect query plan resulted in a correct result (but as BUG#53298 shows, adding a row to a table made it reveal its bad nature).
After the removal of floor(), the optimizer picks a correct query plan, which alas makes the test enter the conditions of an already existing, but never hit so far, execution stage bug ("duplicate weedout + join buffer" bug), which is the subject of this BUG#53305.
The proposed fix for BUG#53298 does not make BUG#53305 go away.

After the proposed fix for BUG#53298 is pushed, the testcase of BUG#53305 will show the bug for MyISAM tables too (will not require NDB tables anymore). Again, it's not that BUG#53298 is introducing a bug, it's only giving the MyISAM-tables test a correct query plan which hits the execution stage bug.
That bug is that:
- duplicate weedout is selected
- do_sj_dups_weedout() gets the rows' rowid via h->ref, but this h->ref is full of zeroes for table t1 and t4; this suggests that it hasn't been filled by handler's position() calls.
- as those rowids are 0s for all rows, all rows but the first one are considered duplicate, only one row is returned, bug.

It might be that the missing positions are available from some join buffers somewhere, that should be researched.
Bug goes away with semijoin=off or optimizer_join_cache_level<7.

When writing a fix, please verify that ndb_read_multi_range passes, as well as the MyISAM-tables testcase (take the testcase posted above and replace =ndb with =myisam), they may need a different fix as query plans are different (BKA, not BKA, etc).

Note that BUG#49129 mentions duplicate weedout and rowids somewhere, but I have no idea whether this is related.

Thank you for the report.

Verified as described.

With NDB:

select name from t2, t1 
where t1.uid in (select t4.uid from t4, t3 where t3.uid=1 and t4.uid=t3.fid)
and t2.uid=t1.fid;
name
B

With MyISAM:

select name from t2, t1 
where t1.uid in (select t4.uid from t4, t3 where t3.uid=1 and t4.uid=t3.fid)
and t2.uid=t1.fid;
name
A
A
B
B
C
D
E
F
G
H
I

In next-mr-opt-backporting ndb_read_multi_range passes; this is because NDB in this tree doesn't support MRR, so the EXPLAIN has no "using join buffer". No join buffer, no bug.
But we still see the bug in MyISAM.

testcase with a bit less rows:

create table t1 (
  uid int, fid int, index(uid)
);
INSERT into t1 values (1,1);
INSERT into t1 values (1,2);
INSERT into t1 values (1,3);
INSERT into t1 values (1,4);
INSERT into t1 values (2,5);
INSERT into t1 values (2,6);
INSERT into t1 values (2,7);
INSERT into t1 values (2,8);
INSERT into t1 values (3,1);
INSERT into t1 values (3,2);
INSERT into t1 values (3,9);

create table t2 (
  uid int, name varchar(128)
);
INSERT into t2 values (8, "H");
INSERT into t2 values (9, "I");

create table t3 (
  uid int, fid int
);
INSERT into t3 values (1,3);
INSERT into t3 values (3,9);

create table t4 (
  uid int primary key, name varchar(128)
);
INSERT into t4 values (3, "C");
INSERT into t4 values (10, "J");

select t4.uid from t4, t3 where t3.uid=1 and t4.uid=t3.fid;

# no subq
select name from t2, t1 
  where t1.uid in (3)
        and t2.uid=t1.fid;

# subq
select name from t2, t1 
  where t1.uid in (select t4.uid from t4, t3 where t3.uid=1 and t4.uid=t3.fid)
        and t2.uid=t1.fid;

set optimizer_join_cache_level=7;

explain select name from t2, t1 
  where t1.uid in (select t4.uid from t4, t3 where t3.uid=1 and t4.uid=t3.fid)
        and t2.uid=t1.fid;

# bug
select name from t2, t1 
  where t1.uid in (select t4.uid from t4, t3 where t3.uid=1 and t4.uid=t3.fid)
        and t2.uid=t1.fid;

set optimizer_join_cache_level=default;

drop table t1,t2,t3,t4;

The buggy final SELECT returns no rows, the previous ones correctly return one row ("I").
Bug happens only with join_cache_level=7-8 (==JOIN_CACHE_BKA_UNIQUE).

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/110030

3183 Guilhem Bichot	2010-06-02
      Fix for BUG#53305 "Duplicate weedout + join buffer (join cache --level=7,8) loses rows"
      When joining records, JOIN_CACHE_BKA_UNIQUE forgot to set the rowid, so some rows
      were wrongly said to have the same rowid and were wrongly deleted by Duplicate Weedout.
     @ mysql-test/r/subselect_sj2.result
        result for bug's testcase, at join_cache_level=default
     @ mysql-test/r/subselect_sj2_jcl6.result
        result for bug's testcase, at join_cache_level=6
     @ mysql-test/r/subselect_sj2_jcl7.result
        result; all correct (except BUG 49129 which subselect_sj2_jcl6 also has).
        Without the code fix, the very final SELECT showed 3 rows "A","A","E"
        (missed rows), and
        "select * from t3 where b in (select A.a+B.a from t0 A, t0 B where B.a<5);"
        missed rows too.
     @ mysql-test/r/subselect_sj_jcl7.result
        result, all correct
     @ mysql-test/t/subselect_sj2.test
        test for BUG#53305
     @ mysql-test/t/subselect_sj2_jcl7.test
        run subselect_sj2 with optimizer_join_cache_level=7, like
        the existing subselect_sj2_jcl6. =7 is used instead of =8,
        to test regular buffers, because _jcl6 already tests incremental
        buffers.
     @ mysql-test/t/subselect_sj_jcl7.test
        run subselect_sj with optimizer_join_cache_level=7, like
        the existing subselect_sj_jcl6.
     @ sql/sql_join_cache.cc
        JOIN_CACHE_BKA_UNIQUE::join_matching_records(), unlike
        JOIN_CACHE_BNL::join_matching_records() and
        JOIN_CACHE_BKA::join_matching_records(), doesn't think of
        putting the record's position in the proper place (handler::ref)
        for do_sj_dups_weedout() to read.
        As a result, in the testcase (subselect_sj2_jcl6), where query plan is
        t3,t4,t1,t2, when we want to access t1:
        - we use JOIN_CACHE_BKA_UNIQUE (as join_cache_level is 7 or 8)
        - JOIN_CACHE_BKA_UNIQUE::join_matching_records() first goes through
        buffered records of t4 and for each such record, collects key (to be used
        for looking up into t1's index) and rowid: this is done by
        DsMrr_impl::dsmrr_fill_buffer().
        - after that, JOIN_CACHE_BKA_UNIQUE::join_matching_records() scans
        this collection of keys; for each key, does a lookup in t1, finds a
        matching t1 record
        - then calls JOIN_CACHE::generate_full_extensions for this record,
        which calls do_sj_dups_weedout(). That function expects to find
        in t1's handler::ref the rowid of the matching t1 record. But that
        rowid is not there, because it was not saved there after the key
        lookup.
        - So do_sj_dups_weedout() reads the same rowid (an out-of-date value)
        from handler::ref for each record of t1, so several records of t1 are
        wrongly eliminated as duplicate (Duplicate Weedout eliminates
        rows with identical rowids, as expected).
        The fix: save rowid in t1's handler::ref after key lookup, like in
        JOIN_CACHE_BKA::join_matching_records().

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/110440

3183 Guilhem Bichot	2010-06-08
      Fix for BUG#53305 "Duplicate weedout + join buffer (join cache --level=7,8) loses rows"
      When joining records, JOIN_CACHE_BKA_UNIQUE forgot to set the rowid, so some rows
      were wrongly said to have the same rowid and were wrongly deleted by Duplicate Weedout.
     @ mysql-test/r/subselect_sj2_jcl7.result
        result; all correct (except BUG 49129 which subselect_sj2_jcl6 also has).
        Without the code fix,
        "select * from t3 where b in (select A.a+B.a from t0 A, t0 B where B.a<5);"
        missed rows.
     @ mysql-test/r/subselect_sj_jcl6.result
        result, all correct
     @ mysql-test/r/subselect_sj_jcl7.result
        result, all correct. Without the code fix, the very final SELECT showed 3 rows "A","A","E"
        (missed rows).
     @ mysql-test/t/subselect_sj.test
        test for BUG#53305
     @ mysql-test/t/subselect_sj2_jcl7.test
        run subselect_sj2 with optimizer_join_cache_level=7, like
        the existing subselect_sj2_jcl6. =7 is used instead of =8,
        to test regular buffers, because _jcl6 already tests incremental
        buffers.
     @ mysql-test/t/subselect_sj_jcl7.test
        run subselect_sj with optimizer_join_cache_level=7, like
        the existing subselect_sj_jcl6.
     @ sql/sql_join_cache.cc
        JOIN_CACHE_BKA_UNIQUE::join_matching_records(), unlike
        JOIN_CACHE_BNL::join_matching_records() and
        JOIN_CACHE_BKA::join_matching_records(), doesn't think of
        putting the record's position in the proper place (handler::ref)
        for do_sj_dups_weedout() to read.
        As a result, in the testcase (subselect_sj2_jcl6), where query plan is
        t3,t4,t1,t2, when we want to access t1:
        - we use JOIN_CACHE_BKA_UNIQUE (as join_cache_level is 7 or 8)
        - JOIN_CACHE_BKA_UNIQUE::join_matching_records() first goes through
        buffered records of t4 and for each such record, collects key (to be used
        for looking up into t1's index) and rowid: this is done by
        DsMrr_impl::dsmrr_fill_buffer().
        - after that, JOIN_CACHE_BKA_UNIQUE::join_matching_records() scans
        this collection of keys; for each key, does a lookup in t1, finds a
        matching t1 record
        - then calls JOIN_CACHE::generate_full_extensions for this record,
        which calls do_sj_dups_weedout(). That function expects to find
        in t1's handler::ref the rowid of the matching t1 record. But that
        rowid is not there, because it was not saved there after the key
        lookup.
        - So do_sj_dups_weedout() reads the same rowid (an out-of-date value)
        from handler::ref for each record of t1, so several records of t1 are
        wrongly eliminated as duplicate (Duplicate Weedout eliminates
        rows with identical rowids, as expected).
        The fix: save rowid in t1's handler::ref after key lookup, like in
        JOIN_CACHE_BKA::join_matching_records().

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/110629

3189 Guilhem Bichot	2010-06-08
      Fix for BUG#53305 "Duplicate weedout + join buffer (join cache --level=7,8) loses rows"
      When joining records, JOIN_CACHE_BKA_UNIQUE forgot to set the rowid, so some rows
      were wrongly said to have the same rowid and were wrongly deleted by Duplicate Weedout.
     @ mysql-test/r/subselect_sj2_jcl7.result
        result; all correct (except BUG 49129 which subselect_sj2_jcl6 also has).
        Without the code fix,
        "select * from t3 where b in (select A.a+B.a from t0 A, t0 B where B.a<5);"
        missed rows.
     @ mysql-test/r/subselect_sj_jcl6.result
        result, all correct
     @ mysql-test/r/subselect_sj_jcl7.result
        result, all correct. Without the code fix, the very final SELECT showed 3 rows "A","A","E"
        (missed rows).
     @ mysql-test/t/subselect_sj.test
        test for BUG#53305
     @ mysql-test/t/subselect_sj2_jcl7.test
        run subselect_sj2 with optimizer_join_cache_level=7, like
        the existing subselect_sj2_jcl6. =7 is used instead of =8,
        to test regular buffers, because _jcl6 already tests incremental
        buffers.
     @ mysql-test/t/subselect_sj_jcl7.test
        run subselect_sj with optimizer_join_cache_level=7, like
        the existing subselect_sj_jcl6.
     @ sql/sql_join_cache.cc
        JOIN_CACHE_BKA_UNIQUE::join_matching_records(), unlike
        JOIN_CACHE_BNL::join_matching_records() and
        JOIN_CACHE_BKA::join_matching_records(), doesn't think of
        putting the record's position in the proper place (handler::ref)
        for do_sj_dups_weedout() to read.
        As a result, in the testcase (subselect_sj2_jcl6), where query plan is
        t3,t4,t1,t2, when we want to access t1:
        - we use JOIN_CACHE_BKA_UNIQUE (as join_cache_level is 7 or 8)
        - JOIN_CACHE_BKA_UNIQUE::join_matching_records() first goes through
        buffered records of t4 and for each such record, collects key (to be used
        for looking up into t1's index) and rowid: this is done by
        DsMrr_impl::dsmrr_fill_buffer().
        - after that, JOIN_CACHE_BKA_UNIQUE::join_matching_records() scans
        this collection of keys; for each key, does a lookup in t1, finds a
        matching t1 record
        - then calls JOIN_CACHE::generate_full_extensions for this record,
        which calls do_sj_dups_weedout(). That function expects to find
        in t1's handler::ref the rowid of the matching t1 record. But that
        rowid is not there, because it was not saved there after the key
        lookup.
        - So do_sj_dups_weedout() reads the same rowid (an out-of-date value)
        from handler::ref for each record of t1, so several records of t1 are
        wrongly eliminated as duplicate (Duplicate Weedout eliminates
        rows with identical rowids, as expected).
        The fix: save rowid in t1's handler::ref after key lookup, like in
        JOIN_CACHE_BKA::join_matching_records().

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/110636

3189 Guilhem Bichot	2010-06-09
      Fix for BUG#53305 "Duplicate weedout + join buffer (join cache --level=7,8) loses rows"
      When joining records, JOIN_CACHE_BKA_UNIQUE forgot to set the rowid, so some rows
      were wrongly said to have the same rowid and were wrongly deleted by Duplicate Weedout.
     @ mysql-test/r/subselect_sj2_jcl7.result
        result; all correct (except BUG 49129 which subselect_sj2_jcl6 also has).
        Without the code fix,
        "select * from t3 where b in (select A.a+B.a from t0 A, t0 B where B.a<5);"
        missed rows.
     @ mysql-test/r/subselect_sj_jcl6.result
        result, all correct
     @ mysql-test/r/subselect_sj_jcl7.result
        result, all correct. Without the code fix, the very final SELECT showed 3 rows "A","A","E"
        (missed rows).
     @ mysql-test/t/subselect_sj.test
        test for BUG#53305
     @ mysql-test/t/subselect_sj2_jcl7.test
        run subselect_sj2 with optimizer_join_cache_level=7, like
        the existing subselect_sj2_jcl6. =7 is used instead of =8,
        to test regular buffers, because _jcl6 already tests incremental
        buffers.
     @ mysql-test/t/subselect_sj_jcl7.test
        run subselect_sj with optimizer_join_cache_level=7, like
        the existing subselect_sj_jcl6.
     @ sql/sql_join_cache.cc
        JOIN_CACHE_BKA_UNIQUE::join_matching_records(), unlike
        JOIN_CACHE_BNL::join_matching_records() and
        JOIN_CACHE_BKA::join_matching_records(), doesn't think of
        putting the record's position in the proper place (handler::ref)
        for do_sj_dups_weedout() to read.
        As a result, in the testcase (subselect_sj2_jcl6), where query plan is
        t3,t4,t1,t2, when we want to access t1:
        - we use JOIN_CACHE_BKA_UNIQUE (as join_cache_level is 7 or 8)
        - JOIN_CACHE_BKA_UNIQUE::join_matching_records() first goes through
        buffered records of t4 and for each such record, collects key (to be used
        for looking up into t1's index) and rowid: this is done by
        DsMrr_impl::dsmrr_fill_buffer().
        - after that, JOIN_CACHE_BKA_UNIQUE::join_matching_records() scans
        this collection of keys; for each key, does a lookup in t1, finds a
        matching t1 record
        - then calls JOIN_CACHE::generate_full_extensions for this record,
        which calls do_sj_dups_weedout(). That function expects to find
        in t1's handler::ref the rowid of the matching t1 record. But that
        rowid is not there, because it was not saved there after the key
        lookup.
        - So do_sj_dups_weedout() reads the same rowid (an out-of-date value)
        from handler::ref for each record of t1, so several records of t1 are
        wrongly eliminated as duplicate (Duplicate Weedout eliminates
        rows with identical rowids, as expected).
        The fix: save rowid in t1's handler::ref after key lookup, like in
        JOIN_CACHE_BKA::join_matching_records().

queued to next-mr-opt-backporting

Pushed into mysql-next-mr (revid:alik@sun.com-20100816062819-bluwgdq8q4xysmlg) (version source revid:alik@sun.com-20100816062612-enatdwnv809iw3s9) (pib:20)

Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)

Bug does not appear in any released 5.6.x version. No 5.6.1 changelog entry needed.