Bug #51110 Wrong result with COUNT(*) and semijoin
Submitted: 11 Feb 2010 15:52 Modified: 23 Nov 2010 3:20
Reporter: Guilhem Bichot Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:6.0-codebase-bugfixing OS:Any
Assigned to: Roy Lyseng CPU Architecture:Any
Tags: optimizer_switch, semijoin

[11 Feb 2010 15:52] Guilhem Bichot
Description:
I have alik@sun.com-20100209121620-s1wbu87xtdqihlha

In the test in how-to-repeat, SELECT COUNT(*) etc returns 4999, though the expected count is 5000 (which is returned properly when using optimizer_switch=semijoin=off).

How to repeat:
This is taken from subselect3.test:

create table t0 (a int);
insert into t0 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

#
# Test if we handle duplicate elimination temptable overflowing to disk
#
create table t1 (a int) as select * from t0 where a < 5;

set @@optimizer_switch='firstmatch=off,materialization=off';

select count(*) from t0 A, t0 B, t0 C, t0 D where D.a in (select a from t1 E);

drop table t0, t1;
[11 Feb 2010 16:00] Valeriy Kravchuk
This is a regression comparing to mysql-6.0-codebase.
[11 Feb 2010 16:13] Guilhem Bichot
It is a regression compared to next-mr which gives the correct 5000.
Valeriy, I'm removing the regression tag because we use it only for bugs which we cannot make go away by disabling a 6.0-specific feature like semijoin. In the present case, disabling semijoin makes the bug go away. Consider this a "planning aid" for us, though you are right, it's a regression.
Here are our team-internal guidelines for tagging, but you don't need to learn them: https://inside.mysql.com/wiki/ENG:Optimizer#List_of_bugs_to_fix
[11 Feb 2010 17:38] Sveta Smirnova
Thank you for the report.

I can not repeat described behavior using revision 3880. Which revision and OS do you use? How do you compile sources?
[11 Feb 2010 18:04] Guilhem Bichot
I have alik@sun.com-20100209121620-s1wbu87xtdqihlha .
Sveta, 3880 isn't a meaningful number: such numbers can change any time when someone pushes. They are "not stable", stable IDs are seen with
"bzr log --show-ids", they have the form user@host-something-cryptic.
The bug is that when the test is run, it returns 4999. If I add --mysqld=--optimizer_switch=semijoin=off, I get 5000.
If you still cannot reproduce it, I'll update to the latest 6.0-codebase-bugfixing and retest.
[11 Feb 2010 18:40] Sveta Smirnova
Guilhem,

thank you for the feedback. I am using newer one alik@sun.com-20100211142301-6cg3oerdwu6s0jn0 and can not repeat described behavior. Please retest in your environment.
[11 Feb 2010 20:35] Guilhem Bichot
I'm sorry for losing your time Sveta. Indeed my testcase is missing one important setting. Here is the new testcase:

create table t0 (a int);
insert into t0 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
create table t1 (a int) as select * from t0 where a < 5;
set @@optimizer_switch='firstmatch=off,materialization=off';
set @@max_heap_table_size= 16384;
select count(*) from t0 A, t0 B, t0 C, t0 D where D.a in (select a from t1 E);
drop table t0,t1;

I run this, and I get 4999. I run this with --mysqld=--optimizer_switch=semijoin=off, and then I get the correct 5000.
I have the latest 6.0-codebase-bugfixing: revision-id:magne.mahre@sun.com-20100211173921-lwmmw6oto3jrkfxg
[11 Feb 2010 21:52] Sveta Smirnova
Thank you for the update.

Verified as described.
[22 Mar 2010 12:23] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/103970

3845 Roy Lyseng	2010-03-22
      Bug#51110: Wrong result with COUNT(*) and semijoin
      
      The duplicate weedout semijoin strategy is shipping certain column
      combinations to a temporary table, in order to filter out any duplicates
      caused by reading duplicate rows from semijoined tables.
      
      The temporary table starts out as a memory table, but if the size reaches
      a maximum value, it is converted to a disk-based table.
      When such conversion occurs, the number of rows output from the query is
      reported as one less than it should be.
      
      The reason for this is that when do_sj_dups_weedout() creates the
      disk-based table, the return value is erroneously reported as 1
      instead of 0, meaning that the row in question is always registered
      as a duplicate. The problem is fixed by setting proper return values.
      
      mysql-test/r/subselect3.result
        Fixed result for bug#51110.
      
      mysql-test/r/subselect3_jcl6.result
        Fixed result for bug#51110.
      
      mysql-test/t/subselect3.test
        Extended previously failing test to be performed with and without
        temporary table overflow.
      
      sql/sql_select.cc
        do_sj_dups_weedout() returns 1 if create_internal_tmp_table_from_heap()
        is called successfully.
[23 Mar 2010 9:55] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/104061

3848 Roy Lyseng	2010-03-23
      Bug#51110: Wrong result with COUNT(*) and semijoin
      
      The duplicate weedout semijoin strategy is shipping certain column
      combinations to a temporary table, in order to filter out any duplicates
      caused by reading duplicate rows from semijoined tables.
      
      The temporary table starts out as a memory table, but if the size reaches
      a maximum value, it is converted to a disk-based table.
      When such conversion occurs, the number of rows output from the query is
      reported as one less than it should be.
      
      The reason for this is that when do_sj_dups_weedout() creates the
      disk-based table, the return value is erroneously reported as 1
      instead of 0, meaning that the row in question is always registered
      as a duplicate. The problem is fixed by setting proper return values.
      
      The function create_internal_tmp_table_from_heap(), which converts the
      memory table to a disk-based table, also copies the offending row
      (the "last row") into the new table. There is an argument ignore_last_dup
      that manages whether duplicate insertions of this row was allowed, but
      duplicate weedout also needed to know whether a duplicate was reported
      or not. This problem was fixed by adding an optional output argument
      is_duplicate to the function.
      
      mysql-test/r/subselect3.result
        Fixed result for bug#51110.
      
      mysql-test/r/subselect3_jcl6.result
        Fixed result for bug#51110.
      
      mysql-test/t/subselect3.test
        Extended previously failing test to be performed with and without
        temporary table overflow. Also added test so that overflow happens
        with and without duplicate row.
      
      sql/sql_select.cc
        do_sj_dups_weedout() returns 1 if create_internal_tmp_table_from_heap()
        is called successfully.
        Added output argument "is_duplicate" to
        create_internal_tmp_table_from_heap, added Doxygen header.
      
      sql/sql_select.h
        Updated prototype of create_internal_tmp_table_from_heap()
      
      sql/sql_show.cc
        Added NULL argument to create_internal_tmp_table_from_heap()
      
      sql/sql_union.cc
        Added NULL argument to create_internal_tmp_table_from_heap()
      
      sql/sql_update.cc
        Added NULL argument to create_internal_tmp_table_from_heap()
[24 Mar 2010 11:55] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/104189

3853 Roy Lyseng	2010-03-24
      Bug#51110: Wrong result with COUNT(*) and semijoin
      
      The duplicate weedout semijoin strategy is shipping certain column
      combinations to a temporary table, in order to filter out any duplicates
      caused by reading duplicate rows from semijoined tables.
      
      The temporary table starts out as a memory table, but if the size reaches
      a maximum value, it is converted to a disk-based table.
      When such conversion occurs, the number of rows output from the query is
      reported as one less than it should be.
      
      The reason for this is that when do_sj_dups_weedout() creates the
      disk-based table, the return value is erroneously reported as 1
      instead of 0, meaning that the row in question is always registered
      as a duplicate. The problem is fixed by setting proper return values.
      
      The function create_internal_tmp_table_from_heap(), which converts the
      memory table to a disk-based table, also copies the offending row
      (the "last row") into the new table. There is an argument ignore_last_dup
      that manages whether duplicate insertions of this row was allowed, but
      duplicate weedout also needed to know whether a duplicate was reported
      or not. This problem was fixed by adding an optional output argument
      is_duplicate to the function.
      
      mysql-test/r/subselect3.result
        Fixed result for bug#51110.
      
      mysql-test/r/subselect3_jcl6.result
        Fixed result for bug#51110.
      
      mysql-test/t/subselect3.test
        Extended previously failing test to be performed with and without
        temporary table overflow. Also added test so that overflow happens
        with and without duplicate row.
      
      sql/sql_select.cc
        do_sj_dups_weedout() returns 1 if create_internal_tmp_table_from_heap()
        is called successfully.
        Added output argument "is_duplicate" to
        create_internal_tmp_table_from_heap, added Doxygen header.
      
      sql/sql_select.h
        Updated prototype of create_internal_tmp_table_from_heap()
      
      sql/sql_show.cc
        Added NULL argument to create_internal_tmp_table_from_heap()
      
      sql/sql_union.cc
        Added NULL argument to create_internal_tmp_table_from_heap()
      
      sql/sql_update.cc
        Added NULL argument to create_internal_tmp_table_from_heap()
[16 Apr 2010 14:21] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/105889

3874 Roy Lyseng	2010-04-16
      Bug#51110: Wrong result with COUNT(*) and semijoin
      
      The duplicate weedout semijoin strategy is shipping certain column
      combinations to a temporary table, in order to filter out any duplicates
      caused by reading duplicate rows from semijoined tables.
      
      The temporary table starts out as a memory table, but if the size reaches
      a maximum value, it is converted to a disk-based table.
      When such conversion occurs, the number of rows output from the query is
      reported as one less than it should be.
      
      The reason for this is that when do_sj_dups_weedout() creates the
      disk-based table, the return value is erroneously reported as 1
      instead of 0, meaning that the row in question is always registered
      as a duplicate. The problem is fixed by setting proper return values.
      
      The function create_internal_tmp_table_from_heap(), which converts the
      memory table to a disk-based table, also copies the offending row
      (the "last row") into the new table. There is an argument ignore_last_dup
      that manages whether duplicate insertions of this row was allowed, but
      duplicate weedout also needed to know whether a duplicate was reported
      or not. This problem was fixed by adding an optional output argument
      is_duplicate to the function.
      
      mysql-test/r/subselect3.result
        Fixed result for bug#51110.
      
      mysql-test/r/subselect3_jcl6.result
        Fixed result for bug#51110.
      
      mysql-test/t/subselect3.test
        Extended previously failing test to be performed with and without
        temporary table overflow. Also added test so that overflow happens
        with and without duplicate row.
      
      sql/sql_select.cc
        do_sj_dups_weedout() returns 1 if create_internal_tmp_table_from_heap()
        is called successfully.
        Added output argument "is_duplicate" to
        create_internal_tmp_table_from_heap, added Doxygen header.
      
      sql/sql_select.h
        Updated prototype of create_internal_tmp_table_from_heap()
      
      sql/sql_show.cc
        Added NULL argument to create_internal_tmp_table_from_heap()
      
      sql/sql_union.cc
        Added NULL argument to create_internal_tmp_table_from_heap()
      
      sql/sql_update.cc
        Added NULL argument to create_internal_tmp_table_from_heap()
[27 Apr 2010 9:45] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100427094135-5s49ecp3ckson6e2) (version source revid:alik@sun.com-20100427093843-uekr85qkd7orx12t) (merge vers: 6.0.14-alpha) (pib:16)
[12 May 2010 1:12] Paul DuBois
Noted in 6.0.14 changelog.

With the duplicate weedout semijoin strategy, if a memory temporary
table was converted to a disk table, the row count reported for the
query result was one too low.
[14 May 2010 11:19] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/108314

3177 Roy Lyseng	2010-05-14
      Bug#51110: Wrong result with COUNT(*) and semijoin
                
      The duplicate weedout semijoin strategy is shipping certain column
      combinations to a temporary table, in order to filter out any duplicates
      caused by reading duplicate rows from semijoined tables.
                
      The temporary table starts out as a memory table, but if the size reaches
      a maximum value, it is converted to a disk-based table.
      When such conversion occurs, the number of rows output from the query is
      reported as one less than it should be.
                
      The reason for this is that when do_sj_dups_weedout() creates the
      disk-based table, the return value is erroneously reported as 1
      instead of 0, meaning that the row in question is always registered
      as a duplicate. The problem is fixed by setting proper return values.
                
      The function create_internal_tmp_table_from_heap(), which converts the
      memory table to a disk-based table, also copies the offending row
      (the "last row") into the new table. There is an argument ignore_last_dup
      that manages whether duplicate insertions of this row was allowed, but
      duplicate weedout also needed to know whether a duplicate was reported
      or not. This problem was fixed by adding an optional output argument
      is_duplicate to the function.
                
      mysql-test/r/subselect3.result
        Fixed result for bug#51110.
                
      mysql-test/r/subselect3_jcl6.result
        Fixed result for bug#51110.
                
      mysql-test/t/subselect3.test
        Extended previously failing test to be performed with and without
        temporary table overflow. Also added test so that overflow happens
        with and without duplicate row.
                
      sql/sql_select.cc
        do_sj_dups_weedout() returns 1 if create_internal_tmp_table_from_heap()
        is called successfully.
        Added output argument "is_duplicate" to
        create_internal_tmp_table_from_heap, added Doxygen header.
                
      sql/sql_select.h
        Updated prototype of create_internal_tmp_table_from_heap()
                
      sql/sql_show.cc
        Added NULL argument to create_internal_tmp_table_from_heap()
                
      sql/sql_union.cc
        Added NULL argument to create_internal_tmp_table_from_heap()
                
      sql/sql_update.cc
        Added NULL argument to create_internal_tmp_table_from_heap()
      
      revid:vvaintroub@mysql.com-20100416140221-iyjzh23wj1eew5qw..roy.lyseng@sun.com-20100416142009-8lw6ptxkgbguohy2
[16 Aug 2010 6:35] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100816062819-bluwgdq8q4xysmlg) (version source revid:alik@sun.com-20100816062612-enatdwnv809iw3s9) (pib:20)
[13 Nov 2010 16:06] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[23 Nov 2010 3:20] Paul DuBois
Bug does not appear in any released 5.6.x version. No 5.6.1 changelog entry needed.