Bug #52582 Incorrect results with COUNT DISTINCT and BIG_TABLES
Submitted: 4 Apr 2010 3:51 Modified: 19 Jun 2013 17:49
Reporter: Dan Kloke (Candidate Quality Contributor) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: DML Severity:S2 (Serious)
Version:4.1.25, 5.0.91, 5.1.54, 5.5.10 OS:Any (IBM System i 32 bit, Linux)
Assigned to: CPU Architecture:Any
Tags: big_tables, count, COUNT DISTINCT, distinct, qc, sql_big_tables

[4 Apr 2010 3:51] Dan Kloke
Description:
Count distinct on a simple join gives incorrect results when big_tables is enabled.

How to repeat:
create table test.test1 (a_id integer unsigned auto_increment not null,primary key(a_id));
insert into test.test1 (a_id) values (null),(null),(null),(null),(null);

create table test.test2 (b_id integer unsigned auto_increment not null,primary key(b_id),a_id integer unsigned,key(a_id));
insert into test.test2 (b_id,a_id) values (null,1),(null,2),(null,3),(null,4),(null,5),(null,1),(null,2),(null,3),(null,4),(null,5),(null,1),(null,2),(null,3),(null,4),(null,5),(null,1),(null,2),(null,3),(null,4),(null,5),(null,1),(null,2),(null,3),(null,4),(null,5),(null,1),(null,2),(null,3),(null,4),(null,5),(null,1),(null,2),(null,3),(null,4),(null,5);

#no problem, first column returns 1, second column returns 2
SELECT count(DISTINCT a.a_ID), count(b.b_ID) FROM test.test2 b JOIN test.test1 a ON b.a_ID = a.a_ID WHERE a.a_ID < 2;

#explicit value in WHERE gives incorrect result for count(DISTINCT a.a_ID) (7) only when big_tables is enabled. 
#try this statement with big_tables enabled and disabled.
SELECT count(DISTINCT a.a_ID), count(b.b_ID) FROM test.test1 a JOIN test.test2 b ON b.a_ID = a.a_ID WHERE a.a_ID = 1;

show variables like "%big_tables";
"Variable_name","Value"
"big_tables","OFF"
"sql_big_tables","OFF"

SELECT count(DISTINCT a.a_ID), count(b.b_ID) FROM test.test1 a JOIN test.test2 b ON b.a_ID = a.a_ID WHERE a.a_ID = 1;
"count(DISTINCT a.a_ID)","count(b.b_ID)"
1,7

show variables like "%big_tables";
"Variable_name","Value"
"big_tables","ON"
"sql_big_tables","ON"

SELECT count(DISTINCT a.a_ID), count(b.b_ID) FROM test.test1 a JOIN test.test2 b ON b.a_ID = a.a_ID WHERE a.a_ID = 1;
"count(DISTINCT a.a_ID)","count(b.b_ID)"
7,7

Suggested fix:
I'm guessing this has something to do with the way distinct values are cached under big_tables.
[4 Apr 2010 8:06] Valeriy Kravchuk
Looks like I miss something. Why do you think that value 7 for the second column of your query is wrong:

77-52-28-202:5.1 openxs$ bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 108
Server version: 5.1.46-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create table test.test1 (a_id integer unsigned auto_increment not null,primary
    -> key(a_id));
Query OK, 0 rows affected (0.07 sec)

mysql> insert into test.test1 (a_id) values (null),(null),(null),(null),(null);
Query OK, 5 rows affected (0.01 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> create table test.test2 (b_id integer unsigned auto_increment not null,primary
    -> key(b_id),a_id integer unsigned,key(a_id));
Query OK, 0 rows affected (0.06 sec)

mysql> insert into test.test2 (b_id,a_id) values
    -> (null,1),(null,2),(null,3),(null,4),(null,5),(null,1),(null,2),(null,3),(null,4),(null,5),(null,1),(null,2),(null,3),(null,4),(null,5),(null,1),(null,2),(null,3),(null,4),(null,5),(null,1),(null,2),(null,3),(null,4),(null,5),(null,1),(null,2),(null,3),(null,4),(null,5),(null,1),(null,2),(null,3),(null,4),(null,5);
Query OK, 35 rows affected (0.00 sec)
Records: 35  Duplicates: 0  Warnings: 0

mysql> SELECT count(DISTINCT a.a_ID), count(b.b_ID) FROM test.test2 b JOIN test.test1 a ON
    -> b.a_ID = a.a_ID WHERE a.a_ID < 2;
+------------------------+---------------+
| count(DISTINCT a.a_ID) | count(b.b_ID) |
+------------------------+---------------+
|                      1 |             7 |
+------------------------+---------------+
1 row in set (0.05 sec)

mysql> show variables like '%big_tables';
+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| big_tables     | OFF   |
| sql_big_tables | OFF   |
+----------------+-------+
2 rows in set (0.01 sec)

mysql> SELECT count(DISTINCT a.a_ID), count(b.b_ID) FROM test.test1 a JOIN test.test2 b ON
    -> b.a_ID = a.a_ID WHERE a.a_ID = 1;
+------------------------+---------------+
| count(DISTINCT a.a_ID) | count(b.b_ID) |
+------------------------+---------------+
|                      1 |             7 |
+------------------------+---------------+
1 row in set (0.00 sec)

mysql> SELECT a.a_ID, b.b_ID FROM test.test1 a JOIN test.test2 b ON b.a_ID = a.a_ID WHERE a.a_ID = 1;
+------+------+
| a_ID | b_ID |
+------+------+
|    1 |    1 |
|    1 |    6 |
|    1 |   11 |
|    1 |   16 |
|    1 |   21 |
|    1 |   26 |
|    1 |   31 |
+------+------+
7 rows in set (0.00 sec)

I see 7 rows above (that satisfy WHERE clause after joining), so why 7 is wrong?
[4 Apr 2010 16:05] Dan Kloke
Hi Valeriy, thanks for getting back to me.

The problem is in the first column, not the second.

You have to alter your my.ini startup file to include big_tables in the [mysqld] section. Place the text "big_tables" on a line by itself to enable the option.

Then shut down and restart your server, then try the query. I get a 7 in the FIRST column when big_tables is ON. Run "show variables like '%big_tables'" to make sure the big_tables is enabled (ON), there is no error if it is OFF.

Thanks, D
[4 Apr 2010 16:47] Valeriy Kravchuk
Sorry for misunderstanding. Verified just as describedf with recetn 5.1.46 from bzr on Linux:

openxs@suse:/home2/openxs/dbs/5.1> bin/mysqld_safe --big-tables &
[1] 6564
openxs@suse:/home2/openxs/dbs/5.1> 100710 23:09:35 mysqld_safe Logging to '/home2/openxs/dbs/5.1/var/suse.err'.
100710 23:09:36 mysqld_safe Starting mysqld daemon with databases from /home2/openxs/dbs/5.1/var

openxs@suse:/home2/openxs/dbs/5.1> bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.1.46-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show variables like '%big%';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| big_tables      | ON    |
| sql_big_selects | ON    |
| sql_big_tables  | ON    |
+-----------------+-------+
3 rows in set (0.00 sec)

mysql> create table test.test1 (a_id integer unsigned auto_increment not null,primary
    -> key(a_id));
Query OK, 0 rows affected (0.06 sec)

mysql> insert into test.test1 (a_id) values (null),(null),(null),(null),(null);
Query OK, 5 rows affected (0.01 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> create table test.test2 (b_id integer unsigned auto_increment not null,primary
    -> key(b_id),a_id integer unsigned,key(a_id));
Query OK, 0 rows affected (0.03 sec)

mysql> insert into test.test2 (b_id,a_id) values
    -> (null,1),(null,2),(null,3),(null,4),(null,5),(null,1),(null,2),(null,3),(null,4),(null,5),(null,1),(null,2),(null,3),(null,4),(null,5),(null,1),(null,2),(null,3),(null,4),(null,5),(null,1),(null,2),(null,3),(null,4),(null,5),(null,1),(null,2),(null,3),(null,4),(null,5),(null,1),(null,2),(null,3),(null,4),(null,5);
Query OK, 35 rows affected (0.01 sec)
Records: 35  Duplicates: 0  Warnings: 0

mysql> SELECT count(DISTINCT a.a_ID), count(b.b_ID) FROM test.test2 b JOIN test.test1 a ON
    -> b.a_ID = a.a_ID WHERE a.a_ID < 2;
+------------------------+---------------+
| count(DISTINCT a.a_ID) | count(b.b_ID) |
+------------------------+---------------+
|                      1 |             7 |
+------------------------+---------------+
1 row in set (0.03 sec)

mysql> SELECT count(DISTINCT a.a_ID), count(b.b_ID) FROM test.test1 a JOIN test.test2 b ON
    -> b.a_ID = a.a_ID WHERE a.a_ID = 1;
+------------------------+---------------+
| count(DISTINCT a.a_ID) | count(b.b_ID) |
+------------------------+---------------+
|                      7 |             7 |
+------------------------+---------------+
1 row in set (0.01 sec)

mysql> explain SELECT count(DISTINCT a.a_ID), count(b.b_ID) FROM test.test1 a JOIN test.test2 b ON b.a_ID = a.a_ID WHERE a.a_ID = 1;
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | a     | const | PRIMARY       | PRIMARY | 4       | const |    1 | Using index |
|  1 | SIMPLE      | b     | ref   | a_id          | a_id    | 5       | const |    7 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+
2 rows in set (0.00 sec)

mysql> explain SELECT count(DISTINCT a.a_ID), count(b.b_ID) FROM test.test1 a JOIN test.test2 b ON b.a_ID = a.a_ID WHERE a.a_ID < 2;
+----+-------------+-------+-------+---------------+---------+---------+-------------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref         | rows | Extra                    |
+----+-------------+-------+-------+---------------+---------+---------+-------------+------+--------------------------+
|  1 | SIMPLE      | a     | index | PRIMARY       | PRIMARY | 4       | NULL        |    5 | Using where; Using index |
|  1 | SIMPLE      | b     | ref   | a_id          | a_id    | 5       | test.a.a_id |    4 | Using where              |
+----+-------------+-------+-------+---------------+---------+---------+-------------+------+--------------------------+
2 rows in set (0.00 sec)

So0, plans for "=1" and "<2" cases are different. Now, let's switch big_tables off:

mysql> set session big_tables=off;
Query OK, 0 rows affected (0.00 sec)

mysql> show variables like '%big%';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| big_tables      | OFF   |
| sql_big_selects | ON    |
| sql_big_tables  | OFF   |
+-----------------+-------+
3 rows in set (0.00 sec)

mysql> SELECT count(DISTINCT a.a_ID), count(b.b_ID) FROM test.test1 a JOIN test.test2 b ON b.a_ID = a.a_ID WHERE a.a_ID = 1;
+------------------------+---------------+
| count(DISTINCT a.a_ID) | count(b.b_ID) |
+------------------------+---------------+
|                      1 |             7 |
+------------------------+---------------+
1 row in set (0.01 sec)

mysql> explain SELECT count(DISTINCT a.a_ID), count(b.b_ID) FROM test.test1 a JOIN test.test2 b ON b.a_ID = a.a_ID WHERE a.a_ID = 1;
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | a     | const | PRIMARY       | PRIMARY | 4       | const |    1 | Using index |
|  1 | SIMPLE      | b     | ref   | a_id          | a_id    | 5       | const |    7 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+
2 rows in set (0.00 sec)

and we get correct results with the same plan.
[4 Apr 2010 16:51] Valeriy Kravchuk
It seems 5.1.43 is NOT affected:

C:\Program Files\MySQL\MySQL Server 5.1\bin>mysql -uroot -proot -P3310 test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 93
Server version: 5.1.43-community-log MySQL Community Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create table test.test1 (a_id integer unsigned auto_increment not null,pr
imary
    -> key(a_id));
Query OK, 0 rows affected (0.63 sec)

mysql> insert into test.test1 (a_id) values (null),(null),(null),(null),(null);
Query OK, 5 rows affected (0.19 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> create table test.test2 (b_id integer unsigned auto_increment not null,pr
imary
    -> key(b_id),a_id integer unsigned,key(a_id));
Query OK, 0 rows affected (0.14 sec)

mysql> insert into test.test2 (b_id,a_id) values
    -> (null,1),(null,2),(null,3),(null,4),(null,5),(null,1),(null,2),(null,3),(
null,4),(null,5),(null,1),(null,2),(null,3),(null,4),(null,5),(null,1),(null,2),
(null,3),(null,4),(null,5),(null,1),(null,2),(null,3),(null,4),(null,5),(null,1)
,(null,2),(null,3),(null,4),(null,5),(null,1),(null,2),(null,3),(null,4),(null,5
);
Query OK, 35 rows affected (0.09 sec)
Records: 35  Duplicates: 0  Warnings: 0

mysql> SELECT count(DISTINCT a.a_ID), count(b.b_ID) FROM test.test2 b JOIN test.
test1 a ON
    -> b.a_ID = a.a_ID WHERE a.a_ID < 2;
+------------------------+---------------+
| count(DISTINCT a.a_ID) | count(b.b_ID) |
+------------------------+---------------+
|                      1 |             7 |
+------------------------+---------------+
1 row in set (0.08 sec)

mysql> SELECT count(DISTINCT a.a_ID), count(b.b_ID) FROM test.test1 a JOIN test.
test2 b ON
    -> b.a_ID = a.a_ID WHERE a.a_ID = 1;
+------------------------+---------------+
| count(DISTINCT a.a_ID) | count(b.b_ID) |
+------------------------+---------------+
|                      1 |             7 |
+------------------------+---------------+
1 row in set (0.00 sec)

mysql> show session variables like '%big_tables';
+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| big_tables     | OFF   |
| sql_big_tables | OFF   |
+----------------+-------+
2 rows in set (0.00 sec)

mysql> set session big_tables=ON;
Query OK, 0 rows affected (0.02 sec)

mysql> show session variables like '%big_tables';
+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| big_tables     | ON    |
| sql_big_tables | ON    |
+----------------+-------+
2 rows in set (0.00 sec)

mysql> SELECT count(DISTINCT a.a_ID), count(b.b_ID) FROM test.test1 a JOIN test.
test2 b ON
    -> b.a_ID = a.a_ID WHERE a.a_ID = 1;
+------------------------+---------------+
| count(DISTINCT a.a_ID) | count(b.b_ID) |
+------------------------+---------------+
|                      1 |             7 |
+------------------------+---------------+
1 row in set (0.00 sec)

so this is recent regression!
[4 Apr 2010 17:02] Dan Kloke
Yes I originally discovered this on 5.1.41, never used big_tables before that.
Then upgraded to 5.1.45 where it also appears.

I forgot you could toggle big_tables for the session. At least that provides an in-place work-around for the short term.
[4 Apr 2010 18:43] MySQL Verification Team
a simpler testcase:

drop table if exists t1;
create table t1 (a int primary key)engine=myisam;
insert into t1 values (1),(2);
set session sql_big_tables=0;
select count(distinct t1.a) from t1,t1 t2 where t1.a = 1;
set session sql_big_tables=1;
select count(distinct t1.a) from t1,t1 t2 where t1.a = 1;
[25 Jan 2011 18:06] MySQL Verification Team
still repeatable in 5.1.54 and 5.5.10.
5.6.2 (mysql-trunk) doesn't have this sql_big_tables variable anymore so the bug doesn't show up there.
[25 Jan 2011 18:14] MySQL Verification Team
I removed the regression tag. Bug is repeatable in 5.1.20, 5.0.91, 5.0.30, 4.1.25.  Remember to disable query cache when testing this bug..
[19 Jun 2013 17:49] Paul DuBois
Noted in 5.6.13, 5.7.2 changelogs.

With big_tables enabled, queries that used COUNT(DISTINCT) on a
simple join with a constant equality condition on a non-duplicate key
returned incorrect results.
[23 May 2014 13:44] Paul DuBois
Noted in 5.6.20, 5.7.5 changelogs.
[6 Aug 2014 17:38] Laurynas Biveinis
$ bzr log -n0 -r 5950
------------------------------------------------------------
revno: 5950
committer: Chaithra Gopalareddy <chaithra.gopalareddy@oracle.com>
branch nick: mysql-5.6
timestamp: Wed 2014-05-21 12:46:14 +0530
message:
  Bug#11760197:INCORRECT RESULTS WITH COUNT DISTINCT AND BIG_TABLES
  Problem:
  When big_tables is enabled, COUNT DISTINCT on simple join with
  constant equality condition gives wrong result.
  
  Analysis:-
  In case of count distinct with sql_big_tables enable, optimizer
  saves data in myisam file instead of heap unique tree.
  When a constant equality condition is present, it does not
  detect duplicate field values when inserting the records into the
  temporary table.
  
  Original solution:
  While creating temporary table, allow creations of fields for constant
  items. When we have distinct query, this will make sure the duplicate
  value filtered out when inserting rows in the temporary table.
  
  Side Effects:
  Bug#17555462 - SELECT DISTINCT OF VIEW WITH CONSTANT STRING
                 RETURNS WRONG RESULT
  Problem:
  In a temporary table if a field is created for a const ref_item,
  and if this ref_item is a reference over a constant string and not
  on a field results are not as expected.
  
  Analysis:
  This is a regression from the patch for Bug#11760197. Post
  this bugfix, a field gets created in temporary table
  even for const_item's. If the const_item is of type Item_ref,
  while creating the temporary field, item->result_field
  is made to point to the newly created field.
  While the original item would hold the constant's value,
  the new field created while changing ref's to use temporary
  fields, will not be having the value.
  As a result we loose the value when fetching the results.
  
  In similar lines:
  Bug #17607155: ASSERTION FAILED: ((ITEM_RESULT_FIELD*)ITEM)->RESULT_FIELD
  Bug #17957913: ASSERTION `((ITEM_RESULT_FIELD*)ITEM)->RESULT_FIELD' FAILS
                 IN CREATE_TMP_FIELD
  Problem:
  Query having a GROUP BY CLAUSE with ROLLUP modifier and a
  GROUP_CONCAT/COUNT(DISTINCT) function with constant expression,
  causes an assert.
  
  Analysis:
  The GROUP_CONCAT/COUNT(DISTINCT) uses its own temporary table.
  When ROLLUP is present it creates the second copy of temporary
  table.
  This copy receives the same list of arguments that original
  group_concat/count(distinct) does which will help to copy
  the content of the result_field for the function under
  GROUP_CONCAT/COUNT from  the first temporary table to the second
  temporary table.
  
  In the case, when constant item is present, result_field will carry
  null value. After the fix for Bug#11760197, while creating field
  for second temporary table as result_field for the constant
  expression is not set, it asserts.
  
  Bug#17634335: SELECT DISTINCT...GROUP BY RETURNS WRONG RESULTS
  IN SOME CASES
  
  Query creating temporary table to find the distinct value and has
  constant value in projected list doesn't give correct result.
  
  Analysis:
  After the fix for Bug#11760197 const_items also were created as
  part of temporary tables. In the call to remove_duplicates() an
  assumption against the same was made which resulted in the above bug.
  
  All the above bugs were side effects of the fix made for Bug#11760197.
  
  Current solution:
  Distinct of a constant value will always be the constant value and
  count distinct of the same will always be one. Based on this,
  a new variable const_distinct is introduced. If enabled, temporary
  table is not created and aggregation is also avoided as the result
  will always be one.
  
  Works in similar lines to always_null.
[6 Aug 2014 17:43] Laurynas Biveinis
See bug 70657, bug 71149
[6 Aug 2014 17:47] Laurynas Biveinis
$ bzr log -n0 -r 5953
------------------------------------------------------------
revno: 5953
committer: Chaithra Gopalareddy <chaithra.gopalareddy@oracle.com>
branch nick: mysql-5.6
timestamp: Wed 2014-05-21 21:51:16 +0530
message:
  Post push fix for Bug#11760197