Bug #62495 Discrepant results when using UNION and COUNT
Submitted: 22 Sep 2011 0:38 Modified: 22 Sep 2011 11:18
Reporter: Danny Kopping Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: DML Severity:S2 (Serious)
Version:5.1.44 OS:MacOS (10.6.8)
Assigned to: CPU Architecture:Any
Tags: count, distinct, UNION

[22 Sep 2011 0:38] Danny Kopping
Description:
It may just be my ignorance, but I think I'm getting discrepant results when using COUNT and UNION

How to repeat:

Take the following query:
SELECT userId AS total
FROM (
SELECT COUNT( fum.userId ) AS userId
FROM FilterUserMatch fum
WHERE fum.filterId
IN ( 1, 2, 3 )
UNION
SELECT COUNT( p.purchasedUserId ) AS userId
FROM Purchase p
WHERE p.filterId
IN ( 1, 2, 3 )
) AS countALL

This returns two records: 9 & 6

However, when I don't use the COUNT() function on each sub-query, but instead put it on the main query:

SELECT COUNT(userId) AS total
FROM (
SELECT fum.userId AS userId
FROM FilterUserMatch fum
WHERE fum.filterId
IN ( 1, 2, 3 )
UNION
SELECT p.purchasedUserId AS userId
FROM Purchase p
WHERE p.filterId
IN ( 1, 2, 3 )
) AS countALL

I get the result: 14

1st query: 9 + 6 = 15
2nd query: 14

The data has not changed between running these two queries.
[22 Sep 2011 0:41] Danny Kopping
I should mention that the data I was using had one duplicate - which the UNION is supposed to take care of - in my understanding.
[22 Sep 2011 7:20] Valeriy Kravchuk
Please, read the manual, http://dev.mysql.com/doc/refman/5.1/en/union.html:

"The default behavior for UNION is that duplicate rows are removed from the result. The optional DISTINCT keyword has no effect other than the default because it also specifies duplicate-row removal. With the optional ALL keyword, duplicate-row removal does not occur and the result includes all matching rows from all the SELECT statements."

So, in your second query UNION removed duplicated userid. Try UNION ALL and you'll see 16 as a result.

This is not a bug.
[22 Sep 2011 10:40] Danny Kopping
OK, I get that but the correct number of rows - when using COUNT on the UNION results - should be 14, not 15... And when I simply put the COUNT in a different place, in theory it should return the same number, but it doesn't; that's why I think this is possibly a bug.
[22 Sep 2011 10:54] Valeriy Kravchuk
UNION leaves only one of duplicate values. Look:

mysql> create table tu(c1 int);
Query OK, 0 rows affected (0.47 sec)

mysql> insert into tu values(1),(2),(3),(4);
Query OK, 4 rows affected (0.13 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> select c1 from tu where c1 in (1,2,3) union select c1 from tu where c1 in
 (3,4);
+------+
| c1   |
+------+
|    1 |
|    2 |
|    3 |
|    4 |
+------+
4 rows in set (0.13 sec)

mysql> select count(c1) as cc1 from tu where c1 in (1,2,3) union select count(c1
) from tu where c1 in (3,4);
+-----+
| cc1 |
+-----+
|   3 |
|   2 |
+-----+
2 rows in set (0.23 sec)

mysql> select count(c1) from (select c1 from tu where c1 in (1,2,3) union select
 c1 from tu where c1 in (3,4)) as t;
+-----------+
| count(c1) |
+-----------+
|         4 |
+-----------+
1 row in set (0.38 sec)

So, 3 values in first select, 2 in second, but one of them is present in both, so UNION gives just 4 different values.

Now, UNION ALL:

mysql> select c1 from tu where c1 in (1,2,3) union all select c1 from tu where c
1 in (3,4);
+------+
| c1   |
+------+
|    1 |
|    2 |
|    3 |
|    3 |
|    4 |
+------+
5 rows in set (0.00 sec)

mysql> select count(c1) from (select c1 from tu where c1 in (1,2,3) union all se
lect c1 from tu where c1 in (3,4)) as t;
+-----------+
| count(c1) |
+-----------+
|         5 |
+-----------+
1 row in set (0.00 sec)

I hope it's clear now.
[22 Sep 2011 11:18] Danny Kopping
Ah, I see - thanks Valeriy!