MySQL Bugs: #1346: Finding Duplicates with GROUP BY

Bug #1346	Finding Duplicates with GROUP BY
Submitted:	19 Sep 2003 5:03	Modified:	19 Sep 2003 7:45
Reporter:	[ name withheld ]	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: Command-line Clients	Severity:	S3 (Non-critical)
Version:	4.0.12	OS:	Windows (Windows 2000)
Assigned to:		CPU Architecture:	Any

Description:
Ok.  I looked through google and saw a couple examples of how to find the duplicates in a table that looked like this...

SELECT (id) as n from emp_table GROUP BY id
HAVING n > 1;

Ok.  So I tried it.  I have a table of 13,128,178 rows (not a small example).  I ran a DISTINCT on the key field and the result was 12,787,768 so that tells me there are 340,410 duplicate rows.

Ok.  I run the above command.  I get 272,626 as an answer.  What?  That doesn't match.  For grins I run the query again with.

HAVING n > 2;

This time I get 67,756.
Again HAVING n > 3 I get 30.

Add them all up I get 340,413.  Now they match!  What seems to be happening is that the HAVING n > 1 is really doing n == 2 not GREATER than 1 and so on.

I would say this looks like a bug or I am crazy either of which could be true.

How to repeat:
Load a table with a few duplicates and triplicates and run the group by query.

Suggested fix:
What seems to be happening is that the HAVING n > 1 is really doing n == 2 not GREATER than 1 and so on.

The correct way to look for rows that have duplicates is:
SELECT id, count(*) as n from emp_table GROUP BY id HAVING n > 1;

SELECT id, count(*) as n from emp_table GROUP BY id HAVING n > 1;

I tried that way also.  Same problem.