Bug #17865 SELECT DISTINCT x quicker that SELECT COUNT (DISTINCT x)
Submitted: 2 Mar 2006 15:25 Modified: 24 Jan 2014 9:42
Reporter: Stephen Harris Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.0.18 OS:Any (Fedore Core 5 RC3)
Assigned to: CPU Architecture:Any

[2 Mar 2006 15:25] Stephen Harris
Description:
On a table with 4.8 million rows, count(distinct col) uses a different (and slower) optimisation route to the plain query.

The m column is a bigint with a plain index on it.

mysql> select count(distinct m) from s0000s;
+-------------------+
| count(distinct m) |
+-------------------+
|               665 |
+-------------------+
1 row in set (5.59 sec)

select distinct m from s0000s;
...
665 rows in set (0.01 sec)

How to repeat:
Create table with indexed integer column, try the different queries.

Suggested fix:
Make count(distinct) use the same optimisation method as select distinct.
[2 Mar 2006 15:28] Valeriy Kravchuk
Thank you for a problem report. Please, send the results of

explain select count(distinct m) from s0000s;
explain select distinct m from s0000s;

What if you restart the server and try to execute the second statement first?
[2 Mar 2006 15:50] Stephen Harris
mysql> explain select count(distinct m) from s0000s;
+----+-------------+--------+-------+---------------+------+---------+------+---------+-------------+
| id | select_type | table  | type  | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+--------+-------+---------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | s0000s | index | NULL          | m    | 8       | NULL | 4804208 | Using index |
+----+-------------+--------+-------+---------------+------+---------+------+---------+-------------+

mysql> explain select distinct m from s0000s;
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table  | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | s0000s | range | NULL          | m    | 8       | NULL |  662 | Using index for group-by |
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+

I tried restarting the server and running the queries in different orders, but it makes little difference.
[2 Mar 2006 15:52] Stephen Harris
Thanks for the quick response my the way, very impressive!

I am working round this problem by SELECTing then storing the reuslt and using mysql_num_rows(), but it would be good to have a fix for the future.
[2 Mar 2006 15:54] Valeriy Kravchuk
OK, it was just my crasy idea... Please, send the SHOW CREATE TABLE results for the table used in your query.
[2 Mar 2006 16:07] jyrgen hall
I found a related(?) problem with 4.1.18
on RHEL3

> SELECT COUNT(*) FROM (SELECT DISTINCT filehash FROM t1) AS t_sub;
+----------+
| COUNT(*) |
+----------+
| 591298 |
+----------+
1 row in set (1.41 sec) !!! moderately fast !!!

SELECT count(DISTINCT filehash) FROM FILES;
+--------------------------+
| count(DISTINCT filehash) |
+--------------------------+
| 591298 |
+--------------------------+
1 row in set (21.59 sec) !!! too slow !!!

more info in this thread

http://forums.mysql.com/read.php?10,73048

regards, jürgen
[2 Mar 2006 17:35] Stephen Harris
The CREATE COMMAND was:

CREATE TABLE `s0000s` (
  `m` bigint(20) NOT NULL default '0',
  `s` bigint(20) NOT NULL default '0',
  `p` bigint(20) NOT NULL default '0',
  `o` bigint(20) NOT NULL default '0',
  KEY `m` (`m`),
  KEY `sp` (`s`,`p`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
[2 Mar 2006 17:41] Stephen Harris
Yes, that does seem to be the same problem jyrgen.

Incase it matters the hardware is a dual xeon with 4gb of ram and sata disks. I have 4 other machines (running the same OS, DB version, schema and similar table sizes), and they all show the same behaviour.
[2 Apr 2006 14:16] Valeriy Kravchuk
Verified on 5.0.21-BK (ChangeSet@1.2131, 2006-04-01 05:53:37+02:00) with your table and following sample data:

mysql> select distinct m from s0000s;
...

| 666 |
+-----+
666 rows in set (0.05 sec)

mysql> select count(distinct m) from s0000s;
+-------------------+
| count(distinct m) |
+-------------------+
| 666               |
+-------------------+
1 row in set (0.94 sec)

So yes, there is a difference in execution time. Data was loadied by the following shell script:

a=0
while [ $a -le 665 ]; do   let a=a+1;   bin/mysql -uroot test -e "insert into s0000s(m) values ($a)"; done

and then:

insert into s0000s select * from s0000s;

several times to get a reasonable number of rows:

mysql> select count(*) from s0000s;
+----------+
| count(*) |
+----------+
| 681984   |
+----------+
1 row in set (0.00 sec)

EXPLAIN gives different results:

mysql> explain extended select distinct m from s0000s;
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table  | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
| 1  | SIMPLE      | s0000s | range |               | m    | 8       |      | 10   | Using index for group-by |
+----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> show warnings\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: select distinct `test`.`s0000s`.`m` AS `m` from `test`.`s0000s`
1 row in set (0.00 sec)

mysql> explain extended select count(distinct m) from s0000s;
+----+-------------+--------+-------+---------------+------+---------+------+--------+-------------+
| id | select_type | table  | type  | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+--------+-------+---------------+------+---------+------+--------+-------------+
| 1  | SIMPLE      | s0000s | index |               | m    | 8       |      | 681984 | Using index |
+----+-------------+--------+-------+---------------+------+---------+------+--------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> show warnings\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: select count(distinct `test`.`s0000s`.`m`) AS `count(distinct m)` from
`test`.`s0000s`
1 row in set (0.00 sec)

Number of rows clearly explain the difference in execution times. It is a bug (or a feature request) for the optimizer. It should NOT scan the entire index - better even is to put distinct data into a temp table and then count!
[4 Apr 2006 22:07] Igor Babaev
This sproblem will be fixed in 5.2. See WL #3220 "Loose index scan for COUNT DISTINCT" on this account.
[24 Jan 2014 9:42] Manyi Lu
It seems this bug is fixed in 5.6.