Bug #17865 | SELECT DISTINCT x quicker that SELECT COUNT (DISTINCT x) | ||
---|---|---|---|
Submitted: | 2 Mar 2006 15:25 | Modified: | 24 Jan 2014 9:42 |
Reporter: | Stephen Harris | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S5 (Performance) |
Version: | 5.0.18 | OS: | Any (Fedore Core 5 RC3) |
Assigned to: | CPU Architecture: | Any |
[2 Mar 2006 15:25]
Stephen Harris
[2 Mar 2006 15:28]
Valeriy Kravchuk
Thank you for a problem report. Please, send the results of explain select count(distinct m) from s0000s; explain select distinct m from s0000s; What if you restart the server and try to execute the second statement first?
[2 Mar 2006 15:50]
Stephen Harris
mysql> explain select count(distinct m) from s0000s; +----+-------------+--------+-------+---------------+------+---------+------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------+-------+---------------+------+---------+------+---------+-------------+ | 1 | SIMPLE | s0000s | index | NULL | m | 8 | NULL | 4804208 | Using index | +----+-------------+--------+-------+---------------+------+---------+------+---------+-------------+ mysql> explain select distinct m from s0000s; +----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+ | 1 | SIMPLE | s0000s | range | NULL | m | 8 | NULL | 662 | Using index for group-by | +----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+ I tried restarting the server and running the queries in different orders, but it makes little difference.
[2 Mar 2006 15:52]
Stephen Harris
Thanks for the quick response my the way, very impressive! I am working round this problem by SELECTing then storing the reuslt and using mysql_num_rows(), but it would be good to have a fix for the future.
[2 Mar 2006 15:54]
Valeriy Kravchuk
OK, it was just my crasy idea... Please, send the SHOW CREATE TABLE results for the table used in your query.
[2 Mar 2006 16:07]
jyrgen hall
I found a related(?) problem with 4.1.18 on RHEL3 > SELECT COUNT(*) FROM (SELECT DISTINCT filehash FROM t1) AS t_sub; +----------+ | COUNT(*) | +----------+ | 591298 | +----------+ 1 row in set (1.41 sec) !!! moderately fast !!! SELECT count(DISTINCT filehash) FROM FILES; +--------------------------+ | count(DISTINCT filehash) | +--------------------------+ | 591298 | +--------------------------+ 1 row in set (21.59 sec) !!! too slow !!! more info in this thread http://forums.mysql.com/read.php?10,73048 regards, jürgen
[2 Mar 2006 17:35]
Stephen Harris
The CREATE COMMAND was: CREATE TABLE `s0000s` ( `m` bigint(20) NOT NULL default '0', `s` bigint(20) NOT NULL default '0', `p` bigint(20) NOT NULL default '0', `o` bigint(20) NOT NULL default '0', KEY `m` (`m`), KEY `sp` (`s`,`p`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8;
[2 Mar 2006 17:41]
Stephen Harris
Yes, that does seem to be the same problem jyrgen. Incase it matters the hardware is a dual xeon with 4gb of ram and sata disks. I have 4 other machines (running the same OS, DB version, schema and similar table sizes), and they all show the same behaviour.
[2 Apr 2006 14:16]
Valeriy Kravchuk
Verified on 5.0.21-BK (ChangeSet@1.2131, 2006-04-01 05:53:37+02:00) with your table and following sample data: mysql> select distinct m from s0000s; ... | 666 | +-----+ 666 rows in set (0.05 sec) mysql> select count(distinct m) from s0000s; +-------------------+ | count(distinct m) | +-------------------+ | 666 | +-------------------+ 1 row in set (0.94 sec) So yes, there is a difference in execution time. Data was loadied by the following shell script: a=0 while [ $a -le 665 ]; do let a=a+1; bin/mysql -uroot test -e "insert into s0000s(m) values ($a)"; done and then: insert into s0000s select * from s0000s; several times to get a reasonable number of rows: mysql> select count(*) from s0000s; +----------+ | count(*) | +----------+ | 681984 | +----------+ 1 row in set (0.00 sec) EXPLAIN gives different results: mysql> explain extended select distinct m from s0000s; +----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+ | 1 | SIMPLE | s0000s | range | | m | 8 | | 10 | Using index for group-by | +----+-------------+--------+-------+---------------+------+---------+------+------+--------------------------+ 1 row in set, 1 warning (0.00 sec) mysql> show warnings\G *************************** 1. row *************************** Level: Note Code: 1003 Message: select distinct `test`.`s0000s`.`m` AS `m` from `test`.`s0000s` 1 row in set (0.00 sec) mysql> explain extended select count(distinct m) from s0000s; +----+-------------+--------+-------+---------------+------+---------+------+--------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------+-------+---------------+------+---------+------+--------+-------------+ | 1 | SIMPLE | s0000s | index | | m | 8 | | 681984 | Using index | +----+-------------+--------+-------+---------------+------+---------+------+--------+-------------+ 1 row in set, 1 warning (0.00 sec) mysql> show warnings\G *************************** 1. row *************************** Level: Note Code: 1003 Message: select count(distinct `test`.`s0000s`.`m`) AS `count(distinct m)` from `test`.`s0000s` 1 row in set (0.00 sec) Number of rows clearly explain the difference in execution times. It is a bug (or a feature request) for the optimizer. It should NOT scan the entire index - better even is to put distinct data into a temp table and then count!
[4 Apr 2006 22:07]
Igor Babaev
This sproblem will be fixed in 5.2. See WL #3220 "Loose index scan for COUNT DISTINCT" on this account.
[24 Jan 2014 9:42]
Manyi Lu
It seems this bug is fixed in 5.6.