Bug #30283 | Strange behavior with COUNT(DISTINCT()) | ||
---|---|---|---|
Submitted: | 7 Aug 2007 19:28 | Modified: | 8 Aug 2007 7:13 |
Reporter: | Ben Vaughn | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S4 (Feature request) |
Version: | 5.1.20, 5.0.46 | OS: | Any (Mac OS X Server 10.3.9) |
Assigned to: | CPU Architecture: | Any | |
Tags: | select distinct slow speed table scan |
[7 Aug 2007 19:28]
Ben Vaughn
[8 Aug 2007 6:25]
Valeriy Kravchuk
Thank you for a problem report. Access pathes you reported verified just as described with a simple test case: create table tdis(id int auto_increment primary key, d date); insert into tdis(d) values (now()); insert into tdis(d) select date_sub(now(), interval 1000*rand() day) from tdis; ... insert into tdis(d) select date_sub(now(), interval 1000*rand() day) from tdis; alter table tdis add key i_date(d); You have to insert many thousands of rows that way, with <= 1000 distinct d values. "index" access path (full index scan) is used both for InnoDB and MyISAM tables: mysql> explain select distinct(d) from tdis; +----+-------------+-------+-------+---------------+--------+---------+------+-- ----+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | r ows | Extra | +----+-------------+-------+-------+---------------+--------+---------+------+-- ----+--------------------------+ | 1 | SIMPLE | tdis | range | NULL | i_date | 4 | NULL | 1 025 | Using index for group-by | +----+-------------+-------+-------+---------------+--------+---------+------+-- ----+--------------------------+ 1 row in set (0.00 sec) mysql> explain select count(distinct d) from tdis; +----+-------------+-------+-------+---------------+--------+---------+------+-- ----+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | r ows | Extra | +----+-------------+-------+-------+---------------+--------+---------+------+-- ----+-------------+ | 1 | SIMPLE | tdis | index | NULL | i_date | 4 | NULL | 2 048 | Using index | +----+-------------+-------+-------+---------------+--------+---------+------+-- ----+-------------+ 1 row in set (0.00 sec) This is at least a reasonable feature request, if special optimization for COUNT(DISTINCT col) is not there by design.
[8 Aug 2007 7:13]
Igor Babaev
From any point of view this is a feature request. This optimization won't be implemented in 5.0/5.1 in any case.