Description:
At present, MySQL is only able to utilise indicies to accelerate the completion of MIN and MAX aggregates over a single table, as referenced in http://dev.mysql.com/doc/refman/5.1/en/loose-index-scan.html.
At present, I am doing some reporting and DB performance evaluations for a client involving a table containing 106807643 rows, with the following schema:
CREATE TABLE `acct_full` (
`agent_id` int(2) unsigned NOT NULL,
`mac_src` char(17) NOT NULL,
`mac_dst` char(17) NOT NULL,
`vlan` int(2) unsigned NOT NULL,
`ip_src` char(15) NOT NULL,
`ip_dst` char(15) NOT NULL,
`src_port` int(2) unsigned NOT NULL,
`dst_port` int(2) unsigned NOT NULL,
`ip_proto` char(6) NOT NULL,
`tos` int(4) unsigned NOT NULL,
`packets` int(10) unsigned NOT NULL,
`bytes` bigint(20) unsigned NOT NULL,
`flows` int(10) unsigned NOT NULL,
`stamp_inserted` datetime NOT NULL,
`stamp_updated` datetime DEFAULT NULL,
KEY `stamp_inserted` (`stamp_inserted`,`ip_dst`,`dst_port`,`ip_src`,`src_port`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
For my purposes, I'm looking to roll up some data so that the key referenced above can be a primary key. To do this, I need a list of duplicate values for the fields referenced in the key as produced by the following query:
create table repeated_values select `stamp_inserted`,`ip_dst`,`dst_port`,`ip_src` from acct_full group by `stamp_inserted`,`ip_dst`,`dst_port`,`ip_src`,`src_port` having count(*) > 1;
Although the index contains all required information to evaluate the result set quickly, MySQL does not currently do so, resorting to a full index scan:
mysql> explain select `stamp_inserted`,`ip_dst`,`dst_port`,`ip_src`,`src_port` from acct_full group by `stamp_inserted`,`ip_dst`,`dst_port`,`ip_src`,`src_port` having count(*) > 1;
+----+-------------+-----------+-------+---------------+----------------+---------+------+-----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+---------------+----------------+---------+------+-----------+-------------+
| 1 | SIMPLE | acct_full | index | NULL | stamp_inserted | 46 | NULL | 106807643 | Using index |
+----+-------------+-----------+-------+---------------+----------------+---------+------+-----------+-------------+
1 row in set (0.01 sec)
The same result is observed by either including a smaller left-subset of the key for grouping (which should result in less of the index needing to be visited) or extending the key to include more fields.
How to repeat:
Create table as above, populate with some data and execute included queries.
Suggested fix:
Optimise index usage when evaluating COUNT aggregates.