Bug #15198 | Dropping then adding an index changes group by optimization behaviour | ||
---|---|---|---|
Submitted: | 23 Nov 2005 19:58 | Modified: | 24 Mar 2006 7:52 |
Reporter: | Jaime Pinheiro | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 5.0.16-nt | OS: | Windows (Windows 2000) |
Assigned to: | Georgi Kodinov | CPU Architecture: | Any |
[23 Nov 2005 19:58]
Jaime Pinheiro
[23 Nov 2005 22:27]
Heikki Tuuri
Hi! This may be due to the fact that MySQL updates index statistics in the ALTER. But clearly the GROUP BY should always use the index, regardless of statistics. This is probably an optimizer bug. Regards, Heikki
[23 Nov 2005 23:22]
Jorge del Conde
Thanks for your bug report. I was able to reproduce this under FC4 using a fresh 5.0 pull
[23 Mar 2006 3:10]
Björn Steinbrink
While playing with the stuff in this [1] thread in the optimizer forum, I've noticed the same problem in a slightly different context, but I guess it belongs into this bug report. In a subquery, the optimizer seems to never choose "Using index for group-by" but only "Using index", which slows down the optimizer a lot, while the query itself is as fast as expected (ie. time for the query and for the "explain" are the same). How to reproduce: mysql> create table foo (a int, index a (a)); Query OK, 0 rows affected (0.01 sec) mysql> insert into foo values (5),(10),(15),(15),(15),(15),(15),(15),(15),(15),(15),(15),(15),(15),(15),(15),(20),(20),(25); Query OK, 19 rows affected (0.00 sec) Records: 19 Duplicates: 0 Warnings: 0 mysql> explain select distinct a from foo; +----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+ | 1 | SIMPLE | foo | range | NULL | a | 5 | NULL | 10 | Using index for group-by | +----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+ 1 row in set (0.00 sec) mysql> explain select * from (select distinct a from foo) bar; +----+-------------+------------+-------+---------------+------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+-------+---------------+------+---------+------+------+-------------+ | 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 5 | | | 2 | DERIVED | foo | index | NULL | a | 5 | NULL | 19 | Using index | +----+-------------+------------+-------+---------------+------+---------+------+------+-------------+ 2 rows in set (0.00 sec) If the table becomes large, the second explain will take quite some time (over 14 seconds here for a 1GB test table with some additional fields). This subquery in particular is a quite interesting thing, as you can use it to "simulate" Oracle's index skip scan, if it works fast (see forum thread for some details on that). While writing this comment, I noticed one thing being even worse: mysql> explain select distinct a from foo limit 3; +----+-------------+-------+-------+---------------+------+---------+------+------+------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+------+---------+------+------+------------------------------+ | 1 | SIMPLE | foo | index | NULL | a | 5 | NULL | 10 | Using index; Using temporary | +----+-------------+-------+-------+---------------+------+---------+------+------+------------------------------+ 1 row in set (0.00 sec) I guess this could also use "Using index for group-by". [1] http://forums.mysql.com/read.php?115,74686,77681#msg-77681
[23 Mar 2006 16:29]
Georgi Kodinov
The problem looks like it is related to InnoDB statistics. I have performed several tests and have (somewhat) extended the original sql script. First of all Heikki is right in saying that the "Using index for group-by" method is applicable and it's indeed tried in both cases, but it is ruled out because of difference in the statistics values coming from InnoDB. I've compared the statistics that are passed to the cost_group_min_max() function (sql/opt_range.cc:7755), namely index_info->rec_per_key[group_key_parts - 1]. For the first select (the one without the dropped/recreated index) it is 1 whereas for the second select it is 2. This makes huge difference in the cost based on the fact that the record count (table->file->records) is 4 both times. Note that if I restart the server and just reissue the selects I get the same number (2) for index_info->rec_per_key[group_key_parts - 1] in both cases. I have then added some more rows to the table by doing : delimiter | create procedure a(start int, nrecs int) begin while (nrecs > 0) do INSERT INTO t1 VALUES (start, '2003-10-21 11:09:35', '2004-10-01', '08:15:40', 8, '68987', 'TTTTT', '10.0.0.1', '11467', 'XXXXX', NULL, NULL, 'sample text', NULL, 0, 9216); set nrecs = nrecs - 1; set start = start + 1; end while; end| delimiter ; call a(333937, 10000); instead of the four inserts that are done in the original report to get a better readout. Now I get index_info->rec_per_key[group_key_parts - 1] = 1 and table->file->records = 9540 in the wrong case and index_info->rec_per_key[group_key_parts - 1] = 5118 and table->file->records = 10237 in the right case. I also get two cases compile identically ("Using index for group-by") when I restart the server. Then I've changed the storage engine for the table from InnoDB to MyISAM and the two selects compiled identically.
[24 Mar 2006 7:52]
Georgi Kodinov
Instead of droping and recreating the index I've now tried ANALYZE TABLE test_groupby This also fixes the problem. Please see the section 13.5.2.1. of the MySQL Reference manual (http://dev.mysql.com/doc/refman/5.0/en/analyze-table.html) where this command is documented.