Bug #28691 | GROUP BY optimization ignores functional dependency on unique/primary key | ||
---|---|---|---|
Submitted: | 25 May 2007 20:16 | Modified: | 23 Aug 2007 16:42 |
Reporter: | Roland Bouman | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 5.1.18 | OS: | Any |
Assigned to: | Martin Hansson | CPU Architecture: | Any |
Tags: | functional dependency, GROUP BY |
[25 May 2007 20:16]
Roland Bouman
[30 May 2007 9:10]
Roland Bouman
Simplified test case: explain SELECT f.film_id , f.title , max(i.inventory_id) FROM film f LEFT JOIN inventory i ON f.film_id = i.film_id GROUP BY f.film_id , f.title
[30 May 2007 9:16]
Roland Bouman
At request of timour and joro, table definitions: CREATE TABLE `film` ( `film_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT, `title` varchar(255) NOT NULL, `description` text, `release_year` year(4) DEFAULT NULL, `language_id` tinyint(3) unsigned NOT NULL, `original_language_id` tinyint(3) unsigned DEFAULT NULL, `rental_duration` tinyint(3) unsigned NOT NULL DEFAULT '3', `rental_rate` decimal(4,2) NOT NULL DEFAULT '4.99', `length` smallint(5) unsigned DEFAULT NULL, `replacement_cost` decimal(5,2) NOT NULL DEFAULT '19.99', `rating` enum('G','PG','PG-13','R','NC-17') DEFAULT 'G', `special_features` set('Trailers','Commentaries','Deleted Scenes','Behind the Scenes') DEFAULT NULL, `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`film_id`), KEY `idx_fk_language_id` (`language_id`), KEY `idx_fk_original_language_id` (`original_language_id`), KEY `idx_title` (`title`), CONSTRAINT `fk_film_language` FOREIGN KEY (`language_id`) REFERENCES `language` (`language_id`) ON UPDATE CASCADE, CONSTRAINT `fk_film_language_original` FOREIGN KEY (`original_language_id`) REFERENCES `language` (`language_id`) ON UPDATE CASCADE ) ENGINE=InnoDB AUTO_INCREMENT=1001 DEFAULT CHARSET=utf8 CREATE TABLE `inventory` ( `inventory_id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT, `film_id` smallint(5) unsigned NOT NULL, `store_id` tinyint(3) unsigned NOT NULL, `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`inventory_id`), KEY `idx_fk_film_id` (`film_id`), KEY `idx_store_id_film_id` (`store_id`,`film_id`), CONSTRAINT `fk_inventory_film` FOREIGN KEY (`film_id`) REFERENCES `film` (`film_id`) ON UPDATE CASCADE, CONSTRAINT `fk_inventory_store` FOREIGN KEY (`store_id`) REFERENCES `store` (`store_id`) ON UPDATE CASCADE ) ENGINE=InnoDB AUTO_INCREMENT=4582 DEFAULT CHARSET=utf8 Note that the sakila database may be downloaded here: http://downloads.mysql.com/docs/sakila-db.tar.gz
[7 Aug 2007 10:33]
Martin Hansson
This was not a trivial problem. In fact, it consists of several problematic issues: 1) The fix for Bug#16458 introduced a bug. I give two examples. This bug displays itself for InnoDB in the following way: DROP TABLE IF EXISTS t1; CREATE TABLE t1( a INT NOT NULL PRIMARY KEY, b INT NOT NULL, KEY( b ) ) engine=innodb; INSERT INTO t1 VALUES ( 1, 1 ); INSERT INTO t1 VALUES ( 2, 3 ); INSERT INTO t1 VALUES ( 3, 2 ); SELECT t1.a, b FROM t1 GROUP BY t1.a, t1.b; -- +---+---+ -- | a | b | -- +---+---+ -- | 1 | 1 | -- | 3 | 2 | -- | 2 | 3 | -- +---+---+ -- The result is wrong and EXPLAIN tells us why (the wrong key is used): EXPLAIN SELECT t1.a, b FROM t1 GROUP BY t1.a, t1.b\g -- *************************** 1. row *************************** -- id: 1 -- select_type: SIMPLE -- table: t1 -- type: index -- possible_keys: NULL -- key: b -- key_len: 4 -- ref: NULL -- rows: 3 -- Extra: Using index But the bug is visible in MyISAM, too: CREATE TABLE t2( a INT, b INT NOT NULL, c INT NOT NULL, d INT, PRIMARY KEY (a,b), UNIQUE KEY (c,b) ); INSERT INTO t2 VALUES (1,1,1,50), (1,2,3,40), (2,1,3,4); EXPLAIN SELECT DISTINCT a,b,d FROM t2 GROUP BY c,b,d\G -- *************************** 1. row *************************** -- id: 1 -- select_type: SIMPLE -- table: t2 -- type: ALL -- possible_keys: NULL -- key: NULL -- key_len: NULL -- ref: NULL -- rows: 3 -- Extra: Here no key is used and the GROUP BY removed. Obviously, this plan is unrealistic, and accordingly the result is wrong: SELECT * FROM t2; -- +---+---+---+------+ -- | a | b | c | d | -- +---+---+---+------+ -- | 1 | 1 | 1 | 50 | -- | 1 | 2 | 3 | 40 | -- | 2 | 1 | 3 | 4 | -- +---+---+---+------+ The central idea of the fix for bug#16458 is valid but needs to used differently. It is based on the following assumption: Assumption 16458: If ORDER BY or GROUP BY matches all keyparts of a unique key, we can skip ORDER/GROUP BY. I will add to this: "...as long as that key is used for lookup" 2) The decision above should be placed within test_if_skip_sort_order instead of before it, since in there we have the code to actually use an index. 3) We also have to fix test_if_order_by_key since it doesn't take Assumption 16458 above into account either. 4) MyISAM and InnoDB have opposite attitudes towards index scan. ha_innodb::keys_to_use_for_scanning() is happy to return the set of all indexes, even such that don't exist. ha_myisam::keys_to_use_for_scanning() on the other hand invariably returns the empty set of keys. The problem here is that test_if_skip_sort_order listens too much to what MyISAM says in this matter (i.e. don't use any keys for scanning). It should use whatever index it finds appropriate. Note that this function alone is not responsible for hiding the problem in #1, but probably MyISAM gives a lower cost for table scan than index scan. 5) Primary keys in temporary tables, heap tables and tables from federated handlers are not supposed to get discovered by test_if_skip_sort_order, if they are the handler will core dump when instructed to use them. We will remove from consideration all keys from these tables.
[7 Aug 2007 16:52]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/32209 ChangeSet@1.2485, 2007-08-07 18:51:43+02:00, mhansson@linux-st28.site +15 -0 Bug #28691: GROUP BY optimization ignores functional dependency on unique/primary key In the general case, MySQL did not exploit ordered keys in order to skip GROUP BY and ORDER by, and when it did results were sometimes sorted wrong. The problem consisted of: 1) GROUP BY was removed without choosing the ordered index 2) test_if_skip_sort_order and test_if_order_by_key failed to take into account that a full match on one unique ordered key is enough to skip GROUP BY once (1) is fixed. (some limitations apply, see comment for sql_select.cc) 3) Some storage engines (e.g. MyISAM) may not recommend index scan over table scan, and test_if_skip_sort_order did not consider 'unrecommended' keys. Fixed by 1) moving the skipping of GROUP BY to the place where an index is chosen, so that not one is done without the other, 2) Adding to test_if_skip_sort_order and test_if_order_by_key the ability to recognize good indexes, and 3) by not paying attention to what the storage engine thinks is good use of indexes in this case.
[22 Aug 2007 10:54]
Timour Katchaounov
Increased priority to P2 as the bug results in wrong result (incorrect result order).
[23 Aug 2007 15:15]
Martin Hansson
The problem with wrong result order is a different bug: Bug#30596. Therefore, I change the priority back to low.
[23 Aug 2007 15:24]
Martin Hansson
According to Georgi Kodinov, he had a discussion with Igor Babaev, concluding that GROUP BY should not be computed using indexes in multi-table selects. This is obvious unless all tables have unique constraints on the joined field(s) *and* on the grouped fields. But for more than two tables, this gets pretty complex.
[23 Aug 2007 16:42]
Timour Katchaounov
According to the previous comment from Martin, this is not a bug. In the course of analyzing this bug Martin discovered another related problem, which will be fixed as BUG#30596.