Bug #26106 | Wrong plan may be chosen when there are several possible range and ref accesses | ||
---|---|---|---|
Submitted: | 6 Feb 2007 8:15 | Modified: | 13 Aug 2010 2:31 |
Reporter: | Sergey Petrunya | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 5.0.33 | OS: | Any |
Assigned to: | Martin Hansson | CPU Architecture: | Any |
[6 Feb 2007 8:15]
Sergey Petrunya
[6 Feb 2007 8:18]
Sergey Petrunya
How to repeat: 1. Create and fill the table: CREATE TABLE `link_in85` ( `domain_id` int(10) unsigned NOT NULL, `link_id` int(10) unsigned NOT NULL auto_increment, `url_from` varchar(255) NOT NULL, `url_to` varchar(255) NOT NULL, `anchor` varchar(255) NOT NULL, `from_site_id` int(10) unsigned NOT NULL, `from_forum_id` int(10) unsigned NOT NULL, `from_message_id` int(10) unsigned NOT NULL, `message_published` timestamp NOT NULL default CURRENT_TIMESTAMP, `kind` enum('link','img') NOT NULL, `url_title` varchar(255) NOT NULL, `isexternal` tinyint(3) unsigned NOT NULL, `revert_domain` varchar(255) NOT NULL, `url_prefix` varchar(255) NOT NULL, `from_domain_id` int(10) unsigned NOT NULL, `ext` varchar(25) NOT NULL, `linktype` enum('html','video','mp3','image','pdf','other') NOT NULL, `message_day` date NOT NULL, PRIMARY KEY (`link_id`), UNIQUE KEY `domain_id_2` (`domain_id`,`link_id`), KEY `url_to` (`url_to`), KEY `domain_id` (`domain_id`,`from_site_id`,`message_published`), KEY `revert_domain` (`revert_domain`,`url_prefix`(80)), KEY `domain_id_3` (`domain_id`,`message_day`,`revert_domain`,`url_prefix`,`from_site_id`,`isexternal`), KEY `domain_message` (`domain_id`,`message_day`,`isexternal`,`revert_domain`,`url_prefix`,`from_site_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 fill it with certain data (you need precise dataset from PeterZ, I have it)
[6 Feb 2007 8:20]
Sergey Petrunya
Problem #1: Run the query: explain SELECT url_to, count(*) cnt, revert_domain, url_prefix, count(distinct from_site_id) uniq FROM link_in85 IGNORE INDEX (domain_message) WHERE domain_id=434876 AND kind='link' AND isexternal=1 AND message_day IN ('2007-01-01','2007-01-02','2007-01-03','2007-01-04', '2007-01-05','2007-01-06','2007-01-07','2007-01-08', '2007-01-09','2007-01-10','2007-01-11','2007-01-12', '2007-01-13','2007-01-14','2007-01-15','2007-01-16', '2007-01-17','2007-01-18','2007-01-19','2007-01-20', '2007-01-21','2007-01-22','2007-01-23','2007-01-24', '2007-01-25','2007-01-26','2007-01-27','2007-01-28', '2007-01-29','2007-01-30','2007-01-31') GROUP BY concat(revert_domain,url_prefix) ORDER BY uniq desc, cnt desc LIMIT 10\G
[6 Feb 2007 8:24]
Sergey Petrunya
domain_id_2: rows 933660 cost 1.12039e+06 domain_id: rows 1931598 cost 2.31792e+06 domain_id_3: rows 2599902 cost 3.11991e+06 Range analyzer result is: range plan for key domain_id_2, cost 1.12039e+06, records 933660 quick range select, key domain_id_2, length: 4 434876 <= X <= 434876 best_access_path(): key=1: (domain_id_2): ReuseRangeEstimateForRef hits, rows=933660. key=3: domain_id : ReuseRangeEstimateForRef hits, rows=1931598. key=5L domain_id_3: quick_key_parts=2, nranges=31 ref's key_parts=1 ReuseRangeEstimateForRef does not hit, And here we use rec-per-key statistics of #rows=302992 The problem is: We shouldn't even consider ref(const) when it is possible to use range access over more key parts.
[6 Feb 2007 8:35]
Sergey Petrunya
Another problem: consider this EXPLAIN:
[6 Feb 2007 8:35]
Sergey Petrunya
explain SELECT url_to, count(*) cnt, revert_domain, url_prefix, count(distinct from_site_id) uniq FROM link_in85 IGNORE INDEX (domain_message, domain_id_2) WHERE domain_id=434876 AND kind='link' AND isexternal=1 AND message_day IN ('2007-01-01','2007-01-02','2007-01-03','2007-01-04', '2007-01-05','2007-01-06','2007-01-07','2007-01-08', '2007-01-09','2007-01-10','2007-01-11','2007-01-12', '2007-01-13','2007-01-14','2007-01-15','2007-01-16', '2007-01-17','2007-01-18','2007-01-19','2007-01-20', '2007-01-21','2007-01-22','2007-01-23','2007-01-24', '2007-01-25','2007-01-26','2007-01-27','2007-01-28', '2007-01-29','2007-01-30','2007-01-31') GROUP BY concat(revert_domain,url_prefix) ORDER BY uniq desc, cnt desc LIMIT 10\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: link_in85 type: ref possible_keys: domain_id,domain_id_3 key: domain_id_3 key_len: 4 ref: const rows: 302992 Extra: Using where; Using temporary; Using filesort 1 row in set (6 min 5.37 sec)
[6 Feb 2007 8:52]
Sergey Petrunya
Here we have this scenario: Range analyzer: find those two candidate accesses: key domain_id: found_read_time: 2.31792e+06 (cur. read_time: 1.18537e+06) rows= 1931598 key domain_id_3: found_read_time: 3.11991e+06 (cur. read_time: 1.18537e+06) rows= 2599902 Both of them are considered too expensive. So we destroy all quick selects. Best_access_path: ReuseRangeEstimateForRef doesn't fire for key domain_id_3, we use the rec_per_key (index statistics) estimate, and decide that using the index is better than not. ^ Here we face the same problem as described in previous case ^ Since quick select already has been destroyed, we can't use it and use ref access over a broader range. ^ Here is another possible issue: are there any cases where we'll need to re-create the quick select we've previously destroyed? It seems that once we have full ReuseRangeEstimateForRef + "use range if it scans a narrower set" we'll never need to re-create a quick select but this needs to be checked.
[16 Mar 2007 14:00]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/22131 ChangeSet@1.2466, 2007-03-16 15:00:50+01:00, mhansson@linux-st28.site +3 -0 bug #26106: Wrong plan may be chosen when there are several possible range and ref accesses Credits go to Sergey Petrunia for this description of the bug. For ref(const) access methods there are two kinds of statistics when the WHERE has an AND-part of "t.keyXpart1 = c1 AND ... t.keyXpartN = cN", N < key size. 1. Table statistics, i.e. index cardinality values 2. Range access estimate those estimates are not necessarily consistent with each other for example we can get E(#rows(t.key1part1 = c1)) < E( #rows( t.key1part1 = c1 AND t.key1part2 IN (1, 2) ) ) or E(#rows(t.key1part1 = c1)) < E( #rows( t.key1part1 = c1 AND t.key1part2 = othertbl.field ) ) which is obviously not true. The source of discrepancy is that index cardinality values can be very inaccurate for skewed datasets. The code marked as ReuseRangeEstimateForRef-[1-4] tries to do a smart job of using the best available estimate. e.g. if we have candidate accesses: * B = ref-access( t.key1part1 = c1 AND t.key1part2 = othertbl.field ) * A = range-access( t.key1part1 = c1 ) and we have the E( #rows(A) ) < E( #rows(A) ) as above, we'll pick the #rows estimate from range access and use it with ref access. (we will use ref access as it uses more keyparts, that's a heuristic) Ok, when we have range and ref over the same index we have these rules: * When access method X scans a subset of rows that access method Y will scan, use X, no matter what the cost numbers say. * Look if the obtained cost estimates are consistent. If there is an inconsistency, eliminate it by using range access estimates instead of ref. The bug is that the above rules weren't followed. best_access_path always chose the index with the lowest number of records, even if the figure wasn't reliable.
[20 Mar 2007 7:23]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/22323 ChangeSet@1.2466, 2007-03-20 08:24:17+01:00, mhansson@linux-st28.site +3 -0 bug #26106: Wrong plan may be chosen when there are several possible range and ref accesses Credits go to Sergey Petrunia for this description of the bug. For ref(const) access methods there are two kinds of statistics when the WHERE has an AND-part of "t.keyXpart1 = c1 AND ... t.keyXpartN = cN", N < key size. 1. Table statistics, i.e. index cardinality values 2. Range access estimate those estimates are not necessarily consistent with each other for example we can get E(#rows(t.key1part1 = c1)) < E( #rows( t.key1part1 = c1 AND t.key1part2 IN (1, 2) ) ) or E(#rows(t.key1part1 = c1)) < E( #rows( t.key1part1 = c1 AND t.key1part2 = othertbl.field ) ) which is obviously not true. The source of discrepancy is that index cardinality values can be very inaccurate for skewed datasets. The code marked as ReuseRangeEstimateForRef-[1-4] tries to do a smart job of using the best available estimate. e.g. if we have candidate accesses: * B = ref-access( t.key1part1 = c1 AND t.key1part2 = othertbl.field ) * A = range-access( t.key1part1 = c1 ) and we have the E( #rows(A) ) < E( #rows(A) ) as above, we'll pick the #rows estimate from range access and use it with ref access. (we will use ref access as it uses more keyparts, that's a heuristic) When we have range and ref over the same index we have these rules: * When access method X scans a subset of rows that access method Y will scan, use X, no matter what the cost numbers say. * Look if the obtained cost estimates are consistent. If there is an inconsistency, eliminate it by using range access estimates instead of ref. The bug is that the above rules weren't followed. best_access_path always chose the index with the lowest number of records, even if the figure wasn't reliable.
[22 Mar 2007 12:25]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/22629 ChangeSet@1.2466, 2007-03-22 13:26:42+01:00, mhansson@linux-st28.site +3 -0 bug #26106: Wrong plan may be chosen when there are several possible range and ref accesses Credits go to Sergey Petrunia for this description of the bug. For ref(const) access methods there are two kinds of statistics when the WHERE has an AND-part of "t.keyXpart1 = c1 AND ... t.keyXpartN = cN", N < key size. 1. Table statistics, i.e. index cardinality values 2. Range access estimate those estimates are not necessarily consistent with each other for example we can get E(#rows(t.key1part1 = c1)) < E( #rows( t.key1part1 = c1 AND t.key1part2 IN (1, 2) ) ) or E(#rows(t.key1part1 = c1)) < E( #rows( t.key1part1 = c1 AND t.key1part2 = othertbl.field ) ) which is obviously not true. The source of discrepancy is that index cardinality values can be very inaccurate for skewed datasets. The code marked as ReuseRangeEstimateForRef-[1-4] tries to do a smart job of using the best available estimate. e.g. if we have candidate accesses: * B = ref-access( t.key1part1 = c1 AND t.key1part2 = othertbl.field ) * A = range-access( t.key1part1 = c1 ) and we have the E( #rows(A) ) < E( #rows(B) ) as above, we'll pick the #rows estimate from range access and use it with ref access. (we will use ref access as it uses more keyparts, that's a heuristic) When we have range and ref over the same index we have these rules: * When access method X scans a subset of rows that access method Y will scan, use X, no matter what the cost numbers say. * Look if the obtained cost estimates are consistent. If there is an inconsistency, eliminate it by using range access estimates instead of ref. The bug is that the above rules weren't followed. best_access_path always chose the index with the lowest number of records, even if the figure wasn't reliable.
[22 Mar 2007 12:47]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/22633 ChangeSet@1.2466, 2007-03-22 13:48:41+01:00, mhansson@linux-st28.site +3 -0 bug #26106: Wrong plan may be chosen when there are several possible range and ref accesses Credits go to Sergey Petrunia for this description of the bug. For ref(const) access methods there are two kinds of statistics when the WHERE has an AND-part of "t.keyXpart1 = c1 AND ... t.keyXpartN = cN", N < key size. 1. Table statistics, i.e. index cardinality values 2. Range access estimate those estimates are not necessarily consistent with each other for example we can get E(#rows(t.key1part1 = c1)) < E( #rows( t.key1part1 = c1 AND t.key1part2 IN (1, 2) ) ) or E(#rows(t.key1part1 = c1)) < E( #rows( t.key1part1 = c1 AND t.key1part2 = othertbl.field ) ) which is obviously not true. The source of discrepancy is that index cardinality values can be very inaccurate for skewed datasets. The code marked as ReuseRangeEstimateForRef-[1-4] tries to do a smart job of using the best available estimate. e.g. if we have candidate accesses: * B = ref-access( t.key1part1 = c1 AND t.key1part2 = othertbl.field ) * A = range-access( t.key1part1 = c1 ) and we have the E( #rows(A) ) < E( #rows(B) ) as above, we'll pick the #rows estimate from range access and use it with ref access. (we will use ref access as it uses more keyparts, that's a heuristic) When we have range and ref over the same index we have these rules: * When access method X scans a subset of rows that access method Y will scan, use X, no matter what the cost numbers say. * Look if the obtained cost estimates are consistent. If there is an inconsistency, eliminate it by using range access estimates instead of ref. The bug is that the above rules weren't followed. best_access_path always chose the index with the lowest number of records, even if the figure wasn't reliable.
[30 May 2007 22:13]
Sergey Petrunya
Another possible problem (maybe this needs to be separated into another bug): Indexes domain_id_3 and domain_message are defined over the same sets of colums: domain_id_3(`domain_id`,`message_day`,`revert_domain`,`url_prefix`,`from_site_id`,`isexternal`) domain_message(`domain_id`,`message_day`,`isexternal`,`revert_domain`,`url_prefix`,`from_site_id`) The query explain SELECT * FROM link_in85 WHERE domain_id=434876 AND isexternal=1 AND message_day IN ('2007-01-01','2007-01-02','2007-01-03','2007-01-04', '2007-01-05','2007-01-06','2007-01-07','2007-01-08', '2007-01-09','2007-01-10','2007-01-11','2007-01-12', '2007-01-13','2007-01-14','2007-01-15','2007-01-16', '2007-01-17','2007-01-18','2007-01-19','2007-01-20', '2007-01-21','2007-01-22','2007-01-23','2007-01-24', '2007-01-25','2007-01-26','2007-01-27','2007-01-28', '2007-01-29','2007-01-30','2007-01-31') uses 2 key parts for domain_id_3, and 3 key parts for domain_message. The #records estimate for range is the same for both indexes and equal to 31. (Perhaps the cause of this is that records_in_range() may not return 0, as that signifies an empty range) The default read_cost() functions produces equal costs, and domain_id_3 is chosen over domain_message. Actually, the ranges over domain_id_3 contain 30 rows, while ranges over domain_message contain 0 rows, so domain_message is better. MySQL could analyze situations like this and chose index for which more predicates/keyparts are used.
[30 May 2007 22:13]
Sergey Petrunya
Investigation log ================= table data ---------- select count(*) from link_in85 where domain_id=434876 AND message_day IN ('2007-01-01','2007-01-02','2007-01-03','2007-01-04', '2007-01-05','2007-01-06','2007-01-07','2007-01-08', '2007-01-09','2007-01-10','2007-01-11','2007-01-12', '2007-01-13','2007-01-14','2007-01-15','2007-01-16', '2007-01-17','2007-01-18','2007-01-19','2007-01-20', '2007-01-21','2007-01-22','2007-01-23','2007-01-24', '2007-01-25','2007-01-26','2007-01-27','2007-01-28', '2007-01-29','2007-01-30','2007-01-31') ; +----------+ | count(*) | +----------+ | 31 | +----------+ 1 row in set (0.06 sec) select count(*) from link_in85 where domain_id=434876 AND message_day IN ('2007-01-01','2007-01-02','2007-01-03','2007-01-04', '2007-01-05','2007-01-06','2007-01-07','2007-01-08', '2007-01-09','2007-01-10','2007-01-11','2007-01-12', '2007-01-13','2007-01-14','2007-01-15','2007-01-16', '2007-01-17','2007-01-18','2007-01-19','2007-01-20', '2007-01-21','2007-01-22','2007-01-23','2007-01-24', '2007-01-25','2007-01-26','2007-01-27','2007-01-28', '2007-01-29','2007-01-30','2007-01-31') and isexternal=1; +----------+ | count(*) | +----------+ | 0 | +----------+ 1 row in set (0.01 sec) Optimizer runs -------------- explain select * from link_in85 force index(domain_id_3) where domain_id=434876 AND isexternal=1 message_day IN ('2007-01-01','2007-01-02','2007-01-03','2007-01-04', '2007-01-05','2007-01-06','2007-01-07','2007-01-08', '2007-01-09','2007-01-10','2007-01-11','2007-01-12', '2007-01-13','2007-01-14','2007-01-15','2007-01-16', '2007-01-17','2007-01-18','2007-01-19','2007-01-20', '2007-01-21','2007-01-22','2007-01-23','2007-01-24', '2007-01-25','2007-01-26','2007-01-27','2007-01-28', '2007-01-29','2007-01-30','2007-01-31') ; +----+-------------+-----------+-------+---------------+-------------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------+-------+---------------+-------------+---------+------+------+-------------+ | 1 | SIMPLE | link_in85 | range | domain_id_3 | domain_id_3 | 7 | NULL | 31 | Using where | +----+-------------+-----------+-------+---------------+-------------+---------+------+------+-------------+ In the .trace: T@3 : | | | | | | | | | | info: Returning range plan for key domain_id_3, cost 68.21, records 31 T@3 : | | | | | | | | | <get_key_scans_params ... quick range select, key domain_id_3, length: 7 434876/2007-01-01 <= X <= 434876/2007-01-01 434876/2007-01-02 <= X <= 434876/2007-01-02 434876/2007-01-03 <= X <= 434876/2007-01-03 434876/2007-01-04 <= X <= 434876/2007-01-04 434876/2007-01-05 <= X <= 434876/2007-01-05 434876/2007-01-06 <= X <= 434876/2007-01-06 434876/2007-01-07 <= X <= 434876/2007-01-07 434876/2007-01-08 <= X <= 434876/2007-01-08 434876/2007-01-09 <= X <= 434876/2007-01-09 434876/2007-01-10 <= X <= 434876/2007-01-10 434876/2007-01-11 <= X <= 434876/2007-01-11 434876/2007-01-12 <= X <= 434876/2007-01-12 434876/2007-01-13 <= X <= 434876/2007-01-13 434876/2007-01-14 <= X <= 434876/2007-01-14 434876/2007-01-15 <= X <= 434876/2007-01-15 434876/2007-01-16 <= X <= 434876/2007-01-16 434876/2007-01-17 <= X <= 434876/2007-01-17 434876/2007-01-18 <= X <= 434876/2007-01-18 434876/2007-01-19 <= X <= 434876/2007-01-19 434876/2007-01-20 <= X <= 434876/2007-01-20 434876/2007-01-21 <= X <= 434876/2007-01-21 434876/2007-01-22 <= X <= 434876/2007-01-22 434876/2007-01-23 <= X <= 434876/2007-01-23 434876/2007-01-24 <= X <= 434876/2007-01-24 434876/2007-01-25 <= X <= 434876/2007-01-25 434876/2007-01-26 <= X <= 434876/2007-01-26 434876/2007-01-27 <= X <= 434876/2007-01-27 434876/2007-01-28 <= X <= 434876/2007-01-28 434876/2007-01-29 <= X <= 434876/2007-01-29 434876/2007-01-30 <= X <= 434876/2007-01-30 === Now for domain_message: === explain select * from link_in85 force index(domain_message) where domain_id=434876 AND isexternal=1 message_day IN ('2007-01-01','2007-01-02','2007-01-03','2007-01-04', '2007-01-05','2007-01-06','2007-01-07','2007-01-08', '2007-01-09','2007-01-10','2007-01-11','2007-01-12', '2007-01-13','2007-01-14','2007-01-15','2007-01-16', '2007-01-17','2007-01-18','2007-01-19','2007-01-20', '2007-01-21','2007-01-22','2007-01-23','2007-01-24', '2007-01-25','2007-01-26','2007-01-27','2007-01-28', '2007-01-29','2007-01-30','2007-01-31') ; +----+-------------+-----------+-------+----------------+----------------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------+-------+----------------+----------------+---------+------+------+-------------+ | 1 | SIMPLE | link_in85 | range | domain_message | domain_message | 8 | NULL | 31 | Using where | +----+-------------+-----------+-------+----------------+----------------+---------+------+------+-------------+ In the .trace: T@3 : | | | | | | | | | | info: Returning range plan for key domain_message, cost 68.21, records 31 T@3 : | | | | | | | | | <get_key_scans_params quick range select, key domain_message, length: 8 434876/2007-01-01/1 <= X <= 434876/2007-01-01/1 434876/2007-01-02/1 <= X <= 434876/2007-01-02/1 434876/2007-01-03/1 <= X <= 434876/2007-01-03/1 434876/2007-01-04/1 <= X <= 434876/2007-01-04/1 434876/2007-01-05/1 <= X <= 434876/2007-01-05/1 434876/2007-01-06/1 <= X <= 434876/2007-01-06/1 434876/2007-01-07/1 <= X <= 434876/2007-01-07/1 434876/2007-01-08/1 <= X <= 434876/2007-01-08/1 434876/2007-01-09/1 <= X <= 434876/2007-01-09/1 434876/2007-01-10/1 <= X <= 434876/2007-01-10/1 434876/2007-01-11/1 <= X <= 434876/2007-01-11/1 434876/2007-01-12/1 <= X <= 434876/2007-01-12/1 434876/2007-01-13/1 <= X <= 434876/2007-01-13/1 434876/2007-01-14/1 <= X <= 434876/2007-01-14/1 434876/2007-01-15/1 <= X <= 434876/2007-01-15/1 434876/2007-01-16/1 <= X <= 434876/2007-01-16/1 434876/2007-01-17/1 <= X <= 434876/2007-01-17/1 434876/2007-01-18/1 <= X <= 434876/2007-01-18/1 434876/2007-01-19/1 <= X <= 434876/2007-01-19/1 434876/2007-01-20/1 <= X <= 434876/2007-01-20/1 434876/2007-01-21/1 <= X <= 434876/2007-01-21/1 434876/2007-01-22/1 <= X <= 434876/2007-01-22/1 434876/2007-01-23/1 <= X <= 434876/2007-01-23/1 434876/2007-01-24/1 <= X <= 434876/2007-01-24/1 434876/2007-01-25/1 <= X <= 434876/2007-01-25/1 434876/2007-01-26/1 <= X <= 434876/2007-01-26/1 434876/2007-01-27/1 <= X <= 434876/2007-01-27/1 434876/2007-01-28/1 <= X <= 434876/2007-01-28/1 434876/2007-01-29/1 <= X <= 434876/2007-01-29/1 434876/2007-01-30/1 <= X <= 434876/2007-01-30/1 434876/2007-01-31/1 <= X <= 434876/2007-01-31/1 other_keys: 0x0:
[5 Jun 2007 12:32]
Martin Hansson
Hi Sergey, this is interesting. Can you please post the dataset you were using for these last runs? I'll look into this and see if you can fix both these problems at the same time.
[5 Jun 2007 15:04]
Sergey Petrunya
Mutually-inconsistent range problem aside, the patch does not solve the problem that ref(idx_X) can be used where ref(idx_X) is also applicable and it would use more key parts (i.e. scan fewer records). Using the patched tree, the supplied dataset: mysql> explain SELECT -> url_to, -> count(*) cnt, -> revert_domain, -> url_prefix, -> count(distinct from_site_id) uniq -> FROM -> link_in85 IGNORE INDEX (domain_message, domain_id_2) -> WHERE -> domain_id=434876 AND -> kind='link' AND -> isexternal=1 AND -> message_day IN ('2007-01-01','2007-01-02','2007-01-03','2007-01-04', -> '2007-01-05','2007-01-06','2007-01-07','2007-01-08', -> '2007-01-09','2007-01-10','2007-01-11','2007-01-12', -> '2007-01-13','2007-01-14','2007-01-15','2007-01-16', -> '2007-01-17','2007-01-18','2007-01-19','2007-01-20', -> '2007-01-21','2007-01-22','2007-01-23','2007-01-24', -> '2007-01-25','2007-01-26','2007-01-27','2007-01-28', -> '2007-01-29','2007-01-30','2007-01-31') -> GROUP BY -> concat(revert_domain,url_prefix) -> ORDER BY -> uniq desc, cnt desc -> LIMIT 10\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: link_in85 type: ref possible_keys: domain_id,domain_id_3 key: domain_id_3 key_len: 4 ref: const rows: 2599902 Extra: Using where; Using temporary; Using filesort Index domain_id_3 is defined as (`domain_id`,`message_day`,`isexternal`,`revert_domain`,`url_prefix`,`from_site_id`) The QEP uses ref(domain_id_3) over 1 key part (condition is domain_id=434876), while range(domain_id_3) would use 3 key parts.
[15 Jun 2007 11:54]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/28861 ChangeSet@1.2466, 2007-06-15 13:55:18+03:00, mhansson@linux-st28.site +3 -0 bug #26106: Wrong plan may be chosen when there are several possible range and ref accesses For ref(const) access methods there are two kinds of statistics when a prefix of a key is matched: range access estimate and table statistics, which are not necessarily consistent with each other. This may make the optimizer choose a key which matches fewer key parts than some other key, which is obviously not correct. Fixed by giving preferrence to the keys that match the most parts, and choosing the best one among these.
[30 Sep 2009 7:49]
Martin Hansson
Requesting re-triage, the latest action was almost a year ago.
[9 Mar 2010 17:37]
Ruslan Zakirov
Knew about the bug for a while, but didn't know that there are patches exist. Sad that this is flagged as minor issue as it's easy to reproduce. Here is simple way: mysql> select version(); | 5.0.77-log | mysql> create table b26106(type varchar(64), content varchar(64)); mysql> insert into b26106 values('ticket', 'asd'); mysql> insert into b26106 values('ticket', 'asd'); mysql> insert into b26106 values('ticket', 'zczx'); mysql> insert into b26106 values('ticket', 'sdfgs'); mysql> insert into b26106 values('ticket', 'qwer'); mysql> insert into b26106 values('ticket', 'ldgjh'); mysql> create index content on b26106(content); Query OK, 6 rows affected (0.08 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql> select * from b26106 where content >= 'd' and content <= 'z'; +--------+---------+ | type | content | +--------+---------+ | ticket | ldgjh | | ticket | qwer | | ticket | sdfgs | +--------+---------+ 3 rows in set (0.00 sec) mysql> EXPLAIN select * from b26106 where content >= 'd' and content <= 'z'; +----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+ | 1 | SIMPLE | b26106 | range | content | content | 67 | NULL | 2 | Using where | +----+-------------+--------+-------+---------------+---------+---------+------+------+-------------+ 1 row in set (0.00 sec) mysql> create index type_content on b26106(type, content); Query OK, 6 rows affected (0.01 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql> EXPLAIN select * from b26106 where type = 'ticket' AND content >= 'd' and content <= 'z'; +----+-------------+--------+------+----------------------+--------------+---------+-------+------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------+------+----------------------+--------------+---------+-------+------+--------------------------+ | 1 | SIMPLE | b26106 | ref | content,type_content | type_content | 67 | const | 1 | Using where; Using index | +----+-------------+--------+------+----------------------+--------------+---------+-------+------+--------------------------+ 1 row in set (0.00 sec) So there is no need in very special test case for this bug to manifest itself.
[24 Jun 2010 14:39]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/112106 3270 Martin Hansson 2010-06-24 Bug#26106: Wrong plan may be chosen when there are several possible range and ref accesses For ref(const) access methods there are two kinds of statistics when a prefix of a key is matched: range access estimate and table statistics, which are not necessarily consistent with each other. This may make the optimizer choose a key which matches fewer key parts than some other key, which is obviously not correct. Fixed by giving preferrence to the keys that match the most parts, and choosing the best one among these.
[23 Jul 2010 12:36]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100723121929-90e9zemk3jkr2ocy) (version source revid:vasil.dimov@oracle.com-20100531152341-x2d4hma644icamh1) (pib:18)
[4 Aug 2010 8:11]
Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804080001-bny5271e65xo34ig) (version source revid:vasil.dimov@oracle.com-20100531152341-x2d4hma644icamh1) (merge vers: 5.5.5-m3) (pib:18)
[4 Aug 2010 8:26]
Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804081533-c1d3rbipo9e8rt1s) (version source revid:vasil.dimov@oracle.com-20100531152341-x2d4hma644icamh1) (merge vers: 5.5.5-m3) (pib:18)
[13 Aug 2010 2:31]
Paul DuBois
Noted in 5.6.0 changelog. A suboptimal query execution plan could chosen when there were several possible range and ref accesses. Now preference is given to the keys that match the most parts and choosing the best one among them.