Bug #38049 | incorrect rows estimations with references from preceding table | ||
---|---|---|---|
Submitted: | 11 Jul 2008 14:50 | Modified: | 16 Nov 2010 3:52 |
Reporter: | Ruslan Zakirov | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S5 (Performance) |
Version: | 5.0.60, 5.0.68-bzr, 5.1.28-bzr | OS: | Any |
Assigned to: | Guilhem Bichot | CPU Architecture: | Any |
Tags: | INDEX, multiple-column index, preceding table, ref, rows |
[11 Jul 2008 14:50]
Ruslan Zakirov
[11 Jul 2008 14:52]
Ruslan Zakirov
scripts to reproduce the problem
Attachment: ref_scan_with_preceding_table.tar.gz (application/gzip, text), 1.11 KiB.
[14 Jul 2008 13:37]
Susanne Ebrecht
Many thanks for reporting a bug. I did your test but I can't see what exactly you meant. Where do you think is a bug?
[14 Jul 2008 21:22]
Ruslan Zakirov
Hi Oops, found that create_data.pl creates wrong index. Anyway results are pretty close, I'll attach new create_data.pl. Sorry, may be my description was cryptic. Here is partial output from test.pl: EXPLAIN SELECT STRAIGHT_JOIN DISTINCT g.id FROM acl a, groups g USE INDEX(groups_d) WHERE g.domain = 'queue' AND g.type = a.type $VAR1 = { 'Extra' => 'Using where', 'key_len' => '35', 'ref' => 'const', 'table' => 'g', 'rows' => '11', 'key' => 'groups_d', 'select_type' => 'SIMPLE', 'possible_keys' => 'groups_d', 'id' => '1', 'type' => 'ref' }; EXPLAIN SELECT STRAIGHT_JOIN DISTINCT g.id FROM acl a, groups g USE INDEX(groups_dt) WHERE g.domain = 'queue' AND g.type = a.type $VAR1 = { 'Extra' => 'Using where', 'key_len' => '70', 'ref' => 'const,test.a.type', 'table' => 'g', 'rows' => '335', 'key' => 'groups_dt', 'select_type' => 'SIMPLE', 'possible_keys' => 'groups_dt', 'id' => '1', 'type' => 'ref' }; These are second rows from EXPLAINs when mysql forced to use different indexes for the same query. Condition "g.domain = 'queue' AND g.type = a.type" couldn't return more rows than "g.domain = 'queue'" when mysql's rows estimator says different, so server chooses shorter index. That is major problem my bug report is about. However, the following results are interesting for comparison as well: EXPLAIN SELECT STRAIGHT_JOIN DISTINCT g.id FROM acl a, groups g USE INDEX(groups_dt) WHERE g.domain = 'queue' AND g.type = a.type $VAR1 = { 'Extra' => 'Using where', 'key_len' => '70', 'ref' => 'const,test.a.type', 'table' => 'g', 'rows' => '335', 'key' => 'groups_dt', 'select_type' => 'SIMPLE', 'possible_keys' => 'groups_dt', 'id' => '1', 'type' => 'ref' }; EXPLAIN SELECT STRAIGHT_JOIN DISTINCT g.id FROM acl a, groups g USE INDEX(groups_td) WHERE g.domain = 'queue' AND g.type = a.type $VAR1 = { 'Extra' => 'Using where', 'key_len' => '70', 'ref' => 'test.a.type,const', 'table' => 'g', 'rows' => '335', 'key' => 'groups_td', 'select_type' => 'SIMPLE', 'possible_keys' => 'groups_td', 'id' => '1', 'type' => 'ref' }; This is cool that estimated rows for both are equal, but as I said above values are bigger than 11 (estimations for index "groups_d") what is wrong.
[14 Jul 2008 21:23]
Ruslan Zakirov
updated create_data.pl
Attachment: create_data.pl (text/x-perl), 1.26 KiB.
[4 Aug 2008 12:11]
Valeriy Kravchuk
Verified just as described with 5.0.68: openxs@suse:/home2/openxs/dbs/5.0> bin/mysql -uroot test Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 2 Server version: 5.0.68-debug Source distribution Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql> show create table groups\G *************************** 1. row *************************** Table: groups Create Table: CREATE TABLE `groups` ( `id` int(11) NOT NULL, `domain` varchar(32) default NULL, `type` varchar(32) default NULL, PRIMARY KEY (`id`), KEY `groups_d` (`domain`), KEY `groups_t` (`type`), KEY `groups_dt` (`domain`,`type`), KEY `groups_td` (`type`,`domain`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 1 row in set (0.00 sec) mysql> analyze table groups; +-------------+---------+----------+-----------------------------+ | Table | Op | Msg_type | Msg_text | +-------------+---------+----------+-----------------------------+ | test.groups | analyze | status | Table is already up to date | +-------------+---------+----------+-----------------------------+ 1 row in set (0.00 sec) mysql> EXPLAIN SELECT STRAIGHT_JOIN DISTINCT g.id -> FROM acl a, groups g USE INDEX(groups_dt) -> WHERE g.domain = 'queue' AND g.type = a.type\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: a type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 2 Extra: Using temporary *************************** 2. row *************************** id: 1 select_type: SIMPLE table: g type: ref possible_keys: groups_dt key: groups_dt key_len: 70 ref: const,test.a.type rows: 335 Extra: Using where 2 rows in set (0.03 sec) mysql> EXPLAIN SELECT STRAIGHT_JOIN DISTINCT g.id FROM acl a, groups g USE INDEX(groups_td) WHERE g.domain = 'queue' AND g.type = a.type\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: a type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 2 Extra: Using temporary *************************** 2. row *************************** id: 1 select_type: SIMPLE table: g type: ref possible_keys: groups_td key: groups_td key_len: 70 ref: test.a.type,const rows: 335 Extra: Using where 2 rows in set (0.00 sec) Note the number of rows with domain='queue' though: mysql> select count(*) from groups where domain='queue'; +----------+ | count(*) | +----------+ | 12 | +----------+ 1 row in set (0.01 sec) And note this estimation: mysql> explain select count(*) from groups where domain='queue'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: groups type: ref possible_keys: groups_d,groups_dt key: groups_d key_len: 35 ref: const rows: 11 Extra: Using where; Using index 1 row in set (0.00 sec) and this one, when same groups_dt index is used: mysql> explain select count(*) from groups force index(groups_dt) where domain='queue'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: groups type: ref possible_keys: groups_dt key: groups_dt key_len: 35 ref: const rows: 13 Extra: Using where; Using index 1 row in set (0.00 sec)
[9 Sep 2008 19:22]
Sergey Petrunya
The posted queries are a manifestation of two problems: Problem1: ReuseRangeEstimateForRef bug -------------------------------- mysql> explain SELECT STRAIGHT_JOIN g.id FROM acl a, groups g USE INDEX(groups_td) WHERE g.domain = 'queue' AND g.type = a.type; *************************** 1. row *************************** id: 1 select_type: SIMPLE table: a type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 2 Extra: *************************** 2. row *************************** id: 1 select_type: SIMPLE table: g type: ref possible_keys: groups_dt key: groups_dt key_len: 70 ref: const,j1.a.type rows: 335 Extra: Using where For this query: - Range optimizer obtains that records_in_range(groups_dt, g.domain = 'queue') = 13. - Index statistics tell that rec_per_key[g.domain = const1 AND g.type = const2] = 335, which apparently a contradiction - ReuseRangeEstimateForRef-2 code is supposed to handle this but it does not.
[11 Sep 2008 13:48]
Sergey Petrunya
Problem2: Picking ref access over a shorter index ------------------------------------------------- Demonstrated by this query: mysql> explain SELECT STRAIGHT_JOIN g.id FROM acl a, groups g WHERE g.domain = 'queue' AND g.type = a.type; *************************** 1. row *************************** id: 1 select_type: SIMPLE table: a type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 2 Extra: *************************** 2. row *************************** id: 1 select_type: SIMPLE table: g type: ref possible_keys: groups_d,groups_t,groups_dt,groups_td key: groups_d key_len: 35 ref: const rows: 11 Extra: Using where This uses index "groups_d" while there are more restrictive indexes groups_dt and groups_td. Using a longer index can be more expensive because we'll need to walk through more index pages if we need to scan the same number of index entries. On the other hand, additional condition may limit the number of index entires and table records that we will need to scan. One may argue that using more restrictions and a longer index almost always pays off.
[11 Sep 2008 13:50]
Sergey Petrunya
Problem3: ReuseRangeEstimateForRef doesn't do complete job ---------------------------------------------------------- Tracing the query gets you this: check_quick_select(): range access over g.domain = 'queue': groups_d: 11 rows groups_dt: 13 rows best_access_path(): index groups_d records = 11 (ReuseRangeEstmateForRef-1) index groups_t records = 1004 (from index stats) index groups_dt records = 13 (ReuseRangeEstmateForRef-2) index groups_td records = 335 (from index statistics) Here we see that 1. MyISAM returns different estimates for the same range for different indexes. ReuseRangeEstmateForRef code propagates the difference to ref access estimates. 2. We use a highly-incorrect estimate for index groups_td. This probably doesn't matter much for this particular case because we have an equally good index grooups_dt which is processed by ReuseRangeEstmateForRef
[11 Sep 2008 14:02]
Sergey Petrunya
During IRC discussion with Ruslan, noted the following approaches to resolve problems in 2-3: 1. Make ReuseRangeEstimateFoRef operate across indexes (i.e. record and estimates basing on column restrictions). One big question here will be whether we should consider restrictions on different columns independent or not. 2. Whenever ReuseRangeEstimateForRef re-uses an estimate for a prefix of the used ref access, assume that restrictions on uncovered key parts have some selectivity. The only numbers we could use to calculate the additional selectivity are index cardinality values. One might argue though, that ReuseRangeEstimateForRef makes difference in cases when index cardinality statistics is hugely skewed, so we should not use it? 2. Introduce a heuristics/cost adjustment to use longer indexes 3. Make the range optimizer to cross-check estimates it has got from different indexes for the same ranges and produce a consistent estimates. This seems to be quite complicated.
[12 Sep 2008 18:42]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/54002 2686 Sergey Petrunia 2008-09-12 BUG#38049: incorrect rows estimations with references from preceding table - Fix condition in ReuseRangeEstimateForRef-3. The intended meaning of the changed part is: "And ref access candidate has key=const restrictions for every keypart that was used by range access".
[12 Sep 2008 18:45]
Sergey Petrunya
This the fix that was developed during irc discussion. It only fixes Problem#1. The rest will be branched off to separate bug entries.
[26 Jan 2009 19:43]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/64080 2814 Sergey Petrunia 2009-01-26 BUG#38049: incorrect rows estimations with references from preceding table - Fix condition in ReuseRangeEstimateForRef-3. The intended meaning of the changed part is: "And ref access candidate has key=const restrictions for every keypart that was used by range access".
[2 Feb 2009 16:06]
Bugs System
Pushed into 6.0.10-alpha (revid:sergefp@mysql.com-20090202090240-dlkxhmc1asrar5rl) (version source revid:sergefp@mysql.com-20090126194259-ue20il3qro529l4d) (merge vers: 6.0.10-alpha) (pib:6)
[19 Oct 2009 20:22]
Guilhem Bichot
I ported the fix to the next-mr-bugfixing tree
[19 Oct 2009 20:35]
Guilhem Bichot
Somehow the system removed the fact that Sergey P did the fix, and that Igor reviewed it, and I cannot correct this. Looks like the bug was never documented. Fix is in 6.0 and queued to next-mr-bugfixing. Changelog: "For certain SELECTs using 'ref' access, MySQL estimated a wrong number of rows, this could lead to inefficient query plans".
[31 Oct 2009 8:20]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091031081410-qkxmjsdzjmj840aq) (version source revid:guilhem@mysql.com-20091019192733-v4nfabu8milak2ql) (merge vers: 6.0.14-alpha) (pib:13)
[2 Nov 2009 21:14]
Paul DuBois
Noted in 6.0.14 changelog. For certain SELECT statements using ref access, MySQL estimated an incorrect number of rows, which could lead to inefficient query plans. Setting report to NDI pending push to 5.5.x.
[12 Nov 2009 8:17]
Bugs System
Pushed into 5.5.0-beta (revid:alik@sun.com-20091110093229-0bh5hix780cyeicl) (version source revid:mikael@mysql.com-20091102100915-a2nbfxaqprpgptfw) (merge vers: 5.5.0-beta) (pib:13)
[12 Nov 2009 20:17]
Paul DuBois
Noted in 5.5.0 changelog.
[16 Aug 2010 6:39]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100816062819-bluwgdq8q4xysmlg) (version source revid:alik@sun.com-20100816062612-enatdwnv809iw3s9) (pib:20)
[13 Nov 2010 16:04]
Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)