| Bug #38550 | multiple optimizer bugs for queries with references from preceding table | ||
|---|---|---|---|
| Submitted: | 4 Aug 2008 23:47 | Modified: | 16 Oct 2008 5:53 |
| Reporter: | Ruslan Zakirov | ||
| Status: | Verified | ||
| Category: | Server: Optimizer | Severity: | S5 (Performance) |
| Version: | 5.0.60, 5.1 | OS: | Any |
| Assigned to: | Bugs System | Target Version: | |
| Tags: | multiple-columns index, INDEX, ref, range, preceding table | ||
| Triage: | D4 (Minor) | ||
[4 Aug 2008 23:47]
Ruslan Zakirov
[4 Aug 2008 23:49]
Ruslan Zakirov
perl scripts to create, test and banch
Attachment: range_vs_ref_scan_with_preceding_table.tar.gz (application/gzip, text), 1.42 KiB.
[5 Aug 2008 10:08]
Susanne Ebrecht
CREATE TABLE acl(id INT PRIMARY KEY, type varchar(32)); CREATE TABLE groups(id int(11) NOT NULL, domain varchar(32) default NULL, type varchar(32) default NULL, PRIMARY KEY(id)) ENGINE=MyISAM; INSERT INTO acl VALUES(1, 'owner'), (2, 'admincc'); DELIMITER // create procedure fill_data() begin declare i integer default 1; declare j integer default 2; declare k integer default 3; declare l integer default 4; declare t integer default 3; insert into groups values(i,'system','owner'),(j,'system','requester'),(k,'system','admincc'),(l,'system','cc'); set i= i+4; set j= j+4; set k= k+4; set l= l+4; while t > 0 do insert into groups values(i,'queue','owner'),(j,'queue','requester'),(k,'queue','admincc'),(l,'queue','cc'); set i= i+4; set j= j+4; set k= k+4; set l= l+4; set t= t-1; end while; set t:= 1000; while t > 0 do insert into groups values(i,'ticket','owner'),(j,'ticket','requester'),(k,'ticket','admincc'),(l,'ticket','cc'); set i= i+4; set j= j+4; set k= k+4; set l= l+4; set t= t-1; end while; end// DELIMITER ; call fill_data(); CREATE INDEX groups_d ON groups(domain); CREATE INDEX groups_t ON groups(type); CREATE INDEX groups_dt ON groups(domain, type); CREATE INDEX groups_td ON groups(type, domain); EXPLAIN SELECT STRAIGHT_JOIN DISTINCT g.id FROM acl a, groups g USE INDEX(groups_t) WHERE (g.domain='queue' OR g.domain='system') AND g.type=a.type\G EXPLAIN SELECT STRAIGHT_JOIN DISTINCT g.id FROM acl a, groups g USE INDEX (groups_d) WHERE (g.domain='queue' OR g.domain='system') AND g.type=a.type\G EXPLAIN SELECT STRAIGHT_JOIN DISTINCT g.id FROM acl a, groups g USE INDEX(groups_dt) WHERE (g.domain='queue' OR g.domain='system') AND g.type=a.type\G EXPLAIN SELECT STRAIGHT_JOIN DISTINCT g.id FROM acl a, groups g USE INDEX(groups_td) WHERE (g.domain='queue' OR g.domain='system') AND g.type=a.type\G truncate acl; truncate groups; drop index groups_t on groups; drop index groups_d on groups; drop index groups_dt on groups; drop index groups_td on groups; alter table acl engine='innodb'; alter table groups engine='innodb'; call fill_data(); CREATE INDEX groups_d ON groups(domain); CREATE INDEX groups_t ON groups(type); CREATE INDEX groups_dt ON groups(domain, type); CREATE INDEX groups_td ON groups(type, domain); EXPLAIN SELECT STRAIGHT_JOIN DISTINCT g.id FROM acl a, groups g USE INDEX(groups_t) WHERE (g.domain='queue' OR g.domain='system') AND g.type=a.type\G EXPLAIN SELECT STRAIGHT_JOIN DISTINCT g.id FROM acl a, groups g USE INDEX (groups_d) WHERE (g.domain='queue' OR g.domain='system') AND g.type=a.type\G EXPLAIN SELECT STRAIGHT_JOIN DISTINCT g.id FROM acl a, groups g USE INDEX(groups_dt) WHERE (g.domain='queue' OR g.domain='system') AND g.type=a.type\G EXPLAIN SELECT STRAIGHT_JOIN DISTINCT g.id FROM acl a, groups g USE INDEX(groups_td) WHERE (g.domain='queue' OR g.domain='system') AND g.type=a.type\G
[5 Aug 2008 10:10]
Susanne Ebrecht
I can reproduce this with my tests on MySQL bzr tree for 5.0 and 5.1
[11 Sep 2008 18:13]
Sergey Petrunya
> Here we have first problems:
> 1) key_len is still 35, so mysql will use only first part of the index when I think it
can
> use both conditions to do better range scan
No, at the moment range scan may not depend on another table (with exception
of references to constant tables which are constidered constants). It seems
you're asking about combining ref and range access together. We have this on
somewhere our todo but I can't tell when this will become available.
In MySQL-6.0 we have IndexConditionPushdown optimization which is expected
to rectify the situation a bit. With IndexConditionPushdown, the number of
index entries will be still the same but full table records will be
retrieved only if the whole condition is satisfied.
> 2) Even if 1) is not possible then extra should have 'using index' as the index can be
> used to filter records. Benchmarks below show that InnoDB benefits from it a little.
It can't have "Using index" because g.id is not covered by the index. InnoDB
implicitly includes all primary key columns at the end of every index, MyISAM
doesn't do that.
> 3) Slight shift in row estimations, worth investigation, but it's up to you to decide
if
> it's bug or not.
It's not a bug, the number is an estimate from the range optimizer estimate and some
variety
in estimation is expected.
Desired output: range type with 4 rows and key_len is 70.
EXPLAIN SELECT STRAIGHT_JOIN DISTINCT g.id
FROM acl a, groups g USE INDEX (groups_td)
WHERE (g.domain = 'queue' OR g.domain = 'system') AND g.type = a.type
$VAR1 = {
'type' => 'ref'
'rows' => '1004',
'key' => 'groups_td',
'key_len' => '35',
'Extra' => 'Using where',
};
> This is totally wrong. It should be range access using both columns, most probably
mysql
> estimate possibility of range access by rows estimations are wrong (close to 1004 or
even
> bigger) so ref access wins.
See the above note about range access. Only ref access can be used for this query. Its
estimates are based on index cardinalities. Do you have a highly-skewed data
distribution?
Ok let's consider that ref access is correct in this case then mysql should use index's
data to filter records after scan at least, see tests for innodb below.
> Let's switch to InnoDB.
EXPLAIN SELECT STRAIGHT_JOIN DISTINCT g.id
FROM acl a, groups g USE INDEX (groups_t)
WHERE (g.domain = 'queue' OR g.domain = 'system') AND g.type = a.type
$VAR1 = {
'type' => 'ref'
'rows' => '2085',
'key' => 'groups_t',
'key_len' => '35',
'Extra' => 'Using where',
};
> Rows estimation is twice bigger than it should be, not an order, but anyway.
It's a known property of InnoDB. It shouldn't reach order-of-magnitude differences.
[11 Sep 2008 22:46]
Sergey Petrunya
The issues raised in this bug are valid, but none of them is a "bug" in a sense that they do not describe something which was expected to be implemented in released versions but wasn't. Changing status to ToBeFixedLater. Ruslan, thanks for the input.
