Bug #38550 | multiple optimizer bugs for queries with references from preceding table | ||
---|---|---|---|
Submitted: | 4 Aug 2008 21:47 | Modified: | 16 Oct 2008 3:53 |
Reporter: | Ruslan Zakirov | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S5 (Performance) |
Version: | 5.0.60, 5.1 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | INDEX, multiple-columns index, preceding table, range, ref |
[4 Aug 2008 21:47]
Ruslan Zakirov
[4 Aug 2008 21:49]
Ruslan Zakirov
perl scripts to create, test and banch
Attachment: range_vs_ref_scan_with_preceding_table.tar.gz (application/gzip, text), 1.42 KiB.
[5 Aug 2008 8:08]
Susanne Ebrecht
CREATE TABLE acl(id INT PRIMARY KEY, type varchar(32)); CREATE TABLE groups(id int(11) NOT NULL, domain varchar(32) default NULL, type varchar(32) default NULL, PRIMARY KEY(id)) ENGINE=MyISAM; INSERT INTO acl VALUES(1, 'owner'), (2, 'admincc'); DELIMITER // create procedure fill_data() begin declare i integer default 1; declare j integer default 2; declare k integer default 3; declare l integer default 4; declare t integer default 3; insert into groups values(i,'system','owner'),(j,'system','requester'),(k,'system','admincc'),(l,'system','cc'); set i= i+4; set j= j+4; set k= k+4; set l= l+4; while t > 0 do insert into groups values(i,'queue','owner'),(j,'queue','requester'),(k,'queue','admincc'),(l,'queue','cc'); set i= i+4; set j= j+4; set k= k+4; set l= l+4; set t= t-1; end while; set t:= 1000; while t > 0 do insert into groups values(i,'ticket','owner'),(j,'ticket','requester'),(k,'ticket','admincc'),(l,'ticket','cc'); set i= i+4; set j= j+4; set k= k+4; set l= l+4; set t= t-1; end while; end// DELIMITER ; call fill_data(); CREATE INDEX groups_d ON groups(domain); CREATE INDEX groups_t ON groups(type); CREATE INDEX groups_dt ON groups(domain, type); CREATE INDEX groups_td ON groups(type, domain); EXPLAIN SELECT STRAIGHT_JOIN DISTINCT g.id FROM acl a, groups g USE INDEX(groups_t) WHERE (g.domain='queue' OR g.domain='system') AND g.type=a.type\G EXPLAIN SELECT STRAIGHT_JOIN DISTINCT g.id FROM acl a, groups g USE INDEX (groups_d) WHERE (g.domain='queue' OR g.domain='system') AND g.type=a.type\G EXPLAIN SELECT STRAIGHT_JOIN DISTINCT g.id FROM acl a, groups g USE INDEX(groups_dt) WHERE (g.domain='queue' OR g.domain='system') AND g.type=a.type\G EXPLAIN SELECT STRAIGHT_JOIN DISTINCT g.id FROM acl a, groups g USE INDEX(groups_td) WHERE (g.domain='queue' OR g.domain='system') AND g.type=a.type\G truncate acl; truncate groups; drop index groups_t on groups; drop index groups_d on groups; drop index groups_dt on groups; drop index groups_td on groups; alter table acl engine='innodb'; alter table groups engine='innodb'; call fill_data(); CREATE INDEX groups_d ON groups(domain); CREATE INDEX groups_t ON groups(type); CREATE INDEX groups_dt ON groups(domain, type); CREATE INDEX groups_td ON groups(type, domain); EXPLAIN SELECT STRAIGHT_JOIN DISTINCT g.id FROM acl a, groups g USE INDEX(groups_t) WHERE (g.domain='queue' OR g.domain='system') AND g.type=a.type\G EXPLAIN SELECT STRAIGHT_JOIN DISTINCT g.id FROM acl a, groups g USE INDEX (groups_d) WHERE (g.domain='queue' OR g.domain='system') AND g.type=a.type\G EXPLAIN SELECT STRAIGHT_JOIN DISTINCT g.id FROM acl a, groups g USE INDEX(groups_dt) WHERE (g.domain='queue' OR g.domain='system') AND g.type=a.type\G EXPLAIN SELECT STRAIGHT_JOIN DISTINCT g.id FROM acl a, groups g USE INDEX(groups_td) WHERE (g.domain='queue' OR g.domain='system') AND g.type=a.type\G
[5 Aug 2008 8:10]
Susanne Ebrecht
I can reproduce this with my tests on MySQL bzr tree for 5.0 and 5.1
[11 Sep 2008 16:13]
Sergey Petrunya
> Here we have first problems: > 1) key_len is still 35, so mysql will use only first part of the index when I think it can > use both conditions to do better range scan No, at the moment range scan may not depend on another table (with exception of references to constant tables which are constidered constants). It seems you're asking about combining ref and range access together. We have this on somewhere our todo but I can't tell when this will become available. In MySQL-6.0 we have IndexConditionPushdown optimization which is expected to rectify the situation a bit. With IndexConditionPushdown, the number of index entries will be still the same but full table records will be retrieved only if the whole condition is satisfied. > 2) Even if 1) is not possible then extra should have 'using index' as the index can be > used to filter records. Benchmarks below show that InnoDB benefits from it a little. It can't have "Using index" because g.id is not covered by the index. InnoDB implicitly includes all primary key columns at the end of every index, MyISAM doesn't do that. > 3) Slight shift in row estimations, worth investigation, but it's up to you to decide if > it's bug or not. It's not a bug, the number is an estimate from the range optimizer estimate and some variety in estimation is expected. Desired output: range type with 4 rows and key_len is 70. EXPLAIN SELECT STRAIGHT_JOIN DISTINCT g.id FROM acl a, groups g USE INDEX (groups_td) WHERE (g.domain = 'queue' OR g.domain = 'system') AND g.type = a.type $VAR1 = { 'type' => 'ref' 'rows' => '1004', 'key' => 'groups_td', 'key_len' => '35', 'Extra' => 'Using where', }; > This is totally wrong. It should be range access using both columns, most probably mysql > estimate possibility of range access by rows estimations are wrong (close to 1004 or even > bigger) so ref access wins. See the above note about range access. Only ref access can be used for this query. Its estimates are based on index cardinalities. Do you have a highly-skewed data distribution? Ok let's consider that ref access is correct in this case then mysql should use index's data to filter records after scan at least, see tests for innodb below. > Let's switch to InnoDB. EXPLAIN SELECT STRAIGHT_JOIN DISTINCT g.id FROM acl a, groups g USE INDEX (groups_t) WHERE (g.domain = 'queue' OR g.domain = 'system') AND g.type = a.type $VAR1 = { 'type' => 'ref' 'rows' => '2085', 'key' => 'groups_t', 'key_len' => '35', 'Extra' => 'Using where', }; > Rows estimation is twice bigger than it should be, not an order, but anyway. It's a known property of InnoDB. It shouldn't reach order-of-magnitude differences.
[11 Sep 2008 20:46]
Sergey Petrunya
The issues raised in this bug are valid, but none of them is a "bug" in a sense that they do not describe something which was expected to be implemented in released versions but wasn't. Changing status to ToBeFixedLater. Ruslan, thanks for the input.
[18 Jul 2012 17:21]
Trey Raymond
Was about to file a bug for this, but found this by searching...the notes I typed up and simple use case may be useful in the fix: When referenced in a join (by the first column), a compound index is not able to be used for (col1 equality,col2 range), despite having the data in the key structured to easily do so...and being able to use it for (col1 equality,col2 equality)...or use it on a single table query. See example below Simple example case...a table of server farms, and a log of when an arbitrary event happens on one of them. A report query (the one in question) wants to find the counts of log events per farm for a given time range. Note that you'll have to replace the times I used with whatever you inserted, or just manually specify on the inserts. CREATE TABLE farm ( id tinyint unsigned NOT NULL AUTO_INCREMENT, name varchar(50) NOT NULL, PRIMARY KEY (id), UNIQUE KEY name (name) ) ENGINE=InnoDB; CREATE TABLE log ( id int unsigned NOT NULL AUTO_INCREMENT, farm_id tinyint unsigned NOT NULL, log_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (id), KEY (farm_id,log_time) ) ENGINE=InnoDB; insert into farm values (1,'test farm 1'),(2,'test farm 2'); insert into log (farm_id) select id from farm; select sleep(1); insert into log (farm_id) select farm_id from log; select sleep(1); insert into log (farm_id) select farm_id from log; select sleep(1); insert into log (farm_id) select farm_id from log; select sleep(1); insert into log (farm_id) select farm_id from log; select sleep(1); insert into log (farm_id) select farm_id from log; select sleep(1); insert into log (farm_id) select farm_id from log; select sleep(1); insert into log (farm_id) select farm_id from log; select sleep(1); insert into log (farm_id) select farm_id from log; select sleep(1); insert into log (farm_id) select farm_id from log; select sleep(1); insert into log (farm_id) select farm_id from log; select sleep(1); insert into log (farm_id) select farm_id from log; select sleep(1); insert into log (farm_id) select farm_id from log; select sleep(1); insert into log (farm_id) select farm_id from log; select log.farm_id,count(*) from log where log.farm_id = 1 and log.log_time between '2012-07-18 16:58:17' and '2012-07-18 16:58:18' group by log.farm_id; 1 SIMPLE log range farm_id farm_id 5 NULL 1 Using where; Using index -- expected behavior, uses (equal,range) when using single table select farm.id,count(*) from farm left join log on log.farm_id = farm.id and log.log_time = '2012-07-18 16:58:17' where farm.id=1 group by farm.id; 1 SIMPLE farm const PRIMARY PRIMARY 1 const 1 Using index 1 SIMPLE log ref farm_id farm_id 5 const,const 256 Using index -- expected behavior, dual const (equal,equal) lookup on compound key select farm.id,count(*) from farm left join log on log.farm_id = farm.id and log.log_time between '2012-07-18 16:58:17' and '2012-07-18 16:58:18' where farm.id=1 group by farm.id; 1 SIMPLE farm const PRIMARY PRIMARY 1 const 1 Using index 1 SIMPLE log range farm_id farm_id 5 NULL 767 Using where; Using index -- expected behavior, when farm_id is provided it uses the range correctly as (equal,range), so it knows the join is on that equality select farm.id,count(*) from farm left join log on log.farm_id = farm.id and log.log_time = '2012-07-18 16:58:17' group by farm.id; 1 SIMPLE farm index NULL PRIMARY 1 NULL 1 Using index 1 SIMPLE log ref farm_id farm_id 5 test.farm.id,const 428 Using index -- expected behavior, see how it uses that compound key to do an equality on ref'd farm_id and then a const for the time provided, (equal,equal) in index order select farm.id,count(*) from farm left join log on log.farm_id = farm.id and log.log_time between '2012-07-18 16:58:17' and '2012-07-18 16:58:18' group by farm.id; 1 SIMPLE farm index NULL PRIMARY 1 NULL 1 Using index 1 SIMPLE log ref farm_id farm_id 1 test.farm.id 8572 Using index -- unexpected! where in the previous query it was able to consider the ref'd column an equality and use columns further down the index for equalities like the (equal,range) scans above, in this case it does not consider it so - usually you can have a key on (col1,col2) and do an equality on col1 and a range on col2, and let it use the index for it...and the above query would imply that mysql knows this...but it throws it away with a range on the second column and only scans by the first ('rows' is the same when the time condition is removed, key_len=1 instead of 5, and performance/rows examined proves this in my much larger use case)
[18 Jul 2012 17:24]
Trey Raymond
Note that this might also be able to optimize queries with a key on (col1,col2) that do "select foo from bar where col2>range_condition group by col1" to use the key - the data is there, structured in order. How feasible is this? I can see great use cases for it. In reference to my above example, something like this: select farm_id,count(*) from log where log_time between '2012-07-18 16:58:17' and '2012-07-18 16:58:18' group by farm_id; 1 SIMPLE log index NULL farm_id 5 NULL 17144 Using where; Using index -- does a full index scan and doesn't use the range condition while it of course could as the group by is being optimized by the key
[18 Jul 2012 17:24]
Trey Raymond
Also, to note, still exists in 5.5.17