Bug #46717 | Unnecessary joins are not optimized away | ||
---|---|---|---|
Submitted: | 14 Aug 2009 11:14 | Modified: | 24 Aug 2009 4:50 |
Reporter: | Robin Sillem | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S5 (Performance) |
Version: | 5.1.32 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[14 Aug 2009 11:14]
Robin Sillem
[17 Aug 2009 7:21]
Susanne Ebrecht
Many thanks for writing a bug report. This is not a bug in my eyes. Consider, there is no artificial intelligence .... Create table Maternity.Babies ( Id INT PRIMARY KEY, ForeName varchar(20) NULL ) ENGINE=INNODB; create table Maternity.Weights ( BabyId integer not null, AgeInDays integer not null, Weight float not null, CONSTRAINT Baby_Fk FOREIGN KEY (BabyId) REFERENCES Babies (Id) ) ENGINE=INNODB; No primary key on table "weights". Look to the constraint: FOREIGN KEY (BabyId) REFERENCES Babies (Id) For the system this is 1:N relation. From where should the system know that you want 1:1 here? Also let us do an example: Baby id = 5 It has the following weights: after birth: 3.5 lb after 7 days: 7 lb after a year: 20 lb so, when you do: SELECT * FROM weight; Result is: id | AgeInDays | Weight ________________________ 5 | 0 | 3.5 5 | 7 | 7.0 5 | 365 | 20.0 3 rows for Baby 5. So system just is not able to optimise away the join.
[17 Aug 2009 7:48]
Robin Sillem
Susanne, Thanks for the swift response. Perhaps I wasn't clear enough in what I was requesting. It's not a question of whether it's a 1:1 or 1:N relationship so much as the fact that the relationship is not used at all. In the example you cite: SELECT * FROM weights WHERE BabyId = 5; Result is: id | AgeInDays | Weight ________________________ 5 | 0 | 3.5 5 | 7 | 7.0 5 | 365 | 20.0 all the returned data comes from weights and there is no need to reference babies in the execution plan, even if the query was written with the unnecessary join to babies, as: SELECT w.* FROM weights w JOIN babies b ON w.BabyId = b.Id WHERE w.BabyId = 5 The fact that the schema constraints mandate that each row of weights must refer to exactly 1 baby means that the formulation with the join will return the same number of rows as the formulation without, means that the join can be dropped. This requires no domain knowledge or artificial intelligence, as all the conditions for dropping the unnecessary join can be discovered from the query and the schema. My request is that the query optimizer should detect these situations, and construct an execution plan that does not use tables which do not affect the result set in any way. I would submit that, in the context of the query optimizer, this is either a bug (because the execution plan is clearly sub-optimal), or at the least a feature request.
[18 Aug 2009 9:49]
Robin Sillem
To confirm that this optimization is possible, I've set up the same tests using MS SQL Server 2005, which removes the reference to Babies from the execution plan as I have described.
[24 Aug 2009 4:50]
Susanne Ebrecht
Ahhh now I understand you. Many thanks for writing a bug report. Verified as described.
[2 Dec 2009 21:27]
Chip Yamasaki
I'd like to add that this is not strictly related to foreign keys as in the previous example. For example: create table tbl1 (fld1 varchar(10) not null primary key); create table tbl2 (fld1 varchar(10) not null primary key); explain select tbl1.* from tbl1 left outer join tbl2 on tbl1.fld1 = tbl2.fld1; mysql should be able to optimize away tbl2 and doesn't. Either a primary key or a unique constraint on the secondary side of an outer join should qualify.