Description:
When a query contains a correlated subquery, the `EXPLAIN FORMAT=TREE` output only shows the cost of executing the subquery once. However, it does not account for the cumulative cost of executing the subquery multiple times within the query. While MySQL does invoke the `accumulate_statement_cost` function to add the total cost of executing the subquery to the `current_query_cost`, this accumulation is not reflected in the `EXPLAIN` tree, which is confusing to users.
How to repeat:
drop table if exists t1;
CREATE TABLE `t1` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`c1` varchar(15),
`c2`int,
`c3` varchar(20),
PRIMARY KEY (`id`),
KEY `idx_c1` (`c1`)
)ENGINE=INNODB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
set @x:=0;
insert into t1 (c1,c2,c3) select lpad(mod(@x=@x+1,90000),15,'0'),1,lpad('0',20,'0') from (select 1 from information_schema.COLUMNS limit 1800) a,(select 1 from information_schema.COLUMNS limit 1800) b;
drop table if exists t2;
CREATE TABLE t2 (
id bigint unsigned NOT NULL AUTO_INCREMENT,
c1 varchar(15),
c2 int,
c3 varchar(20),
PRIMARY KEY (id),
UNIQUE KEY `idx_c1`(c1)
)ENGINE=INNODB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
set @x:=0;
insert into t2 (c1,c2,c3) select lpad(@x:=@x+1,15,'0'),1,lpad('0',20,'0') from (select 1 from information_schema.columns limit 300) a, (select 1 from information_schema.columns limit 300) b;
analyze table tl;
analyze table t2;
explain format=tree select * from t1 where exists (select /*+ NO_SEMIJOIN() */ 1 from t2 where t2.x=t1.x) and id>2000000 order by id limit 5;
| -> Limit: 5 row(s) (cost=301514 rows=5)
-> Filter: (exists(select #2) and (t1.id > 2000000)) (cost=301514 rows=1.5e+6)
-> Index range scan on t1 using PRIMARY over (2000000 < id) (cost=301514 rows=1.5e+6)
-> Select #2 (subquery in condition; dependent)
-> Limit: 1 row(s) (cost=1.1 rows=1)
-> Single-row covering index lookup on t2 using uk1 (x=t1.x) (cost=1.1 rows=1)
|
Based on the actual cost of the statement, the output of EXPLAIN should appear as follows:
| -> Limit: 5 row(s) (cost=1.95e+6 rows=5)
-> Filter: (exists(select #2) and (t1.id > 2000000)) (cost=1.95e+6 rows=1.5e+6)
-> Index range scan on t1 using PRIMARY over (2000000 < id) (cost=301514 rows=1.5e+6)
-> Select #2 (subquery in condition; dependent)
-> Limit: 1 row(s) (cost=1.1 rows=1)
-> Single-row covering index lookup on t2 using uk1 (x=t1.x) (cost=1.1 rows=1)
|
Suggested fix:
Perform a depth-first traversal, starting from the root access path. When encountering a Filter access path, check for the presence of a correlated subquery. If a correlated subquery is found, update the cost for all paths from the Filter node upwards.