MySQL Bugs: #101985: cost planner doesn't know when DuplicateWeedout Strategy creates disk table

Bug #101985	cost planner doesn't know when DuplicateWeedout Strategy creates disk table
Submitted:	13 Dec 2020 15:03	Modified:	14 Dec 2020 14:11
Reporter:	Vinieth S S	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	Mysql 5.7	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
Select table1.id 
from table1 
where table.id in (select table2.id 
                    from table2 
                    where table2.id in (select table3.id 
                                        from table3)
                    ) 
order by table1.id  
limit 100

On checking the optimiser trace for the above query. Optimiser trace cost

DUPLICATE-WEEDOUT strategy - Cost: 1.08e7
FIRST MATCH strategy - Cost: 1.85e7
As DUPLICATE-WEEDOUT cost is less, mysql took DUPLICATE-WEEDOUT strategy for the above query.

Seems everything good in join_optimization part right. But finally, after checking the join_execution part. DUPLICATE-WEEDOUT usually creates temp table. But here as the heap-size is not enough for temp table, it went on creating ondisk temp table(converting_tmp_table_to_ondisk).

Due to disk temp table my query execution became slower.

So what happened here?

Optimiser trace doesn't calculate the cost of disk table in join-optimisation part itself. If disk table cost was calculated, it would be higher than first match. Then final_semijoin_strategy would be FIRST-MATCH strategy, with this my query would have been faster.

How to repeat:
The query must use DUPLICATE-WEEDOUT strategy and create ondisk table for that query

Suggested fix:
MYSQL need to calculate the cost of on-disk table in join-optimisation part itself

Hi Mr. S S,

Thank you for your bug report.

However, this is not a bug.

Simply, there are two reasons why it is not a bug. First of all, in the optimising stage, our server does not know how big will the temporary table be. It is known only at the moment when its size becomes larger then the limit for the in-memory temporary tables.

Second, it is possible to change this limit, so even that would require constant reading of the variables. Third, you can work around this problem by setting the session limit before running the query and resetting it thereafter.

Not a bug.

Hi Mr. S S,

On second thought, your idea could be a good feature request.

However, you filed this as a bug. Would you agree to make it a fully verified feature request for 8.0 ????