Bug #115840 Contribute by Tencent:Incorrect Cost Display for Join with Materialized Subquery
Submitted: 15 Aug 2024 2:30 Modified: 15 Aug 2024 6:54
Reporter: jarne yang Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:8.0.39, 8.4.2, 9.0.1 OS:Any
Assigned to: CPU Architecture:Any
Tags: explain, Optimizer bug

[15 Aug 2024 2:30] jarne yang
Description:
When setting the cost on a nested loop join access path, if a materialization strategy is employed for the subquery, the calculation of rows and cost becomes incorrect when joining the materialized table.
Even when the join type is `eq_ref`, the calculation for the number of rows after join is simply a multiplication of the cardinalities of the two tables. This leads to an inaccurate rows and cost display in the explain tree, contradicting the status `Last_query_cost`.

How to repeat:
set optimizer_switch='firstmatch=off';
CREATE TABLE t1 ( a INTEGER NOT NULL, b INTEGER NOT NULL );
INSERT INTO t1 VALUES (1,1), (2,2), (3,3);
ANALYZE TABLE t1;

EXPLAIN FORMAT=tree SELECT a FROM t1 WHERE a NOT IN ( SELECT b FROM t1 WHERE b > 2 );

EXPLAIN
-> Nested loop antijoin  (cost=1.75 rows=9)
    -> Table scan on t1  (cost=0.55 rows=3)
    -> Single-row index lookup on <subquery2> using <auto_distinct_key> (b=t1.a)  (cost=0.983..0.983 rows=1)
        -> Materialize with deduplication  (cost=0.85..0.85 rows=3)
            -> Filter: (t1.b is not null)  (cost=0.55 rows=3)
                -> Filter: (t1.b > 2)  (cost=0.55 rows=3)
                    -> Table scan on t1  (cost=0.55 rows=3)

Since the table t1 only has 3 rows, the expected rows (rows=9) showing on the top of explain tree are impossible. And the cost (cost=1.75) also contradicts the status `Last_query_cost` which is 2.699. A reasonable explain tree should appear as follows: 

EXPLAIN
-> Nested loop antijoin  (cost=2.699 rows=3)
    -> Table scan on t1  (cost=0.55 rows=3)
    -> Single-row index lookup on <subquery2> using <auto_distinct_key> (b=t1.a)  (cost=0.983..0.983 rows=1)
        -> Materialize with deduplication  (cost=0.85..0.85 rows=3)
            -> Filter: (t1.b is not null)  (cost=0.55 rows=3)
                -> Filter: (t1.b > 2)  (cost=0.55 rows=3)
                    -> Table scan on t1  (cost=0.55 rows=3)
[15 Aug 2024 6:54] MySQL Verification Team
Hello jarne yang,

Thank you for the report and feedback.

regards,
Umesh
[26 Aug 2024 7:55] jarne yang
Bug #115840 fix for 8.0.37

Attachment: bug115840-8.0.37.patch (application/octet-stream, text), 5.81 KiB.

[26 Aug 2024 7:58] MySQL Verification Team
Hello jarne yang,

Thank you very much for your patch contribution, we appreciate it!

In order for us to continue the process of reviewing your contribution to MySQL, please send us a signed copy of the Oracle Contributor Agreement (OCA) as outlined in https://oca.opensource.oracle.com

Signing an OCA needs to be done only once and it's valid for all other Oracle governed Open Source projects as well.

Getting a signed/approved OCA on file will help us facilitate your contribution - this one, and others in the future.  

Please let me know, if you have any questions.

Thank you for your interest in MySQL.

regards,
Umesh