Bug #120797 Poor plan for DISTINCT Cartesian product: hash join materializes full product while NO_BNL is about 50x faster
Submitted: 27 Jun 15:18 Modified: 2 Jul 13:29
Reporter: ZHAOYANG ZHANG Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:9.7.1 OS:Ubuntu (22.04.4 LTS)
Assigned to: CPU Architecture:x86
Tags: cartesian product, cost, distinct, hash join, NO_BNL, Optimizer hints

[27 Jun 15:18] ZHAOYANG ZHANG
Description:
On MySQL 9.7.1, the optimizer chooses a much slower hash join plan for a DISTINCT query over a Cartesian product.

The query is:

SELECT DISTINCTROW t1.c4 AS ref0
FROM t2, t0, t1;

Only t1.c4 is projected. No columns from t0 or t2 are used in the SELECT list, WHERE clause, GROUP BY, or ORDER BY. Therefore, as long as t0 and t2 are non-empty, the result is determined by the distinct values of t1.c4. The unused tables should only need to be checked for the existence of at least one row.

However, the default plan uses hash joins for the Cartesian product and materializes the full product before duplicate elimination. In my test case, it produces about 1,067,175 intermediate rows and then deduplicates them to 45 rows. The estimated cost is 366651.229945 and EXPLAIN ANALYZE takes about 0.428 sec.

When I add the hint NO_BNL(t2, t0, t1), the optimizer chooses a nested-loop plan. This plan uses "Limit: 1 row(s)" on t2 and avoids multiplying by all rows of t2. It produces only 6885 intermediate rows before duplicate elimination. The estimated cost drops to 4801.818030 and EXPLAIN ANALYZE takes about 0.013 sec.

Repeated runtime validation also shows a large difference:

Default query average runtime: 412.139743 ms
Query with NO_BNL average runtime: 8.180291 ms
Runtime speedup: about 50.38x
Results are equivalent.

A logically equivalent EXISTS rewrite is even cheaper:

SELECT DISTINCTROW t1.c4 AS ref0
FROM t1
WHERE EXISTS (SELECT 1 FROM t0 LIMIT 1)
AND EXISTS (SELECT 1 FROM t2 LIMIT 1);

This plan scans t1 and checks t0/t2 existence with FirstMatch/Limit 1. It returns the same 45 rows and takes about 0.001 sec in EXPLAIN ANALYZE.

This suggests that the problem is not just a runtime fluctuation caused by the hint. The default plan appears to miss an early-out opportunity for unused joined tables under DISTINCT. Since only t1.c4 is needed in the final result, t0 and t2 only need to be checked for row existence. However, the hash-join plan materializes the full Cartesian product before duplicate elimination, while the NO_BNL plan can stop after the first row from one of the unused tables.

I found Bug #115679 when searching for similar reports. That report is related because it also involves a DISTINCT query where the default plan uses a hash join and NO_BNL leads to a faster nested-loop plan. However, this is a separate reproducible test case on MySQL 9.7.1: it uses a three-table Cartesian product, has no GROUP BY, and shows that an unused joined table can be accessed with an early-out "Limit: 1 row(s)" plan instead of being fully multiplied into the Cartesian product.

I also previously reported Bug #120773, which involved hash join missing early termination for a DISTINCT query with ORDER BY and LIMIT. The present case is different because there is no ORDER BY and no LIMIT in the query. The issue here is that hash join is chosen for a DISTINCT Cartesian product even though the extra joined tables are unused and could be handled as existence checks. Therefore, this appears to be a broader DISTINCT/hash-join early-out costing or transformation issue, not limited to ORDER BY/LIMIT queries.

How to repeat:
How to repeat:

1. Start a MySQL 9.7.1 server.

2. Load the attached test case file:

mysql -uroot -p < create_tables_insert_data_filtered.sql

The attached file creates the test tables t0, t1, and t2, inserts the test data, and runs ANALYZE TABLE on the involved tables.

3. Run the original query with EXPLAIN ANALYZE:

EXPLAIN ANALYZE
SELECT DISTINCTROW t1.c4 AS ref0
FROM t2, t0, t1;

Observed plan:

The optimizer chooses hash joins for the Cartesian product. The plan contains:

* Temporary table with deduplication
* Inner hash join (no condition)
* Another Inner hash join (no condition)

In my run, the plan produced about 1,067,175 intermediate rows before duplicate elimination, returned 45 rows, and EXPLAIN ANALYZE took about 0.448 sec.

4. Run the same query with NO_BNL:

EXPLAIN ANALYZE
SELECT /*+ NO_BNL(t2, t0, t1) */ DISTINCTROW t1.c4 AS ref0
FROM t2, t0, t1;

Observed plan:

The optimizer chooses a nested-loop plan. The plan contains:

* Temporary table with deduplication
* Nested loop inner join
* Limit: 1 row(s) on t2

In my run, this plan produced only 6885 intermediate rows before duplicate elimination, returned the same 45 rows, and EXPLAIN ANALYZE took about 0.012 sec.

5. Run the following logically equivalent EXISTS rewrite:

EXPLAIN ANALYZE
SELECT DISTINCTROW t1.c4 AS ref0
FROM t1
WHERE EXISTS (SELECT 1 FROM t0 LIMIT 1)
AND EXISTS (SELECT 1 FROM t2 LIMIT 1);

Observed plan:

The plan scans t1 and checks row existence in t0 and t2. It contains:

* Temporary table with deduplication
* Hash semijoin (FirstMatch)
* Limit: 1 row(s) on t0
* Limit: 1 row(s) on t2

In my run, this query also returned 45 rows and EXPLAIN ANALYZE took about 0.001 sec.

Expected behavior:

The optimizer should not choose the default hash-join plan that materializes the full Cartesian product before duplicate elimination. Since the query projects only t1.c4 and does not use columns from t0 or t2, the extra joined tables only need to be checked for existence. A plan similar to the NO_BNL plan or the EXISTS rewrite should be considered/costed.

Suggested fix:
The optimizer could take the DISTINCT duplicate-elimination property into account when costing join methods for Cartesian products involving unused joined tables.

For a DISTINCT query where some joined tables do not contribute any columns to the SELECT list, predicates, grouping, or ordering, those tables may only need to be checked for row existence. In this test case, since only t1.c4 is needed in the final result, t0 and t2 can be treated as existence checks as long as they are non-empty.

One possible approach is to consider a FirstMatch/Limit-1 style access path, or an internal semijoin/EXISTS-style transformation, for such unused tables before choosing a hash join that materializes the full Cartesian product. The optimizer already produces this kind of plan for the equivalent EXISTS rewrite, and the NO_BNL plan also shows an early-out "Limit: 1 row(s)" access for one of the unused tables.

The cost model should also account for the cost of producing and deduplicating the full Cartesian product when comparing hash join plans against nested-loop or semijoin-style plans with early-out. In this case, the default hash-join plan has both much higher estimated cost and much worse actual runtime than the hinted plan, which suggests that this optimization opportunity is either not considered or not costed favorably.
[27 Jun 15:18] ZHAOYANG ZHANG
Reproducible SQL test case with schema, data, and ANALYZE TABLE statements.

Attachment: create_tables_insert_data.sql (application/octet-stream, text), 13.06 KiB.

[2 Jul 13:29] Chaithra Marsur Gopala Reddy
Hi ZHAOYANG ZHANG,

Thank you for the test case. Verified as described.