Bug #120773 Hash join misses DISTINCT ORDER BY LIMIT early termination
Submitted: 25 Jun 11:54 Modified: 26 Jun 10:38
Reporter: ZHAOYANG ZHANG Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:9.7.1 OS:Ubuntu (Ubuntu (22.04.4 LTS))
Assigned to: CPU Architecture:x86 (Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz)
Tags: distinct, hash-join, limit, NO_BNL, Optimizer, order-by, performance

[25 Jun 11:54] ZHAOYANG ZHANG
Description:
In MySQL 9.7.1, the optimizer chooses a much slower hash-join plan for a simple DISTINCT ORDER BY LIMIT query over a Cartesian product.

The query projects only t0.c0 and orders by the same column:

SELECT DISTINCT t0.c0 AS ref0
FROM t2, t0, t1
ORDER BY t0.c0 DESC
LIMIT 1;

There are no join predicates and no WHERE predicates. Since the selected and ordered column comes only from t0, and t1/t2 do not contribute to the projected value, t1 and t2 only need to establish that at least one joined row exists.

With the default optimizer settings, MySQL chooses a hash-join plan over the Cartesian product. The plan materializes all joined rows, performs temporary-table deduplication, sorts the distinct values, and then applies LIMIT 1. In my testcase, this plan produced 141659 joined rows before deduplication:

t2 rows = 49
t0 rows = 59
t1 rows = 49
49 * 59 * 49 = 141659

The default plan has the following shape:

Inner hash join (no condition)
Temporary table with deduplication
Sort: t0.c0 DESC
Limit: 1

However, when hash join is disabled, either by using NO_BNL(t2, t0, t1) or by setting:

SET SESSION optimizer_switch = 'default,block_nested_loop=off';

the optimizer chooses a much cheaper ordered-index plan. The alternative plan scans t0 using a covering index in reverse order, stops after one unique row, and then checks only one row from t2 and one row from t1.

The good plan has the following shape:

Covering index scan on t0 using i2 (reverse)
Limit table size: 1 unique row(s)
Nested loop inner join

The default query, the NO_BNL query, and the block_nested_loop=off query return the same result. However, their execution times are very different. In repeated runs, the default hash-join plan usually took about 25-35 ms, while the NO_BNL/hash-join-disabled plan usually took about 0.2 ms. Thus, the ordered-index early-termination plan was roughly 100x faster than the default hash-join plan in this testcase. The alternative plan also had a much lower estimated cost.

I also tested whether this is caused by optimizer pruning, join-order search depth, or prefer_ordering_index. The bad hash-join plan still appears with:

optimizer_prune_level = 0
optimizer_search_depth = 0 or optimizer_search_depth = 3
prefer_ordering_index = on
prefer_ordering_index = off

In contrast, disabling hash join through block_nested_loop=off makes the optimizer choose the reverse ordered-index plan with DISTINCT/LIMIT early termination.

This suggests that the issue is not caused by insufficient join-order search depth, ordinary optimizer pruning, or prefer_ordering_index being disabled. Instead, when hash join is enabled, the hash-join planning path appears to prevent or fail to preserve the cheaper ordered-index DISTINCT/ORDER BY/LIMIT early-termination plan.

This issue appears related to Bug #114512 and Bug #115679, because those reports also involve slower hash-join/BNL plans. However, this testcase additionally involves DISTINCT ORDER BY LIMIT over a Cartesian product, where the non-projected tables can be treated as existence checks after the first ordered distinct value is found.

The testcase was reproduced with the following session settings:

@@version = 9.7.1
@@optimizer_prune_level = 1
@@optimizer_search_depth = 62
@@join_buffer_size = 262144

The optimizer_switch value was:

index_merge=on,
index_merge_union=on,
index_merge_sort_union=on,
index_merge_intersection=on,
engine_condition_pushdown=on,
index_condition_pushdown=on,
mrr=on,
mrr_cost_based=on,
block_nested_loop=on,
batched_key_access=off,
materialization=on,
semijoin=on,
loosescan=on,
firstmatch=on,
duplicateweedout=on,
subquery_materialization_cost_based=on,
use_index_extensions=on,
condition_fanout_filter=on,
derived_merge=on,
use_invisible_indexes=off,
skip_scan=on,
hash_join=on,
subquery_to_derived=off,
prefer_ordering_index=on,
hypergraph_optimizer=off,
derived_condition_pushdown=on,
hash_set_operations=on

In particular, block_nested_loop is enabled by default in this configuration, so hash join is available. prefer_ordering_index is also enabled by default, but the optimizer still chooses the slower hash-join plan instead of the ordered-index DISTINCT/ORDER BY/LIMIT early-termination plan.

How to repeat:
1. Load the attached testcase.

mysql < create_tables_insert_data.sql

2. Check the MySQL version and optimizer settings.

SELECT @@version;
SELECT @@optimizer_prune_level;
SELECT @@optimizer_search_depth;
SELECT @@optimizer_switch;
SELECT @@join_buffer_size;

3. Refresh statistics.

ANALYZE TABLE t0, t1, t2;

4. Run the query with default optimizer settings.

SET SESSION optimizer_prune_level = DEFAULT;
SET SESSION optimizer_search_depth = DEFAULT;
SET SESSION optimizer_switch = 'default';

EXPLAIN ANALYZE
SELECT DISTINCT t0.c0 AS ref0
FROM t2, t0, t1
ORDER BY t0.c0 DESC
LIMIT 1;

Observed result:

The optimizer chooses a hash-join plan. The plan materializes the Cartesian product, performs temporary-table deduplication, sorts by t0.c0 DESC, and then applies LIMIT 1.

5. Run the same query with hash join disabled through block_nested_loop=off.

SET SESSION optimizer_prune_level = DEFAULT;
SET SESSION optimizer_search_depth = DEFAULT;
SET SESSION optimizer_switch = 'default,block_nested_loop=off';

EXPLAIN ANALYZE
SELECT DISTINCT t0.c0 AS ref0
FROM t2, t0, t1
ORDER BY t0.c0 DESC
LIMIT 1;

Observed result:

The optimizer chooses a reverse ordered-index scan on t0 and stops after one unique row:

Covering index scan on t0 using i2 (reverse)
Limit table size: 1 unique row(s)

This plan is much faster than the default hash-join plan.

6. Run the same query with the NO_BNL hint.

SET SESSION optimizer_prune_level = DEFAULT;
SET SESSION optimizer_search_depth = DEFAULT;
SET SESSION optimizer_switch = 'default';

EXPLAIN ANALYZE
SELECT /*+ NO_BNL(t2, t0, t1) */ DISTINCT t0.c0 AS ref0
FROM t2, t0, t1
ORDER BY t0.c0 DESC
LIMIT 1;

Observed result:

The optimizer again chooses the faster ordered-index early-termination plan instead of the default hash-join plan.

7. Test whether the issue is caused by pruning or search depth.

SET SESSION optimizer_prune_level = 0;
SET SESSION optimizer_search_depth = 0;
SET SESSION optimizer_switch = 'default';

EXPLAIN ANALYZE
SELECT DISTINCT t0.c0 AS ref0
FROM t2, t0, t1
ORDER BY t0.c0 DESC
LIMIT 1;

SET SESSION optimizer_prune_level = 0;
SET SESSION optimizer_search_depth = 3;
SET SESSION optimizer_switch = 'default';

EXPLAIN ANALYZE
SELECT DISTINCT t0.c0 AS ref0
FROM t2, t0, t1
ORDER BY t0.c0 DESC
LIMIT 1;

Observed result:

Both configurations still choose the bad hash-join plan.

8. Test whether prefer_ordering_index changes the result.

SET SESSION optimizer_prune_level = DEFAULT;
SET SESSION optimizer_search_depth = DEFAULT;
SET SESSION optimizer_switch = 'default,prefer_ordering_index=on';

EXPLAIN ANALYZE
SELECT DISTINCT t0.c0 AS ref0
FROM t2, t0, t1
ORDER BY t0.c0 DESC
LIMIT 1;

SET SESSION optimizer_prune_level = DEFAULT;
SET SESSION optimizer_search_depth = DEFAULT;
SET SESSION optimizer_switch = 'default,prefer_ordering_index=off';

EXPLAIN ANALYZE
SELECT DISTINCT t0.c0 AS ref0
FROM t2, t0, t1
ORDER BY t0.c0 DESC
LIMIT 1;

Observed result:

Both configurations still choose the bad hash-join plan.

9. Check result equivalence.

SET SESSION optimizer_prune_level = DEFAULT;
SET SESSION optimizer_search_depth = DEFAULT;
SET SESSION optimizer_switch = 'default';

SELECT DISTINCT t0.c0 AS ref0
FROM t2, t0, t1
ORDER BY t0.c0 DESC
LIMIT 1;

SELECT /*+ NO_BNL(t2, t0, t1) */ DISTINCT t0.c0 AS ref0
FROM t2, t0, t1
ORDER BY t0.c0 DESC
LIMIT 1;

SET SESSION optimizer_switch = 'default,block_nested_loop=off';

SELECT DISTINCT t0.c0 AS ref0
FROM t2, t0, t1
ORDER BY t0.c0 DESC
LIMIT 1;

Summary:

The bad plan persists when optimizer pruning is disabled, when search depth is changed, and regardless of prefer_ordering_index. The good ordered-index early-termination plan appears when hash join is disabled with block_nested_loop=off or when NO_BNL(t2, t0, t1) is used.

Suggested fix:
The optimizer should consider and preserve an ordered-index access path with early termination for DISTINCT ORDER BY LIMIT queries, even when hash join is available.

In this testcase, the selected and ordered column is from t0 only. The other joined tables do not contribute to the projected value and only need to establish that at least one joined row exists. Therefore, an access path that scans t0.c0 in descending index order, stops after one distinct value, and then checks one row from t1 and t2 should be considered and costed against the hash-join plan.

The hash-join planning path appears to prevent or fail to preserve this ordered-index DISTINCT/ORDER BY/LIMIT alternative. As a result, the optimizer chooses a plan that materializes the Cartesian product, performs temporary-table deduplication, sorts the result, and only then applies LIMIT 1.
[25 Jun 11:56] ZHAOYANG ZHANG
Reproduction package containing create_tables_insert_data.sql and run_optimizer_repro_cases.sql.

Attachment: mysql971-hash-join-distinct-order-by-limit-repro.zip.zip (application/x-zip-compressed, text), 4.88 KiB.

[26 Jun 10:38] Chaithra Marsur Gopala Reddy
Hi ZHAOYANG ZHANG,

Thank you for the test case. Verified as described.