Description:
The optimizer appears to choose a much slower join order, even though an equivalent JOIN_PREFIX/JOIN_ORDER plan has a lower estimated cost and is thousands of times faster.
The query joins three tables:
t2 STRAIGHT_JOIN t1 ON t2.c1 = t1.c1
INNER JOIN t0 ON t1.c0 = t0.c4
WHERE t1.c4 < -1796620509
After ANALYZE TABLE, the default plan first builds a large disconnected intermediate result between t2 and t0:
Inner hash join (no condition)
t2 index scan
t0 table scan
This produces about 6.26 million rows. The plan then performs an index lookup on t1 for each row of this intermediate result:
Index lookup on t1 using i2 (c0 = t0.c4)
loops = 6.26e+6
The query returns no rows. However, the default plan only discovers this after millions of failed t1 index lookups.
With JOIN_PREFIX(t2,t1,t0), JOIN_ORDER(t2,t1,t0), or JOIN_FIXED_ORDER(), MySQL applies the selective t1 predicate first:
Filter: (t1.c4 < -1796620509)
Table scan on t1
This immediately produces zero rows, so the remaining join work is avoided.
On my machine, EXPLAIN ANALYZE is stable across repeated runs:
Default plan:
7726 ms
7689 ms
7670 ms
JOIN_PREFIX(t2,t1,t0):
2.67 ms
2.69 ms
2.63 ms
JOIN_ORDER(t2,t1,t0):
2.62 ms
JOIN_FIXED_ORDER():
2.69 ms
The default plan has a reported estimated cost of about 2.82e+6, while the JOIN_PREFIX plan has a lower reported estimated cost of about 1.86e+6. Therefore, the hinted plan is not only much faster in runtime, but also lower in MySQL's own reported cost.
This looks like a missed selective-table-first join order under a STRAIGHT_JOIN constraint. The issue is not that hash join without a condition is always wrong; the issue is that the chosen join order creates a large disconnected intermediate before applying a highly selective predicate on t1, while an equivalent lower-cost join order avoids the unnecessary work.
How to repeat:
1. Save the attached SQL file as:
mysql_straight_join_selective_filter_repro.sql
2. Run it with comments enabled, so that optimizer hints are preserved:
mysql -h 127.0.0.1 -P 3306 -u root --comments --table \
< mysql_straight_join_selective_filter_repro.sql \
> mysql_straight_join_selective_filter_repro_result.txt 2>&1
3. The script creates a database named:
mysql_join_prefix_selective_filter_repro
It creates three InnoDB tables t0, t1, and t2, creates indexes, loads scaled synthetic data, runs ANALYZE TABLE, and then compares:
CASE A: default plan
CASE B: JOIN_PREFIX(t2,t1,t0)
CONTROL C: JOIN_ORDER(t2,t1,t0)
CONTROL D: JOIN_FIXED_ORDER()
4. Observe the default EXPLAIN FORMAT=TREE output. It contains a large disconnected intermediate:
-> Nested loop inner join
-> Inner hash join (no condition)
-> Covering index scan on t2 using i5
-> Hash
-> Filter: (t0.c4 is not null)
-> Table scan on t0
-> Filter: ((t1.c1 = t2.c1) and (t1.c4 < -1796620509))
-> Index lookup on t1 using i2 (c0 = t0.c4)
5. Observe the default EXPLAIN ANALYZE output. It performs about 6.26 million t1 index lookups:
-> Inner hash join (no condition)
actual rows=6.26e+6
-> Index lookup on t1 using i2 (c0 = t0.c4)
actual rows=0 loops=6.26e+6
The default runtime is about 7.7 seconds on my machine.
6. Observe the JOIN_PREFIX(t2,t1,t0) EXPLAIN ANALYZE output. It applies the t1 predicate first:
-> Filter: (t1.c4 < -1796620509)
-> Table scan on t1
This produces zero rows, and the t0 scan is never executed. Runtime is about 2.6 ms on my machine.
7. The same fast plan shape is also obtained with JOIN_ORDER(t2,t1,t0) and JOIN_FIXED_ORDER(), which suggests that the problem is caused by the default join order rather than by a specific JOIN_PREFIX implementation.
Suggested fix:
Consider improving join order enumeration/costing for mixed STRAIGHT_JOIN and INNER JOIN queries so that selective predicates on later tables are considered before choosing a plan that creates a large disconnected intermediate result.
In this case, the optimizer should avoid choosing a t2 × t0 intermediate followed by millions of t1 index lookups when an equivalent join order that applies the selective t1 predicate first has both lower estimated cost and much lower runtime.
More generally, the optimizer could penalize disconnected intermediates when they delay highly selective single-table predicates, or ensure that lower-cost join orders such as the JOIN_PREFIX/JOIN_ORDER plan are not pruned from the search space.
Description: The optimizer appears to choose a much slower join order, even though an equivalent JOIN_PREFIX/JOIN_ORDER plan has a lower estimated cost and is thousands of times faster. The query joins three tables: t2 STRAIGHT_JOIN t1 ON t2.c1 = t1.c1 INNER JOIN t0 ON t1.c0 = t0.c4 WHERE t1.c4 < -1796620509 After ANALYZE TABLE, the default plan first builds a large disconnected intermediate result between t2 and t0: Inner hash join (no condition) t2 index scan t0 table scan This produces about 6.26 million rows. The plan then performs an index lookup on t1 for each row of this intermediate result: Index lookup on t1 using i2 (c0 = t0.c4) loops = 6.26e+6 The query returns no rows. However, the default plan only discovers this after millions of failed t1 index lookups. With JOIN_PREFIX(t2,t1,t0), JOIN_ORDER(t2,t1,t0), or JOIN_FIXED_ORDER(), MySQL applies the selective t1 predicate first: Filter: (t1.c4 < -1796620509) Table scan on t1 This immediately produces zero rows, so the remaining join work is avoided. On my machine, EXPLAIN ANALYZE is stable across repeated runs: Default plan: 7726 ms 7689 ms 7670 ms JOIN_PREFIX(t2,t1,t0): 2.67 ms 2.69 ms 2.63 ms JOIN_ORDER(t2,t1,t0): 2.62 ms JOIN_FIXED_ORDER(): 2.69 ms The default plan has a reported estimated cost of about 2.82e+6, while the JOIN_PREFIX plan has a lower reported estimated cost of about 1.86e+6. Therefore, the hinted plan is not only much faster in runtime, but also lower in MySQL's own reported cost. This looks like a missed selective-table-first join order under a STRAIGHT_JOIN constraint. The issue is not that hash join without a condition is always wrong; the issue is that the chosen join order creates a large disconnected intermediate before applying a highly selective predicate on t1, while an equivalent lower-cost join order avoids the unnecessary work. How to repeat: 1. Save the attached SQL file as: mysql_straight_join_selective_filter_repro.sql 2. Run it with comments enabled, so that optimizer hints are preserved: mysql -h 127.0.0.1 -P 3306 -u root --comments --table \ < mysql_straight_join_selective_filter_repro.sql \ > mysql_straight_join_selective_filter_repro_result.txt 2>&1 3. The script creates a database named: mysql_join_prefix_selective_filter_repro It creates three InnoDB tables t0, t1, and t2, creates indexes, loads scaled synthetic data, runs ANALYZE TABLE, and then compares: CASE A: default plan CASE B: JOIN_PREFIX(t2,t1,t0) CONTROL C: JOIN_ORDER(t2,t1,t0) CONTROL D: JOIN_FIXED_ORDER() 4. Observe the default EXPLAIN FORMAT=TREE output. It contains a large disconnected intermediate: -> Nested loop inner join -> Inner hash join (no condition) -> Covering index scan on t2 using i5 -> Hash -> Filter: (t0.c4 is not null) -> Table scan on t0 -> Filter: ((t1.c1 = t2.c1) and (t1.c4 < -1796620509)) -> Index lookup on t1 using i2 (c0 = t0.c4) 5. Observe the default EXPLAIN ANALYZE output. It performs about 6.26 million t1 index lookups: -> Inner hash join (no condition) actual rows=6.26e+6 -> Index lookup on t1 using i2 (c0 = t0.c4) actual rows=0 loops=6.26e+6 The default runtime is about 7.7 seconds on my machine. 6. Observe the JOIN_PREFIX(t2,t1,t0) EXPLAIN ANALYZE output. It applies the t1 predicate first: -> Filter: (t1.c4 < -1796620509) -> Table scan on t1 This produces zero rows, and the t0 scan is never executed. Runtime is about 2.6 ms on my machine. 7. The same fast plan shape is also obtained with JOIN_ORDER(t2,t1,t0) and JOIN_FIXED_ORDER(), which suggests that the problem is caused by the default join order rather than by a specific JOIN_PREFIX implementation. Suggested fix: Consider improving join order enumeration/costing for mixed STRAIGHT_JOIN and INNER JOIN queries so that selective predicates on later tables are considered before choosing a plan that creates a large disconnected intermediate result. In this case, the optimizer should avoid choosing a t2 × t0 intermediate followed by millions of t1 index lookups when an equivalent join order that applies the selective t1 predicate first has both lower estimated cost and much lower runtime. More generally, the optimizer could penalize disconnected intermediates when they delay highly selective single-table predicates, or ensure that lower-cost join orders such as the JOIN_PREFIX/JOIN_ORDER plan are not pruned from the search space.