Bug #105644 ranges cannot use indexes on joined tables
Submitted: 19 Nov 2021 16:22 Modified: 22 Nov 2021 13:15
Reporter: Domas Mituzas Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any

[19 Nov 2021 16:22] Domas Mituzas
Description:
If a table is joined with a range condition on an indexed field, index will not be used to satisfy it, leading to orders of magnitude slower query execution.

You can see that if my query (which returns same data in both cases) uses range, it is much much slower:

mysql> explain analyze select straight_join count(*) from ints join j2 using (a) where b=0 and a<100 and c=99 \G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)  (actual time=6.841..6.841 rows=1 loops=1)
    -> Nested loop inner join  (cost=35.47 rows=64) (actual time=0.100..6.059 rows=9899 loops=1)
        -> Filter: (ints.a < 100)  (cost=13.07 rows=64) (actual time=0.026..0.057 rows=64 loops=1)
            -> Index range scan on ints using PRIMARY  (cost=13.07 rows=64) (actual time=0.025..0.046 rows=64 loops=1)
        -> Index lookup on j2 using good (a=ints.a, b=0, c=99)  (cost=0.25 rows=1) (actual time=0.003..0.080 rows=155 loops=64)

1 row in set (0.03 sec)

mysql> explain analyze select straight_join count(*) from ints join j2 using (a) where b=0 and a<100 and c>=99 \G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)  (actual time=445.597..445.597 rows=1 loops=1)
    -> Nested loop inner join  (cost=35.47 rows=64) (actual time=437.800..444.811 rows=9899 loops=1)
        -> Filter: (ints.a < 100)  (cost=13.07 rows=64) (actual time=0.028..0.080 rows=64 loops=1)
            -> Index range scan on ints using PRIMARY  (cost=13.07 rows=64) (actual time=0.027..0.062 rows=64 loops=1)
        -> Filter: (j2.c >= 99)  (cost=0.25 rows=1) (actual time=6.842..6.935 rows=155 loops=64)
            -> Index lookup on j2 using good (a=ints.a, b=0)  (cost=0.25 rows=1) (actual time=0.004..5.869 rows=12375 loops=64)

1 row in set (0.47 sec)

How to repeat:
CREATE TABLE `j2` (
  `x` int NOT NULL AUTO_INCREMENT,
  `a` int DEFAULT NULL,
  `b` int DEFAULT NULL,
  `c` int DEFAULT NULL,
  PRIMARY KEY (`x`),
  KEY `good` (`a`,`b`,`c`),
  KEY `bad` (`a`,`c`)
) ENGINE=InnoDB

for each (a) populate data with (b=1 at 99%, b=0 at 1%, c at [1..100])

Suggested fix:
use indexes for ranges in joins
[19 Nov 2021 16:44] Domas Mituzas
This works on Postgres:

test=# explain select count(*) from generate_series(1,1000)a join j using (a) where b=1 and c>900;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Aggregate  (cost=9063.76..9063.77 rows=1 width=8)
   ->  Nested Loop  (cost=0.43..9059.00 rows=1903 width=0)
         ->  Function Scan on generate_series a  (cost=0.00..10.00 rows=1000 width=4)
         ->  Index Only Scan using idx on j  (cost=0.42..9.03 rows=2 width=4)
               Index Cond: ((a = a.a) AND (b = 1) AND (c > 900))
[22 Nov 2021 13:15] MySQL Verification Team
Hi Mr. Mituzas,

Thank you for your bug report.

We were able to repeat the performance problem that you are reporting.

We are verifying this report as a Performance Improvement report.

Verified as reported.