Bug #116335 Multi-valued index isn't used with some range scans on composite indexes
Submitted: 10 Oct 2024 16:53 Modified: 11 Oct 2024 8:03
Reporter: Maciej Dobrzanski Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:8.0.39 OS:Any
Assigned to: CPU Architecture:Any
Tags: Multi-Valued Indexes, Optimizer

[10 Oct 2024 16:53] Maciej Dobrzanski
Description:
In complex use cases where a multi-valued index is present in a composite key, the optimizer can sometimes ignore the multi-valued index for range scans. It will only want to use the prefix of the key up to but excluding the multi-valued index itself. This only seems to happen when there are multiple such keys in the table. The physical order of those indexes also seems to matter.

Here's an example of such table:

CREATE TABLE `mvi` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `f1` int NOT NULL,
  `f2` int NOT NULL,
  `f3` int DEFAULT NULL,
  `f4` int NOT NULL,
  `f5` int DEFAULT NULL,
  `j` json NOT NULL,
  PRIMARY KEY (`id`),
  KEY `ix1` (`f1`,`f3`,(cast(`j` as signed array))),
  KEY `ix3` (`f1`,`f4`,`f3`,(cast(`j` as signed array))),
  KEY `ix5` (`f1`,`f4`,`f3`,`f5`,(cast(`j` as signed array)))
) ENGINE=InnoDB;

The following query should be able to use all parts of index ix5, but the multi-valued index is not used in the range scan:

mysql> EXPLAIN ANALYZE SELECT id FROM mvi WHERE (f1 = 1 AND f3 = 1 AND f4 = 1 AND f5 IN (963474745,963474787,963474788,963474789) AND 2 MEMBER OF(j))\G
*************************** 1. row ***************************
EXPLAIN: -> Filter: ((mvi.f4 = 1) and (mvi.f3 = 1) and (mvi.f1 = 1) and (mvi.f5 in (963474745,963474787,963474788,963474789)) and json'2' member of (cast(j as signed array)))  (cost=540 rows=1198) (actual time=0.0589..5.24 rows=255 loops=1)
    -> Index range scan on mvi using ix5 over (f1 = 1 AND f4 = 1 AND f3 = 1 AND f5 = 963474745) OR (f1 = 1 AND f4 = 1 AND f3 = 1 AND f5 = 963474787) OR (2 more)  (cost=540 rows=1198) (actual time=0.0562..5.08 rows=524 loops=1)

When the optimizer is prevented from looking at other indexes the problem goes away:

mysql> EXPLAIN ANALYZE SELECT id FROM mvi FORCE INDEX (ix5) WHERE (f1 = 1 AND f3 = 1 AND f4 = 1 AND f5 IN (963474745,963474787,963474788,963474789) AND 2
 MEMBER OF(j))\G
*************************** 1. row ***************************
EXPLAIN: -> Filter: ((mvi.f4 = 1) and (mvi.f3 = 1) and (mvi.f1 = 1) and (mvi.f5 in (963474745,963474787,963474788,963474789)) and json'2' member of (cast(j as signed array)))  (cost=2.81 rows=4) (actual time=0.0877..1.58 rows=255 loops=1)
    -> Index range scan on mvi using ix5 over (f1 = 1 AND f4 = 1 AND f3 = 1 AND f5 = 963474745 AND 2 MEMBER OF (j)) OR (f1 = 1 AND f4 = 1 AND f3 = 1 AND f5 = 963474787 AND 2 MEMBER OF (j)) OR (2 more)  (cost=2.81 rows=4) (actual time=0.0844..1.48 rows=255 loops=1)
1 row in set (0.00 sec)

Rewriting the index also fixes the issue. The query no longer even needs index hints to work as expected:

mysql> ALTER TABLE mvi RENAME INDEX ix5 TO ix6;
Query OK, 128382 rows affected (2.58 sec)
Records: 128382  Duplicates: 0  Warnings: 0

mysql> EXPLAIN ANALYZE SELECT id FROM mvi WHERE (f1 = 1 AND f3 = 1 AND f4 = 1 AND f5 IN (963474745,963474787,963474788,963474789) AND 2 MEMBER OF(j))\G
*************************** 1. row ***************************
EXPLAIN: -> Filter: ((mvi.f4 = 1) and (mvi.f3 = 1) and (mvi.f1 = 1) and (mvi.f5 in (963474745,963474787,963474788,963474789)) and json'2' member of (cast(j as signed array)))  (cost=116 rows=255) (actual time=0.066..1.06 rows=255 loops=1)
    -> Index range scan on mvi using ix6 over (f1 = 1 AND f4 = 1 AND f3 = 1 AND f5 = 963474745 AND 2 MEMBER OF (j)) OR (f1 = 1 AND f4 = 1 AND f3 = 1 AND f5 = 963474787 AND 2 MEMBER OF (j)) OR (2 more)  (cost=116 rows=255) (actual time=0.0625..0.968 rows=255 loops=1)

But the problem can come back when another index is also rewritten:

mysql> ALTER TABLE mvi RENAME INDEX ix1 TO ix2;
Query OK, 128382 rows affected (2.47 sec)
Records: 128382  Duplicates: 0  Warnings: 0

mysql> EXPLAIN ANALYZE SELECT id FROM mvi WHERE (f1 = 1 AND f3 = 1 AND f4 = 1 AND f5 IN (963474745,963474787,963474788,963474789) AND 2 MEMBER OF(j))\G
*************************** 1. row ***************************
EXPLAIN: -> Filter: ((mvi.f4 = 1) and (mvi.f3 = 1) and (mvi.f1 = 1) and (mvi.f5 in (963474745,963474787,963474788,963474789)) and json'2' member of (cast(j as signed array)))  (cost=540 rows=1198) (actual time=0.0829..3.88 rows=255 loops=1)
    -> Index range scan on mvi using ix6 over (f1 = 1 AND f4 = 1 AND f3 = 1 AND f5 = 963474745) OR (f1 = 1 AND f4 = 1 AND f3 = 1 AND f5 = 963474787) OR (2 more)  (cost=540 rows=1198) (actual time=0.0702..3.72 rows=524 loops=1)
1 row in set (0.01 sec)

How to repeat:
1. Create table and load records (e.g. from mvi.sql)

2. Run: EXPLAIN ANALYZE SELECT id FROM mvi WHERE (f1 = 1 AND f3 = 1 AND f4 = 1 AND f5 IN (963474745,963474787,963474788,963474789) AND 2 MEMBER OF(j))

Suggested fix:
n/a
[10 Oct 2024 16:56] Maciej Dobrzanski
mvi.sql

Attachment: mvi.sql (application/octet-stream, text), 4.43 MiB.

[11 Oct 2024 8:03] MySQL Verification Team
Hello Maciej,

Thank you for the report and test case.

regards,
Umesh