Bug #116335 Multi-valued index isn't used with some range scans on composite indexes
Submitted: 10 Oct 2024 16:53 Modified: 11 Oct 2024 8:03
Reporter: Maciej Dobrzanski Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:8.0.39 OS:Any
Assigned to: CPU Architecture:Any
Tags: Multi-Valued Indexes, Optimizer

[10 Oct 2024 16:53] Maciej Dobrzanski
Description:
In complex use cases where a multi-valued index is present in a composite key, the optimizer can sometimes ignore the multi-valued index for range scans. It will only want to use the prefix of the key up to but excluding the multi-valued index itself. This only seems to happen when there are multiple such keys in the table. The physical order of those indexes also seems to matter.

Here's an example of such table:

CREATE TABLE `mvi` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `f1` int NOT NULL,
  `f2` int NOT NULL,
  `f3` int DEFAULT NULL,
  `f4` int NOT NULL,
  `f5` int DEFAULT NULL,
  `j` json NOT NULL,
  PRIMARY KEY (`id`),
  KEY `ix1` (`f1`,`f3`,(cast(`j` as signed array))),
  KEY `ix3` (`f1`,`f4`,`f3`,(cast(`j` as signed array))),
  KEY `ix5` (`f1`,`f4`,`f3`,`f5`,(cast(`j` as signed array)))
) ENGINE=InnoDB;

The following query should be able to use all parts of index ix5, but the multi-valued index is not used in the range scan:

mysql> EXPLAIN ANALYZE SELECT id FROM mvi WHERE (f1 = 1 AND f3 = 1 AND f4 = 1 AND f5 IN (963474745,963474787,963474788,963474789) AND 2 MEMBER OF(j))\G
*************************** 1. row ***************************
EXPLAIN: -> Filter: ((mvi.f4 = 1) and (mvi.f3 = 1) and (mvi.f1 = 1) and (mvi.f5 in (963474745,963474787,963474788,963474789)) and json'2' member of (cast(j as signed array)))  (cost=540 rows=1198) (actual time=0.0589..5.24 rows=255 loops=1)
    -> Index range scan on mvi using ix5 over (f1 = 1 AND f4 = 1 AND f3 = 1 AND f5 = 963474745) OR (f1 = 1 AND f4 = 1 AND f3 = 1 AND f5 = 963474787) OR (2 more)  (cost=540 rows=1198) (actual time=0.0562..5.08 rows=524 loops=1)

When the optimizer is prevented from looking at other indexes the problem goes away:

mysql> EXPLAIN ANALYZE SELECT id FROM mvi FORCE INDEX (ix5) WHERE (f1 = 1 AND f3 = 1 AND f4 = 1 AND f5 IN (963474745,963474787,963474788,963474789) AND 2
 MEMBER OF(j))\G
*************************** 1. row ***************************
EXPLAIN: -> Filter: ((mvi.f4 = 1) and (mvi.f3 = 1) and (mvi.f1 = 1) and (mvi.f5 in (963474745,963474787,963474788,963474789)) and json'2' member of (cast(j as signed array)))  (cost=2.81 rows=4) (actual time=0.0877..1.58 rows=255 loops=1)
    -> Index range scan on mvi using ix5 over (f1 = 1 AND f4 = 1 AND f3 = 1 AND f5 = 963474745 AND 2 MEMBER OF (j)) OR (f1 = 1 AND f4 = 1 AND f3 = 1 AND f5 = 963474787 AND 2 MEMBER OF (j)) OR (2 more)  (cost=2.81 rows=4) (actual time=0.0844..1.48 rows=255 loops=1)
1 row in set (0.00 sec)

Rewriting the index also fixes the issue. The query no longer even needs index hints to work as expected:

mysql> ALTER TABLE mvi RENAME INDEX ix5 TO ix6;
Query OK, 128382 rows affected (2.58 sec)
Records: 128382  Duplicates: 0  Warnings: 0

mysql> EXPLAIN ANALYZE SELECT id FROM mvi WHERE (f1 = 1 AND f3 = 1 AND f4 = 1 AND f5 IN (963474745,963474787,963474788,963474789) AND 2 MEMBER OF(j))\G
*************************** 1. row ***************************
EXPLAIN: -> Filter: ((mvi.f4 = 1) and (mvi.f3 = 1) and (mvi.f1 = 1) and (mvi.f5 in (963474745,963474787,963474788,963474789)) and json'2' member of (cast(j as signed array)))  (cost=116 rows=255) (actual time=0.066..1.06 rows=255 loops=1)
    -> Index range scan on mvi using ix6 over (f1 = 1 AND f4 = 1 AND f3 = 1 AND f5 = 963474745 AND 2 MEMBER OF (j)) OR (f1 = 1 AND f4 = 1 AND f3 = 1 AND f5 = 963474787 AND 2 MEMBER OF (j)) OR (2 more)  (cost=116 rows=255) (actual time=0.0625..0.968 rows=255 loops=1)

But the problem can come back when another index is also rewritten:

mysql> ALTER TABLE mvi RENAME INDEX ix1 TO ix2;
Query OK, 128382 rows affected (2.47 sec)
Records: 128382  Duplicates: 0  Warnings: 0

mysql> EXPLAIN ANALYZE SELECT id FROM mvi WHERE (f1 = 1 AND f3 = 1 AND f4 = 1 AND f5 IN (963474745,963474787,963474788,963474789) AND 2 MEMBER OF(j))\G
*************************** 1. row ***************************
EXPLAIN: -> Filter: ((mvi.f4 = 1) and (mvi.f3 = 1) and (mvi.f1 = 1) and (mvi.f5 in (963474745,963474787,963474788,963474789)) and json'2' member of (cast(j as signed array)))  (cost=540 rows=1198) (actual time=0.0829..3.88 rows=255 loops=1)
    -> Index range scan on mvi using ix6 over (f1 = 1 AND f4 = 1 AND f3 = 1 AND f5 = 963474745) OR (f1 = 1 AND f4 = 1 AND f3 = 1 AND f5 = 963474787) OR (2 more)  (cost=540 rows=1198) (actual time=0.0702..3.72 rows=524 loops=1)
1 row in set (0.01 sec)

How to repeat:
1. Create table and load records (e.g. from mvi.sql)

2. Run: EXPLAIN ANALYZE SELECT id FROM mvi WHERE (f1 = 1 AND f3 = 1 AND f4 = 1 AND f5 IN (963474745,963474787,963474788,963474789) AND 2 MEMBER OF(j))

Suggested fix:
n/a
[10 Oct 2024 16:56] Maciej Dobrzanski
mvi.sql

Attachment: mvi.sql (application/octet-stream, text), 4.43 MiB.

[11 Oct 2024 8:03] MySQL Verification Team
Hello Maciej,

Thank you for the report and test case.

regards,
Umesh
[10 Sep 1:54] Yoseph Phillips
I believe we are seeing the same issue which is also related to issues 87613 and 71334. The following shows that IN performs well and only examines about 100 rows, however INNER JOIN is examining more than 16,000 rows and performing terribly.

DROP PROCEDURE IF EXISTS createTestData;

DELIMITER $$

CREATE PROCEDURE createTestData() 
BEGIN
  DECLARE counter INT DEFAULT 0;
  
  SET time_zone = '+00:00';
  
  DROP TABLE IF EXISTS test_data;
  
  CREATE TABLE test_data (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    column1 TINYINT UNSIGNED NOT NULL,
    column2 MEDIUMINT UNSIGNED NOT NULL,
    column3 TIMESTAMP NOT NULL,
    column4 VARCHAR(1000),
    PRIMARY KEY (id),
    INDEX column1_column2_index (column1, column2),
    INDEX column1_column3_index (column1, column3)
  );
  
  INSERT INTO test_data (column1, column2, column3, column4)
  VALUES (
    floor(rand() * 256),
    floor(rand() * 16777216), 
    '2025-09-10 00:00:00' - INTERVAL floor(rand() * 16777216) MINUTE, 
    'The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.'
  );
  
  WHILE counter < 20 DO
	INSERT INTO test_data (column1, column2, column3, column4)
    SELECT 
      floor(rand() * 256),
      floor(rand() * 16777216), 
      '2025-09-10 00:00:00' - INTERVAL floor(rand() * 16777216) MINUTE, 
      t.column4
    FROM test_data t;
    
    SET counter = counter + 1;
  END WHILE;
END$$

DELIMITER ;

CALL createTestData();

DROP PROCEDURE IF EXISTS createTestData;

DROP TEMPORARY TABLE IF EXISTS test_data2;
  
CREATE TEMPORARY TABLE test_data2 (id TINYINT UNSIGNED NOT NULL, PRIMARY KEY (id));

INSERT INTO test_data2 (id) VALUES (50), (100), (150), (200);

EXPLAIN ANALYZE SELECT count(*) FROM test_data t WHERE t.column1 IN (50, 100, 150, 200) AND t.column2 > (16777216 - 100000);

EXPLAIN ANALYZE SELECT count(*) FROM test_data t WHERE t.column1 IN (50, 100, 150, 200) AND t.column3 > ('2025-09-10 00:00:00' - INTERVAL 100000 MINUTE);

EXPLAIN ANALYZE SELECT count(*) FROM test_data t INNER JOIN test_data2 t2 ON t2.id = t.column1 WHERE t.column2 > (16777216 - 100000);

EXPLAIN ANALYZE SELECT count(*) FROM test_data t INNER JOIN test_data2 t2 ON t2.id = t.column1 WHERE t.column3 > ('2025-09-10 00:00:00' - INTERVAL 100000 MINUTE);

DROP TEMPORARY TABLE IF EXISTS test_data2;