Bug #110015 Multi-valued index on JSON array is not used when used in a join
Submitted: 10 Feb 2023 3:06 Modified: 13 Feb 2023 14:27
Reporter: Jonathan Balinski Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:8.0.32 OS:Windows (11)
Assigned to: CPU Architecture:x86

[10 Feb 2023 3:06] Jonathan Balinski
Description:
Indexes based on JSON data are not utilized when used joined by a column in another table of the same data type.  Instead of using the index, a full table scan is used instead.

There doesn't appear to be any documented limitation with using these types of indexes as part of a join: https://dev.mysql.com/doc/refman/8.0/en/create-index.html#create-index-multi-valued.

Without the ability to use these types of indexes as part of a join, the utility of these indexes is very limited.

How to repeat:
==== DATA PREP =====
CREATE TABLE `event` (
  `eventid` int unsigned NOT NULL AUTO_INCREMENT,
  `content` json DEFAULT NULL,
  PRIMARY KEY (`eventid`),
  KEY `event_topic` ((cast(json_extract(`content`,_utf8mb4'$.TopicIds[*]') as unsigned array)))
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4;

CREATE TABLE `topic` (
  `topicid` int unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL,
  `important` bit(1) DEFAULT b'0',
  PRIMARY KEY (`topicid`),
  KEY `important_ix` (`important`)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8mb4;

DELIMITER $$
CREATE PROCEDURE prepare_data()
BEGIN
  DECLARE i INT DEFAULT 100;

  WHILE i < 1000 DO
    INSERT INTO `topic` (`name`,`important`) VALUES (concat('Topic', i), (i%10=0));
    SET i = i + 1;
  END WHILE;
  
  SET i = 0;
  WHILE i < 5000 DO
	INSERT INTO `event` (`content`) VALUES (concat('{"TopicIds": [', FLOOR(RAND()*1000), ',', FLOOR(RAND()*1000), ']}'));
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;

CALL prepare_data();

==== END DATA PREP =====

With the following data in place, you can run explain for the following query and see that the multi-valued index on the json data in the event table is not being used:
SELECT `event`.`eventid` FROM `event`, `topic` 
WHERE `topic`.`topicid` MEMBER OF (`event`.`content`->'$.TopicIds[*]') 
AND `topic`.`important` = 1

With a simple query to find events with specific values and no joins, the index is correctly used:

SELECT `event`.`eventid` FROM `event`
WHERE 400 MEMBER OF (`event`.`content`->'$.TopicIds[*]') 

Suggested fix:
Utilize these indexes when appropriate.
[10 Feb 2023 14:00] MySQL Verification Team
Hi Mr. Balinski,

Thank you for your bug report.

However, if you take a look at the explain output:

 explain SELECT `event`.`eventid` FROM `event`, `topic` WHERE `topic`.`topicid` MEMBER OF (`event`.`content`->'$.TopicIds[*]') AND `topic`.`important` = 1;
+----+-------------+-------+------------+------+---------------+--------------+---------+-------+------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key          | key_len | ref   | rows | filtered | Extra                                      |
+----+-------------+-------+------------+------+---------------+--------------+---------+-------+------+----------+--------------------------------------------+
|  1 | SIMPLE      | topic | NULL       | ref  | important_ix  | important_ix | 2       | const |   90 |   100.00 | Using index                                |
|  1 | SIMPLE      | event | NULL       | ALL  | NULL          | NULL         | NULL    | NULL  | 5000 |   100.00 | Using where; Using join buffer (hash join) |
+----+-------------+-------+------------+------+---------------+--------------+---------+-------+------+----------+--------------------------------------------+

you will notice that the plan that was adopted is far more efficient then if the JSON multi-valued index were used.

That is very easy to conclude.

We also think that you can not join two tables with MEMBER OF operator, but we shall have to enquire about that.

Not a bug.
[10 Feb 2023 15:01] Jonathan Balinski
How is that the most efficient query plan?  It is doing a full table scan on the event table.  The test case I submitted only inserts a small amount of data in the event table but our event table equivalent in real life has millions of rows.  Running a full table scan when an index lookup is available is not good.
[10 Feb 2023 15:09] MySQL Verification Team
Hi,

There are simply no alternatives.

First of all, JOIN is performed by the hashing algorithm, which is the only alternative left.

Second, MEMBER OF, as stipulated in your query, can not be used as an index.

We are considering documenting the previous statement in our Reference Manual, but please wait until next week regarding that, since we are waiting on some clarifications on the matter.
[10 Feb 2023 15:53] Jonathan Balinski
Just to clarify, MEMBER OF does utilize the index for simple cases like the one I provided:

SELECT `event`.`eventid` FROM `event`
WHERE 400 MEMBER OF (`event`.`content`->'$.TopicIds[*]')
[13 Feb 2023 13:22] MySQL Verification Team
Hi,

Yes, we are quite aware of that .......
[13 Feb 2023 14:27] Knut Anders Hatlen
Thanks for the report. This looks like a valid feature request. (I think the limitation currently applies to all functional indexes and indexes on generated columns, not only multi-valued indexes.)
[14 Feb 2023 9:23] Knut Anders Hatlen
I've filed a documentation bug to clarify the current limitations (bug#110059).
[14 Feb 2023 11:59] MySQL Verification Team
Thank you, Knut ......
[20 Nov 2023 11:11] C Cl
I am running in the same issue as the original post 

Uses the index ✅ : 

```
SELECT 
  content.id 
FROM 
  content 
WHERE 
  123 MEMBER OF (content.media_ids);`
```

Doesn't use the index ❌ :

```
SELECT 
  media.id, 
  (SELECT count(*)
     FROM content
     WHERE media.id MEMBER OF ( content.media_ids)
  ) as 'count'
FROM 
  media
WHERE
  media.id = 123;
```

I have also tried with join, lateral join without any luck. From what i understand of the above conversation this is not supported and I think this would be a great addition to the multi-valued index features.
[20 Nov 2023 11:24] MySQL Verification Team
Hi,

These are two different types of queries and those can not be compared.

Your second query can not use indices since it is a dependent nested query.

You can rewrite your query into semi-join which would use indices.

This is a forum for bug reports and we do not provide here advices on how to improve performance.