Bug #110015 | Multi-valued index on JSON array is not used when used in a join | ||
---|---|---|---|
Submitted: | 10 Feb 2023 3:06 | Modified: | 13 Feb 2023 14:27 |
Reporter: | Jonathan Balinski | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S5 (Performance) |
Version: | 8.0.32 | OS: | Windows (11) |
Assigned to: | CPU Architecture: | x86 |
[10 Feb 2023 3:06]
Jonathan Balinski
[10 Feb 2023 14:00]
MySQL Verification Team
Hi Mr. Balinski, Thank you for your bug report. However, if you take a look at the explain output: explain SELECT `event`.`eventid` FROM `event`, `topic` WHERE `topic`.`topicid` MEMBER OF (`event`.`content`->'$.TopicIds[*]') AND `topic`.`important` = 1; +----+-------------+-------+------------+------+---------------+--------------+---------+-------+------+----------+--------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+--------------+---------+-------+------+----------+--------------------------------------------+ | 1 | SIMPLE | topic | NULL | ref | important_ix | important_ix | 2 | const | 90 | 100.00 | Using index | | 1 | SIMPLE | event | NULL | ALL | NULL | NULL | NULL | NULL | 5000 | 100.00 | Using where; Using join buffer (hash join) | +----+-------------+-------+------------+------+---------------+--------------+---------+-------+------+----------+--------------------------------------------+ you will notice that the plan that was adopted is far more efficient then if the JSON multi-valued index were used. That is very easy to conclude. We also think that you can not join two tables with MEMBER OF operator, but we shall have to enquire about that. Not a bug.
[10 Feb 2023 15:01]
Jonathan Balinski
How is that the most efficient query plan? It is doing a full table scan on the event table. The test case I submitted only inserts a small amount of data in the event table but our event table equivalent in real life has millions of rows. Running a full table scan when an index lookup is available is not good.
[10 Feb 2023 15:09]
MySQL Verification Team
Hi, There are simply no alternatives. First of all, JOIN is performed by the hashing algorithm, which is the only alternative left. Second, MEMBER OF, as stipulated in your query, can not be used as an index. We are considering documenting the previous statement in our Reference Manual, but please wait until next week regarding that, since we are waiting on some clarifications on the matter.
[10 Feb 2023 15:53]
Jonathan Balinski
Just to clarify, MEMBER OF does utilize the index for simple cases like the one I provided: SELECT `event`.`eventid` FROM `event` WHERE 400 MEMBER OF (`event`.`content`->'$.TopicIds[*]')
[13 Feb 2023 13:22]
MySQL Verification Team
Hi, Yes, we are quite aware of that .......
[13 Feb 2023 14:27]
Knut Anders Hatlen
Thanks for the report. This looks like a valid feature request. (I think the limitation currently applies to all functional indexes and indexes on generated columns, not only multi-valued indexes.)
[14 Feb 2023 9:23]
Knut Anders Hatlen
I've filed a documentation bug to clarify the current limitations (bug#110059).
[14 Feb 2023 11:59]
MySQL Verification Team
Thank you, Knut ......
[20 Nov 2023 11:11]
C Cl
I am running in the same issue as the original post Uses the index ✅ : ``` SELECT content.id FROM content WHERE 123 MEMBER OF (content.media_ids);` ``` Doesn't use the index ❌ : ``` SELECT media.id, (SELECT count(*) FROM content WHERE media.id MEMBER OF ( content.media_ids) ) as 'count' FROM media WHERE media.id = 123; ``` I have also tried with join, lateral join without any luck. From what i understand of the above conversation this is not supported and I think this would be a great addition to the multi-valued index features.
[20 Nov 2023 11:24]
MySQL Verification Team
Hi, These are two different types of queries and those can not be compared. Your second query can not use indices since it is a dependent nested query. You can rewrite your query into semi-join which would use indices. This is a forum for bug reports and we do not provide here advices on how to improve performance.