Bug #106580 Multi-Valued Indexes only takes into account a portion of a whole data.
Submitted: 26 Feb 2022 9:41 Modified: 2 Mar 2022 14:50
Reporter: Quy Le Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: Multi-Valued Indexes

[26 Feb 2022 9:41] Quy Le
Description:
I have a table with one of its column that is of JSON array format, the column has multi-valued index on top of it... but for some reasons, the result returned is very different as compared to non-index result.
My column:
+-------------------+
| character_id_list |
+-------------------+
| [1, 5, 7] |
| [15, 10, 14] |
| [15, 1, 11] |
| [11, 8, 4] |
| [4, 2, 8] |
| [15, 13, 6] |
| [14, 6, 9] |
| [15, 6, 4] |
| [4, 2, 11] |
| [9, 11, 12] |
+-------------------+
Query plan with multi-valued indexes:
+----+-------------+--------------+------------+-------+---------------+----------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+-------+---------------+----------+---------+------+------+----------+-------------+
| 1 | SIMPLE | game_players | NULL | range | idx_char | idx_char | 9 | NULL | 878 | 100.00 | Using where |
+----+-------------+--------------+------------+-------+---------------+----------+---------+------+------+----------+-------------+
=> it only takes 878 rows.

Query plan without multi-valued indexes:
+----+-------------+--------------+------------+------+---------------+------+---------+------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+------+---------------+------+---------+------+-------+----------+-------------+
| 1 | SIMPLE | game_players | NULL | ALL | NULL | NULL | NULL | NULL | 49680 | 100.00 | Using where |
+----+-------------+--------------+------------+------+---------------+------+---------+------+-------+----------+-------------+
=> this one takes up to 49680 rows

P/S: for small table with a few thousands records, it works fine, but if the table has big records(>5k), data returned is not accurate any more.

How to repeat:
CREATE TABLE `game_players` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `character_id_list` json NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_player_sn_game_code` (`player_sn`,`game_code`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

-- // Result (with multi-valued index):
select count(*) from game_players where JSON_CONTAINS(character_id_list,'1','$');
+----------+
| count(*) |
+----------+
| 878 |
+----------+

-- // Result (without multi-valued index):
+----------+
| count(*) |
+----------+
| 10106 |
+----------+
[26 Feb 2022 9:48] Quy Le
Edit: 
create index idx_char on game_players ((CAST(character_id_list AS unsigned array)))
[26 Feb 2022 9:50] Quy Le
MySql version: 8.0.28.
[28 Feb 2022 14:18] MySQL Verification Team
Hi Mr. Le,

Thank you for your bug report.

However, we need to be able to repeat the problem that you experience. Hence, what we need are all rows from that table as well.

Waiting on your feedback.
[2 Mar 2022 1:36] Quy Le
Here is all rows of the table.

Attachment: game_players.sql (application/octet-stream, text), 1.11 MiB.

[2 Mar 2022 14:50] MySQL Verification Team
Hi Mr. Le,

We were able to repeat the same behaviour.

Our results are as following:

count(*)
10170
count(*)
878

First result is without index and second one is with index.

Thank you for your contribution.

This bug is now verified.