| Bug #104897 | Multi-valued index performance is too slow (Access too many rows than required) | ||
|---|---|---|---|
| Submitted: | 10 Sep 2021 6:29 | Modified: | 6 Aug 2024 13:18 |
| Reporter: | Seunguck Lee | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Optimizer | Severity: | S5 (Performance) |
| Version: | 8.0.26, 8.0.17 | OS: | Any |
| Assigned to: | CPU Architecture: | x86 | |
| Tags: | multi-valued index | ||
[10 Sep 2021 6:39]
Seunguck Lee
This is part of full test data. I have uploaded full test data via SFTP
Attachment: test_mvi-partial.sql.tar.gz (application/x-gzip, text), 264.30 KiB.
[10 Sep 2021 9:52]
MySQL Verification Team
Hello Seunguck Lee, Thank you for the report and test case. regards, Umesh
[28 Sep 2022 9:37]
Gyeongnam Kim
I could reproduce this issue in 8.0.28, and checked there is issue with index ordering. I added the index with changed ordering and just it makes count different.
I loaded in the SQL file and double it some times.
mysql> insert into test_mvi(type, p_at,hids) SELECT type, p_at,hids FROM test_mvi;
Query OK, 8380 rows affected (0.32 sec)
Records: 8380 Duplicates: 0 Warnings: 0
mysql> insert into test_mvi(type, p_at,hids) SELECT type, p_at,hids FROM test_mvi;
Query OK, 16760 rows affected (0.42 sec)
Records: 16760 Duplicates: 0 Warnings: 0
mysql> insert into test_mvi(type, p_at,hids) SELECT type, p_at,hids FROM test_mvi;
Query OK, 33520 rows affected (0.96 sec)
Records: 33520 Duplicates: 0 Warnings: 0
mysql> insert into test_mvi(type, p_at,hids) SELECT type, p_at,hids FROM test_mvi;
Query OK, 67040 rows affected (1.95 sec)
Records: 67040 Duplicates: 0 Warnings: 0
mysql> insert into test_mvi(type, p_at,hids) SELECT type, p_at,hids FROM test_mvi;
Query OK, 134080 rows affected (5.67 sec)
Records: 134080 Duplicates: 0 Warnings: 0
I added index with different ordering with exist index..
mysql> ALTER TABLE test_mvi DROP KEY ix_type_hid_pat_2, ADD KEY ix_type_hid_pat_2(`type`, `p_at`, (cast(json_extract(`hids`,_utf8mb4'$') as unsigned array)));
Query OK, 268160 rows affected (10.94 sec)
Records: 268160 Duplicates: 0 Warnings: 0
mysql> SHOW CREATE TABLE test_mvi\G
*************************** 1. row ***************************
Table: test_mvi
Create Table: CREATE TABLE `test_mvi` (
`id` bigint NOT NULL AUTO_INCREMENT,
`type` enum('T1','T2','T3') NOT NULL,
`p_at` datetime(6) NOT NULL,
`hids` json DEFAULT (json_array()),
PRIMARY KEY (`id`),
KEY `ix_type_hid_pat` (`type`,(cast(json_extract(`hids`,_utf8mb4'$') as unsigned array)),`p_at`),
KEY `ix_type_hid_pat_2` (`type`,`p_at`,(cast(json_extract(`hids`,_utf8mb4'$') as unsigned array)))
) ENGINE=InnoDB AUTO_INCREMENT=450741 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.08 sec)
And I ran same query but using other indexes, and checked there are differences. Although it's same query without hint, counts are different.
-- with the new index order, it returns more counts.
mysql> SELECT COUNT(*) FROM test_mvi USE INDEX (ix_type_hid_pat_2) WHERE type IN ('T1') AND JSON_OVERLAPS(hids->'$', CAST('[613349414265683967, 617853013892792319, 613349377578106879, 617852977203904511]' AS JSON)) ;
+----------+
| COUNT(*) |
+----------+
| 896 |
+----------+
1 row in set (4.49 sec)
-- with other index - it returns row normally.
mysql> SELECT COUNT(*) FROM test_mvi USE INDEX (ix_type_hid_pat) WHERE type IN ('T1') AND JSON_OVERLAPS(hids->'$', CAST('[613349414265683967, 617853013892792319, 613349377578106879, 617852977203904511]' AS JSON)) ;
+----------+
| COUNT(*) |
+----------+
| 224 |
+----------+
1 row in set (0.06 sec)
[28 Sep 2022 12:39]
MySQL Verification Team
Hi Mr. Kim, Thank you for your contribution. It will be very useful when it comes to improving the performance of queries that use many tuples in an index. However, there is no RDBMS that will automatically create the best possible index for its future usage. That is why this job is left to DBA's, schema normalisers, query tuners etc. Simply a RDBMS can not predict what will the index be used for. It is , however, possible to make the analysis like yours. Thank you for your contribution.
[6 Aug 2024 13:18]
Jon Stephens
Documented fix as follows in the MySQL 8.0.40, 8.4.3, and 9.1.0 changelogs:
A query using a greater-than (>) or less-than (<)
comparison with a multi-valued index executed much more slowly
than the same query using an equality (=) comparison with the
same index.
Closed.
[6 Aug 2024 13:20]
MySQL Verification Team
Thank you Jon.

Description: The query which has "greater or less than (> or <)" condition on multi-valued index is too slow. If there's only EQUAL condition, much faster on multi-valued index. Following result shows the performance difference. If there's many rows in the table (regardless of output rows), performance difference is much bigger than following case. (For example, there's 30M rows on table, slower query takes about 5 minutes) But according to the output of "EXPLAIN ANALYZE", there's almost no difference between slower and faster case. Output shows only difference on "time" and message about "p_at>=?" condition. -- // Slower case SELECT COUNT(*) FROM test_mvi WHERE type IN ('T1', 'T2') AND JSON_OVERLAPS(hids->'$', CAST('[613349414265683967, 617853013892792319, 613349377578106879, 617852977203904511]' AS JSON)) AND p_at>='2021-08-10 13:05:46.567187'; +----------+ | COUNT(*) | +----------+ | 81 | +----------+ 1 row in set (2.01 sec) | EXPLAIN | -> Aggregate: count(0) (cost=54.26 rows=95) (actual time=1950.447..1950.447 rows=1 loops=1) -> Filter: ((test_mvi.`type` in ('T1','T2')) and json_overlaps(cast(json_extract(hids,_utf8mb4'$') as unsigned array),json'[613349414265683967, 617853013892792319, 613349377578106879, 617852977203904511]') and (test_mvi.p_at >= TIMESTAMP'2021-08-10 13:05:46.567187')) (cost=44.76 rows=95) (actual time=0.102..1950.424 rows=81 loops=1) -> Index range scan on test_mvi using ix_type_hid_pat (cost=44.76 rows=95) (actual time=0.087..1685.407 rows=99808 loops=1) 1 row in set (1.95 sec) -- // Faster case SELECT COUNT(*) FROM test_mvi WHERE type IN ('T1', 'T2') AND JSON_OVERLAPS(hids->'$', CAST('[613349414265683967, 617853013892792319, 613349377578106879, 617852977203904511]' AS JSON)); +----------+ | COUNT(*) | +----------+ | 81 | +----------+ 1 row in set (0.00 sec) | EXPLAIN | -> Aggregate: count(0) (cost=54.26 rows=95) (actual time=1.482..1.482 rows=1 loops=1) -> Filter: ((test_mvi.`type` in ('T1','T2')) and json_overlaps(cast(json_extract(hids,_utf8mb4'$') as unsigned array),json'[613349414265683967, 617853013892792319, 613349377578106879, 617852977203904511]')) (cost=44.76 rows=95) (actual time=0.099..1.460 rows=81 loops=1) -> Index range scan on test_mvi using ix_type_hid_pat (cost=44.76 rows=95) (actual time=0.085..0.966 rows=81 loops=1) How to repeat: -- // Create test table CREATE TABLE test_mvi ( id bigint NOT NULL AUTO_INCREMENT, type enum('T1','T2','T3') NOT NULL, p_at datetime(6) NOT NULL, hids json DEFAULT (json_array()), PRIMARY KEY (id), KEY ix_type_hid_pat (type,(cast(json_extract(hids,_utf8mb4'$') as unsigned array)),p_at) ); -- // Load 10K sample data (I will attach later) SELECT COUNT(*) FROM test_mvi; +----------+ | COUNT(*) | +----------+ | 100000 | +----------+ -- // Compare query performance SELECT COUNT(*) FROM test_mvi WHERE type IN ('T1', 'T2') AND JSON_OVERLAPS(hids->'$', CAST('[613349414265683967, 617853013892792319, 613349377578106879, 617852977203904511]' AS JSON)) AND p_at>='2021-08-10 13:05:46.567187'; +----------+ | COUNT(*) | +----------+ | 81 | +----------+ 1 row in set (2.01 sec) SELECT COUNT(*) FROM test_mvi WHERE type IN ('T1', 'T2') AND JSON_OVERLAPS(hids->'$', CAST('[613349414265683967, 617853013892792319, 613349377578106879, 617852977203904511]' AS JSON)); +----------+ | COUNT(*) | +----------+ | 81 | +----------+ 1 row in set (0.00 sec) Suggested fix: Looks server global status shows some clue why the "p_at>=?" condition query is so slow. -------------------------------------------------------- with (p_at>=?) without (p_at>=?) -------------------------------------------------------- Innodb_rows_read 422781 92 Handler_commit 1 1 Handler_external_lock 2 2 Handler_read_key 8 8 Handler_read_next 422773 92 Handler_write 422773 92 -------------------------------------------------------- Total key entry of multi-valued index is 399997. Looks test_mvi rows are read for the number of multi-valued index key entry when "p_at>=?" condition is provided. SELECT SUM(JSON_LENGTH(hids)) FROM test_mvi; +------------------------+ | SUM(JSON_LENGTH(hids)) | +------------------------+ | 399997 | +------------------------+ I think MySQL server can process this query by reading only required rows from table.