Description:
When a query uses MEMBER OF on a JSON array column with a multi-valued functional index, MySQL normally rewrites the JSON expression to the hidden generated column behind the multi-valued index.
However, if an optimizer index hint such as /*+ INDEX(t PRIMARY) */ is used and the hinted index is not the multi-valued index, the generated-column substitution is skipped.
This causes the predicate to remain in the original json_extract(...) form:
'1' MEMBER OF (j->'$[*]')
instead of being rewritten to the multi-valued index generated column form:
json'"1"' MEMBER OF (CAST(JSON_EXTRACT(j, '$[*]') AS CHAR(10) ARRAY))
As a result:
MEMBER OF loses the multi-valued-index-usable expression form.
The filter falls back to per-row json_extract(...) evaluation.
The optimizer can no longer recognize the predicate as matching the multi-valued index.
The issue appears to be caused by substitute_gc_expression() checking field->table->keys_in_use_for_query. Index hints narrow keys_in_use_for_query to the hinted index, so the multi-valued index bit is absent. This makes the generated column substitution skip the multi-valued index field, even though the substitution is still useful for MEMBER OF evaluation semantics and not only for access path selection.
How to repeat:
mysql> select version();
+-----------+
| version() |
+-----------+
| 9.6.0 |
+-----------+
1 row in set (0.00 sec)
mysql> create table t2(id int primary key, j json, key idx ((cast(j->'$[*]' as char(10) array))));
Query OK, 0 rows affected (0.04 sec)
mysql> explain format=tree select * from t2 where '1' member of (j->'$[*]');
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: json'"1"' member of (cast(json_extract(j,_utf8mb4'$[*]') as char(10) array)) (cost=0.35 rows=1)
-> Index lookup on t2 using idx (cast(json_extract(j,_utf8mb4'$[*]') as char(10) array) = json'"1"') (cost=0.35 rows=1)
|
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> explain format=tree select /*+ NO_INDEX(t2) */ * from t2 where '1' member of (j->'$[*]');
+--------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+--------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: <cache>('1') member of (json_extract(t2.j,'$[*]')) (cost=0.35 rows=1)
-> Table scan on t2 (cost=0.35 rows=1)
|
+--------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> explain format=tree select /*+ INDEX(`t2` `PRIMARY`) */ * from t2 where '1' member of (j->'$[*]');
+------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: <cache>('1') member of (json_extract(t2.j,'$[*]')) (cost=0.6 rows=1)
-> Table scan on t2 (cost=0.6 rows=1)
|
+------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> explain format=tree select /*+ INDEX(`t2` `idx`) */ * from t2 where '1' member of (j->'$[*]');
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: json'"1"' member of (cast(json_extract(j,_utf8mb4'$[*]') as char(10) array)) (cost=0.35 rows=1)
-> Index lookup on t2 using idx (cast(json_extract(j,_utf8mb4'$[*]') as char(10) array) = json'"1"') (cost=0.35 rows=1)
|
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
Expected result:
Even if INDEX(t2 PRIMARY) prevents the optimizer from choosing the multi-valued index as the access path, the MEMBER OF predicate should still be rewritten to the generated-column typed-array form when the predicate matches a multi-valued index expression.
The index hint should affect access path selection, but should not force the predicate back to the expensive per-row json_extract(...) evaluation form.
json_contains and json_overlaps have the same behavior.
Suggested fix:
diff --git a/sql/item_func.cc b/sql/item_func.cc
index bf9823a24ed..824e4e743e9 100644
--- a/sql/item_func.cc
+++ b/sql/item_func.cc
@@ -1195,7 +1195,7 @@ static bool substitute_gc_expression(Item **expr, Item **value,
// Check whether the field has usable keys.
Key_map tkm = field->part_of_key;
tkm.merge(field->part_of_prefixkey); // Include prefix keys.
- tkm.intersect(field->table->keys_in_use_for_query);
+ if (!field->is_array()) tkm.intersect(field->table->keys_in_use_for_query);
/*
Don't substitute if:
1) Key is disabled
@@ -1287,7 +1287,7 @@ static void gc_subst_overlaps_contains(Item **func, Item **vals,
bool can_use_index = true;
// Check whether field has usable keys
Key_map tkm = fld.part_of_key;
- tkm.intersect(fld.table->keys_in_use_for_query);
+ if (!fld.is_array()) tkm.intersect(fld.table->keys_in_use_for_query);
if (tkm.is_clear_all() || !fld.is_array()) continue;
Functional_index_error_handler func_idx_err_hndl(&fld, thd);
The patch will also solve https://bugs.mysql.com/bug.php?id=112147
Description: When a query uses MEMBER OF on a JSON array column with a multi-valued functional index, MySQL normally rewrites the JSON expression to the hidden generated column behind the multi-valued index. However, if an optimizer index hint such as /*+ INDEX(t PRIMARY) */ is used and the hinted index is not the multi-valued index, the generated-column substitution is skipped. This causes the predicate to remain in the original json_extract(...) form: '1' MEMBER OF (j->'$[*]') instead of being rewritten to the multi-valued index generated column form: json'"1"' MEMBER OF (CAST(JSON_EXTRACT(j, '$[*]') AS CHAR(10) ARRAY)) As a result: MEMBER OF loses the multi-valued-index-usable expression form. The filter falls back to per-row json_extract(...) evaluation. The optimizer can no longer recognize the predicate as matching the multi-valued index. The issue appears to be caused by substitute_gc_expression() checking field->table->keys_in_use_for_query. Index hints narrow keys_in_use_for_query to the hinted index, so the multi-valued index bit is absent. This makes the generated column substitution skip the multi-valued index field, even though the substitution is still useful for MEMBER OF evaluation semantics and not only for access path selection. How to repeat: mysql> select version(); +-----------+ | version() | +-----------+ | 9.6.0 | +-----------+ 1 row in set (0.00 sec) mysql> create table t2(id int primary key, j json, key idx ((cast(j->'$[*]' as char(10) array)))); Query OK, 0 rows affected (0.04 sec) mysql> explain format=tree select * from t2 where '1' member of (j->'$[*]'); +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Filter: json'"1"' member of (cast(json_extract(j,_utf8mb4'$[*]') as char(10) array)) (cost=0.35 rows=1) -> Index lookup on t2 using idx (cast(json_extract(j,_utf8mb4'$[*]') as char(10) array) = json'"1"') (cost=0.35 rows=1) | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) mysql> explain format=tree select /*+ NO_INDEX(t2) */ * from t2 where '1' member of (j->'$[*]'); +--------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +--------------------------------------------------------------------------------------------------------------------------------+ | -> Filter: <cache>('1') member of (json_extract(t2.j,'$[*]')) (cost=0.35 rows=1) -> Table scan on t2 (cost=0.35 rows=1) | +--------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) mysql> explain format=tree select /*+ INDEX(`t2` `PRIMARY`) */ * from t2 where '1' member of (j->'$[*]'); +------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +------------------------------------------------------------------------------------------------------------------------------+ | -> Filter: <cache>('1') member of (json_extract(t2.j,'$[*]')) (cost=0.6 rows=1) -> Table scan on t2 (cost=0.6 rows=1) | +------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) mysql> explain format=tree select /*+ INDEX(`t2` `idx`) */ * from t2 where '1' member of (j->'$[*]'); +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Filter: json'"1"' member of (cast(json_extract(j,_utf8mb4'$[*]') as char(10) array)) (cost=0.35 rows=1) -> Index lookup on t2 using idx (cast(json_extract(j,_utf8mb4'$[*]') as char(10) array) = json'"1"') (cost=0.35 rows=1) | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) Expected result: Even if INDEX(t2 PRIMARY) prevents the optimizer from choosing the multi-valued index as the access path, the MEMBER OF predicate should still be rewritten to the generated-column typed-array form when the predicate matches a multi-valued index expression. The index hint should affect access path selection, but should not force the predicate back to the expensive per-row json_extract(...) evaluation form. json_contains and json_overlaps have the same behavior. Suggested fix: diff --git a/sql/item_func.cc b/sql/item_func.cc index bf9823a24ed..824e4e743e9 100644 --- a/sql/item_func.cc +++ b/sql/item_func.cc @@ -1195,7 +1195,7 @@ static bool substitute_gc_expression(Item **expr, Item **value, // Check whether the field has usable keys. Key_map tkm = field->part_of_key; tkm.merge(field->part_of_prefixkey); // Include prefix keys. - tkm.intersect(field->table->keys_in_use_for_query); + if (!field->is_array()) tkm.intersect(field->table->keys_in_use_for_query); /* Don't substitute if: 1) Key is disabled @@ -1287,7 +1287,7 @@ static void gc_subst_overlaps_contains(Item **func, Item **vals, bool can_use_index = true; // Check whether field has usable keys Key_map tkm = fld.part_of_key; - tkm.intersect(fld.table->keys_in_use_for_query); + if (!fld.is_array()) tkm.intersect(fld.table->keys_in_use_for_query); if (tkm.is_clear_all() || !fld.is_array()) continue; Functional_index_error_handler func_idx_err_hndl(&fld, thd); The patch will also solve https://bugs.mysql.com/bug.php?id=112147