Bug #120664 INDEX hint blocks generated column substitution for multi-valued indexes with MEMBER OF
Submitted: 12 Jun 7:56
Reporter: Nan Ma Email Updates:
Status: Open Impact on me:
None 
Category:MySQL Server: JSON Severity:S3 (Non-critical)
Version:9.6.0 OS:Any
Assigned to: CPU Architecture:Any

[12 Jun 7:56] Nan Ma
Description:
When a query uses MEMBER OF on a JSON array column with a multi-valued functional index, MySQL normally rewrites the JSON expression to the hidden generated column behind the multi-valued index.

However, if an optimizer index hint such as /*+ INDEX(t PRIMARY) */ is used and the hinted index is not the multi-valued index, the generated-column substitution is skipped.

This causes the predicate to remain in the original json_extract(...) form:

'1' MEMBER OF (j->'$[*]')

instead of being rewritten to the multi-valued index generated column form:

json'"1"' MEMBER OF (CAST(JSON_EXTRACT(j, '$[*]') AS CHAR(10) ARRAY))

As a result:

MEMBER OF loses the multi-valued-index-usable expression form.
The filter falls back to per-row json_extract(...) evaluation.
The optimizer can no longer recognize the predicate as matching the multi-valued index.

The issue appears to be caused by substitute_gc_expression() checking field->table->keys_in_use_for_query. Index hints narrow keys_in_use_for_query to the hinted index, so the multi-valued index bit is absent. This makes the generated column substitution skip the multi-valued index field, even though the substitution is still useful for MEMBER OF evaluation semantics and not only for access path selection.

How to repeat:
mysql> select version();
+-----------+
| version() |
+-----------+
| 9.6.0     |
+-----------+
1 row in set (0.00 sec)

mysql> create table t2(id int primary key, j json, key idx ((cast(j->'$[*]' as char(10) array))));
Query OK, 0 rows affected (0.04 sec)

mysql> explain format=tree select * from t2 where '1' member of (j->'$[*]');
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                   |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: json'"1"' member of (cast(json_extract(j,_utf8mb4'$[*]') as char(10) array))  (cost=0.35 rows=1)
    -> Index lookup on t2 using idx (cast(json_extract(j,_utf8mb4'$[*]') as char(10) array) = json'"1"')  (cost=0.35 rows=1)
 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> explain format=tree select /*+ NO_INDEX(t2) */  * from t2 where '1' member of (j->'$[*]');
+--------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                        |
+--------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: <cache>('1') member of (json_extract(t2.j,'$[*]'))  (cost=0.35 rows=1)
    -> Table scan on t2  (cost=0.35 rows=1)
 |
+--------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> explain format=tree select /*+ INDEX(`t2` `PRIMARY`) */ * from t2 where '1' member of (j->'$[*]');
+------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                      |
+------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: <cache>('1') member of (json_extract(t2.j,'$[*]'))  (cost=0.6 rows=1)
    -> Table scan on t2  (cost=0.6 rows=1)
 |
+------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> explain format=tree select /*+ INDEX(`t2` `idx`) */ * from t2 where '1' member of (j->'$[*]');
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                   |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: json'"1"' member of (cast(json_extract(j,_utf8mb4'$[*]') as char(10) array))  (cost=0.35 rows=1)
    -> Index lookup on t2 using idx (cast(json_extract(j,_utf8mb4'$[*]') as char(10) array) = json'"1"')  (cost=0.35 rows=1)
 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Expected result:

Even if INDEX(t2 PRIMARY) prevents the optimizer from choosing the multi-valued index as the access path, the MEMBER OF predicate should still be rewritten to the generated-column typed-array form when the predicate matches a multi-valued index expression.

The index hint should affect access path selection, but should not force the predicate back to the expensive per-row json_extract(...) evaluation form.

json_contains and json_overlaps have the same behavior.

Suggested fix:
diff --git a/sql/item_func.cc b/sql/item_func.cc
index bf9823a24ed..824e4e743e9 100644
--- a/sql/item_func.cc
+++ b/sql/item_func.cc
@@ -1195,7 +1195,7 @@ static bool substitute_gc_expression(Item **expr, Item **value,
     // Check whether the field has usable keys.
     Key_map tkm = field->part_of_key;
     tkm.merge(field->part_of_prefixkey);  // Include prefix keys.
-    tkm.intersect(field->table->keys_in_use_for_query);
+    if (!field->is_array()) tkm.intersect(field->table->keys_in_use_for_query);
     /*
       Don't substitute if:
       1) Key is disabled
@@ -1287,7 +1287,7 @@ static void gc_subst_overlaps_contains(Item **func, Item **vals,
     bool can_use_index = true;
     // Check whether field has usable keys
     Key_map tkm = fld.part_of_key;
-    tkm.intersect(fld.table->keys_in_use_for_query);
+    if (!fld.is_array()) tkm.intersect(fld.table->keys_in_use_for_query);
 
     if (tkm.is_clear_all() || !fld.is_array()) continue;
     Functional_index_error_handler func_idx_err_hndl(&fld, thd);

The patch will also solve https://bugs.mysql.com/bug.php?id=112147