Bug #104388 Multi-valued index is not in possible keys when there is a subquery
Submitted: 23 Jul 2021 1:11 Modified: 9 Aug 2021 16:52
Reporter: Jinlin Yang Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:8.0.25 OS:Any
Assigned to: CPU Architecture:Any
Tags: multi-valued index, subquery

[23 Jul 2021 1:11] Jinlin Yang
Description:
If we table contains some column based index and also some multi-valued index for a json column, then the query optimizer behaves differently depending on if subquery is used.

Specifically when a subquery is used then the multi-valued index is NOT listed as a possible keys in the output of the explain command. However, if we change the query not to use any subquery, then the multi-valued index is among the possible keys.

This makes the multi-valued index unusable for our scenario where we always use a subquery to filter on the OrgID column first as shown in our example.

How to repeat:
mysql> STATUS;
--------------
mysql  Ver 8.0.25 for macos11.3 on x86_64 (Homebrew)

Connection id:		18
Current database:	myapp
Current user:		root@localhost
SSL:			Not in use
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server version:		8.0.25 Homebrew
Protocol version:	10
Connection:		Localhost via UNIX socket
Server characterset:	utf8mb4
Db     characterset:	utf8mb4
Client characterset:	utf8mb4
Conn.  characterset:	utf8mb4
UNIX socket:		/tmp/mysql.sock
Binary data as:		Hexadecimal
Uptime:			41 min 5 sec

Threads: 8  Questions: 958  Slow queries: 0  Opens: 1005  Flush tables: 3  Open tables: 912  Queries per second avg: 0.388
--------------

mysql> 
mysql> DROP TABLE IF EXISTS test_json_index;
Query OK, 0 rows affected (0.01 sec)

mysql> CREATE TABLE test_json_index
    -> (
    -> id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    -> OrgID INT NOT NULL,
    ->     BinID INT NOT NULL,
    -> DataBlob json NOT NULL
    -> );
Query OK, 0 rows affected (0.01 sec)

mysql> ALTER TABLE test_json_index
    -> ADD INDEX (OrgID),
    -> ADD INDEX (BinID),
    -> ADD INDEX indexA (( CAST(DataBlob->'$.a' AS UNSIGNED INTEGER ARRAY) )),
    -> ADD INDEX indexB (( CAST(DataBlob->'$.b' AS UNSIGNED ) ));
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> 
mysql> INSERT INTO test_json_index (OrgID, BinID, DataBlob)
    -> VALUES(
    -> 1, 1, "{\"a\":[1, 2, 3], \"b\": 1}"
    -> );
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO test_json_index (OrgID, BinID, DataBlob)
    -> VALUES(
    -> 2, 2, "{\"a\":[4, 5, 6], \"b\": 2}"
    -> );
Query OK, 1 row affected (0.00 sec)

mysql> 
mysql> SHOW INDEX FROM test_json_index;
+-----------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------------------------------------------------------------+
| Table           | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression                                                       |
+-----------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------------------------------------------------------------+
| test_json_index |          0 | PRIMARY  |            1 | id          | A         |           1 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL                                                             |
| test_json_index |          1 | OrgID    |            1 | OrgID       | A         |           1 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL                                                             |
| test_json_index |          1 | BinID    |            1 | BinID       | A         |           1 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL                                                             |
| test_json_index |          1 | indexA   |            1 | NULL        | A         |           1 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | cast(json_extract(`DataBlob`,_utf8mb4\'$.a\') as unsigned array) |
| test_json_index |          1 | indexB   |            1 | NULL        | A         |           1 |     NULL |   NULL | YES  | BTREE      |         |               | YES     | cast(json_extract(`DataBlob`,_utf8mb4\'$.b\') as unsigned)       |
+-----------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------------------------------------------------------------+
5 rows in set (0.00 sec)

mysql> 
mysql> -- Filter on OrgID and DataBlob.a with a subquery, why is indexA missing in possible keys?
mysql> EXPLAIN SELECT *
    -> FROM (SELECT * from test_json_index where OrgID = 1) as tmp
    -> WHERE 1 member of(tmp.DataBlob->'$.a');
+----+-------------+-----------------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
| id | select_type | table           | partitions | type | possible_keys | key   | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-----------------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | test_json_index | NULL       | ref  | OrgID         | OrgID | 4       | const |    1 |   100.00 | Using where |
+----+-------------+-----------------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> 
mysql> -- Filter on OrgID and DataBlob.a without a subquery
mysql> EXPLAIN SELECT *
    -> FROM test_json_index
    -> WHERE 1 member of(DataBlob->'$.a') AND OrgID = 1;
+----+-------------+-----------------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
| id | select_type | table           | partitions | type | possible_keys | key   | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-----------------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | test_json_index | NULL       | ref  | OrgID,indexA  | OrgID | 4       | const |    1 |   100.00 | Using where |
+----+-------------+-----------------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> 
mysql> -- Filter on OrgID and DataBlob.b with a subquery, why is indexB missing in possible keys?
mysql> EXPLAIN SELECT *
    -> FROM (SELECT * from test_json_index where OrgID = 1) as tmp
    -> WHERE 1 = CAST(DataBlob->'$.b' AS UNSIGNED);
+----+-------------+-----------------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
| id | select_type | table           | partitions | type | possible_keys | key   | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-----------------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | test_json_index | NULL       | ref  | OrgID         | OrgID | 4       | const |    1 |   100.00 | Using where |
+----+-------------+-----------------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> 
mysql> -- Filter on OrgID and DataBlob.b without a subquery
mysql> EXPLAIN SELECT *
    -> FROM test_json_index
    -> WHERE 1 = CAST(DataBlob->'$.b' AS UNSIGNED) AND OrgID = 1;
+----+-------------+-----------------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
| id | select_type | table           | partitions | type | possible_keys | key   | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-----------------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | test_json_index | NULL       | ref  | OrgID,indexB  | OrgID | 4       | const |    1 |    50.00 | Using where |
+----+-------------+-----------------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> 
mysql> -- Filter on OrgID and BinID with a subquery, index on BinID is in possible keys
mysql> EXPLAIN SELECT *
    -> FROM (SELECT * from test_json_index where OrgID = 1) as tmp
    -> WHERE 1 = tmp.BinID;
+----+-------------+-----------------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
| id | select_type | table           | partitions | type | possible_keys | key   | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-----------------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | test_json_index | NULL       | ref  | OrgID,BinID   | OrgID | 4       | const |    1 |    50.00 | Using where |
+----+-------------+-----------------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> 
mysql> -- Filter on OrgID and BinID without a subquery
mysql> EXPLAIN SELECT *
    -> FROM test_json_index
    -> WHERE 1 = BinID AND OrgID = 1;
+----+-------------+-----------------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
| id | select_type | table           | partitions | type | possible_keys | key   | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-----------------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | test_json_index | NULL       | ref  | OrgID,BinID   | OrgID | 4       | const |    1 |    50.00 | Using where |
+----+-------------+-----------------+------------+------+---------------+-------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
[23 Jul 2021 1:13] Jinlin Yang
Script to repro the issue

Attachment: repro-script.sql (application/octet-stream, text), 982 bytes.

[23 Jul 2021 1:13] Jinlin Yang
Output of the repro script

Attachment: repro-script-output.txt (text/plain), 9.10 KiB.

[23 Jul 2021 12:47] MySQL Verification Team
Hi Mr. Yang,

Thank you for your bug report.

However, this is not a bug.

The manner in which you wrote the query with a nested query in the FROM clause, has made that nested query a dependent one, since some of the filtering conditions are moved away from the nested query.

If you wrote that query correctly, it would have used multi-valued index.

Not a bug.
[23 Jul 2021 15:57] Jinlin Yang
So how can I write the following query so that indexA is shown as possible keys in the explain command?
/* query1 */
SELECT *
FROM (SELECT * from test_json_index where OrgID = 1) as tmp
WHERE 1 member of(tmp.DataBlob->'$.a');

Also can you explain why the following query has index BinID in the possible keys? It also uses a nested query, the only difference is the where condition on the outer query is on a column instead of on a json path. Shouldn't they behave the same?

/* query2 */
SELECT *
FROM (SELECT * from test_json_index where OrgID = 1) as tmp
WHERE 1 = tmp.BinID;

For query1, I also turned on the optimizer_tracer. Below you can see the nested query is merged with the outer query. So why isn't indexA picked up?

          {
            "transformations_to_nested_joins": {
              "transformations": [
                "JOIN_condition_to_WHERE",
                "parenthesis_removal"
              ],
              "expanded_query": "/* select#1 */ select `test_json_index`.`id` AS `id`,`test_json_index`.`OrgID` AS `OrgID`,`test_json_index`.`BinID` AS `BinID`,`test_json_index`.`DataBlob` AS `DataBlob` from `test_json_index` where (1 member of (json_extract(`test_json_index`.`DataBlob`,'$.a')) and (`test_json_index`.`OrgID` = 1))"
            }
          }

          ....

          {
            "substitute_generated_columns": {
              "resulting_condition": "(1 member of (json_extract(`test_json_index`.`DataBlob`,'$.a')) and multiple equal(1, `test_json_index`.`OrgID`))"
            }
          }
          ....

                  "potential_range_indexes": [
                    {
                      "index": "PRIMARY",
                      "usable": false,
                      "cause": "not_applicable"
                    },
                    {
                      "index": "OrgID",
                      "usable": true,
                      "key_parts": [
                        "OrgID",
                        "id"
                      ]
                    },
                    {
                      "index": "BinID",
                      "usable": false,
                      "cause": "not_applicable"
                    },
                    {
                      "index": "indexA",
                      "usable": false,
                      "cause": "not_applicable"
                    },
                    {
                      "index": "indexB",
                      "usable": false,
                      "cause": "not_applicable"
                    }
                  ],
[26 Jul 2021 13:09] MySQL Verification Team
Hi!

Your question  has already been answered in our previous comment.
[6 Aug 2021 8:46] Roy Lyseng
Hi Jinlin,

it seems you will get the result that you expect on release 8.0.26.
[6 Aug 2021 11:51] MySQL Verification Team
Thank you, Roy ....
[9 Aug 2021 16:52] Jinlin Yang
Hi Roy,

Thank you very much for looking into this. I can confirm that it worked as expected in 8.0.26. So was this a bug in 8.0.25?

Jinlin