Bug #100885 generated column index against json column is not used with different encoding
Submitted: 18 Sep 2020 3:43 Modified: 18 Sep 2020 7:28
Reporter: Nikolai Ikhalainen Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: JSON Severity:S5 (Performance)
Version:5.7.31 OS:Any
Assigned to: CPU Architecture:Any
Tags: character set, generated columns, json, Optimizer

[18 Sep 2020 3:43] Nikolai Ikhalainen
Description:
The index on generated column containing json extract expression
(json_unquote(json_extract(`j`,'$."a"'))) "remembers" character set for '$."a"' string. After changing client encoding with "set names" the index is not used.

Starting from MySQL 8.0.17, the optimizer is smart enough to handle both cases and use indexes.

mysql5731> explain select id,JSON_TYPE(j->"$.a") from json_test where `j`->'$."a"' = '1';
+----+-------------+-----------+------------+------+---------------+------+---------+-------+------+----------+-------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra |
+----+-------------+-----------+------------+------+---------------+------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | json_test | NULL       | ref  | a             | a    | 3075    | const |    2 |   100.00 | NULL  |
+----+-------------+-----------+------------+------+---------------+------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)

mysql5731> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql5731> explain select id,JSON_TYPE(j->"$.a") from json_test where `j`->'$."a"' = '1';
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | json_test | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    2 |   100.00 | Using where |
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql5731> explain select id,JSON_TYPE(j->"$.a") from json_test where json_extract(`j`,_utf8mb4'$."a"') = '1';
+----+-------------+-----------+------------+------+---------------+------+---------+-------+------+----------+-------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra |
+----+-------------+-----------+------------+------+---------------+------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | json_test | NULL       | ref  | a             | a    | 3075    | const |    2 |   100.00 | NULL  |
+----+-------------+-----------+------------+------+---------------+------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)

How to repeat:
docker run --name m5731 -d -e MYSQL_ROOT_PASSWORD=secret mysql/mysql-server:5.7.31
docker exec -it m5731 mysql --prompt='mysql5731> ' -psecret

set names utf8mb4;
create database test default character set utf8mb4;

use test

DROP TABLE IF EXISTS `json_test`;
CREATE TABLE `json_test` (
	  `id` int unsigned NOT NULL AUTO_INCREMENT,
	  `j` json DEFAULT NULL,
	  `c` varchar(768) GENERATED ALWAYS AS (json_unquote(json_extract(`j`,'$."a"'))) VIRTUAL,
	  PRIMARY KEY (`id`),
	  KEY `a` (`c`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
INSERT INTO `json_test` (`id`, `j`) VALUES (3,'{\"a\": \"1\"}'),(4,'{\"a\": 1}');

explain select id,JSON_TYPE(j->"$.a") from json_test where `j`->'$."a"' = '1';
set names utf8;
explain select id,JSON_TYPE(j->"$.a") from json_test where `j`->'$."a"' = '1';
explain select id,JSON_TYPE(j->"$.a") from json_test where json_extract(`j`,_utf8mb4'$."a"') = '1';

Suggested fix:
Make the behavior consistent between 5.7 and 8.0 (e.g. use the index regardless to encoding) or document requirement to keep character set at
https://dev.mysql.com/doc/refman/5.7/en/create-table-secondary-indexes.html
[18 Sep 2020 7:28] MySQL Verification Team
Hello Nikolai,

Thank you for the report and feedback.

Thanks,
Umesh
[9 Nov 2020 11:38] Eduardo Ubide
Expanding on Nickolay's case, I think it's something that affects the index's integrity due to the session's charset at insert time. 

Regardless of the definition of the table. If we add data in a table from different "set names" the index on a generated column containing JSON extract expression is unusable.

I checked it on 5.7.29 and 5.7.31.

# Example: 

set names utf8mb4;

mysql5731> INSERT INTO `json_test` (`id`, `j`) VALUES (3,'{\"a\": \"1\"}'),(4,'{\"a\": 1}');
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql5731> explain select id,JSON_TYPE(j->"$.a") from json_test where `j`->'$."a"' = '1';
+----+-------------+-----------+------------+------+---------------+------+---------+-------+------+----------+-------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra |
+----+-------------+-----------+------------+------+---------------+------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | json_test | NULL       | ref  | a             | a    | 3075    | const |    2 |   100.00 | NULL  |
+----+-------------+-----------+------------+------+---------------+------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)

mysql5731> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql5731> explain select id,JSON_TYPE(j->"$.a") from json_test where `j`->'$."a"' = '1';
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | json_test | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    2 |   100.00 | Using where |
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql5731> explain select id,JSON_TYPE(j->"$.a") from json_test where json_extract(`j`,_utf8mb4'$."a"') = '1';
+----+-------------+-----------+------------+------+---------------+------+---------+-------+------+----------+-------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra |
+----+-------------+-----------+------------+------+---------------+------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | json_test | NULL       | ref  | a             | a    | 3075    | const |    2 |   100.00 | NULL  |
+----+-------------+-----------+------------+------+---------------+------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)

mysql5731> INSERT INTO `json_test` (`id`, `j`) VALUES (5,'{\"a\": \"1\"}'),(6,'{\"a\": 1}');
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql5731> explain select id,JSON_TYPE(j->"$.a") from json_test where json_extract(`j`,_utf8mb4'$."a"') = '1';
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | json_test | NULL       | ALL  | a             | NULL | NULL    | NULL |    4 |   100.00 | Using where |
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.01 sec)

mysql5731> set names utf8mb4 ;
Query OK, 0 rows affected (0.00 sec)

mysql5731> explain select id,JSON_TYPE(j->"$.a") from json_test where `j`->'$."a"' = '1';
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | json_test | NULL       | ALL  | a             | NULL | NULL    | NULL |    4 |   100.00 | Using where |
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql5731> explain select id,JSON_TYPE(j->"$.a") from json_test where json_extract(`j`,_utf8mb4'$."a"') = '1';
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | json_test | NULL       | ALL  | a             | NULL | NULL    | NULL |    4 |   100.00 | Using where |
+----+-------------+-----------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

# How to repeat:

docker run --name m5731 -d -e MYSQL_ROOT_PASSWORD=secret mysql/mysql-server:5.7.31
docker exec -it m5731 mysql --prompt='mysql5731> ' -psecret

set names utf8mb4;
create database test default character set utf8mb4;

use test

DROP TABLE IF EXISTS `json_test`;
CREATE TABLE `json_test` (
	  `id` int unsigned NOT NULL AUTO_INCREMENT,
	  `j` json DEFAULT NULL,
	  `c` varchar(768) GENERATED ALWAYS AS (json_unquote(json_extract(`j`,'$."a"'))) VIRTUAL,
	  PRIMARY KEY (`id`),
	  KEY `a` (`c`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

-- utf8mb4
INSERT INTO `json_test` (`id`, `j`) VALUES (3,'{\"a\": \"1\"}'),(4,'{\"a\": 1}');

explain select id,JSON_TYPE(j->"$.a") from json_test where `j`->'$."a"' = '1';

set names utf8;
explain select id,JSON_TYPE(j->"$.a") from json_test where `j`->'$."a"' = '1';
explain select id,JSON_TYPE(j->"$.a") from json_test where json_extract(`j`,_utf8mb4'$."a"') = '1';

-- utf8
INSERT INTO `json_test` (`id`, `j`) VALUES (5,'{\"a\": \"1\"}'),(6,'{\"a\": 1}');
explain select id,JSON_TYPE(j->"$.a") from json_test where json_extract(`j`,_utf8mb4'$."a"') = '1';

set names utf8mb4;
explain select id,JSON_TYPE(j->"$.a") from json_test where `j`->'$."a"' = '1';
explain select id,JSON_TYPE(j->"$.a") from json_test where json_extract(`j`,_utf8mb4'$."a"') = '1';