Bug #107220 Json query results are inconsistent with or without generated column indexes
Submitted: 5 May 2022 9:18 Modified: 5 May 2022 11:12
Reporter: Chen WeiXin Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:MySQL 8.0, 8.0.29 OS:Any
Assigned to: CPU Architecture:Any
Tags: generated, json

[5 May 2022 9:18] Chen WeiXin
Description:
I don't think the query results should differ based on whether there is an index or not.

How to repeat:
1. The result can be queried in this SQL:
drop table jemp;
create table jemp(c json, g int generated always as (c->"$.id")) ;
create index idx1 on jemp(g);
insert into jemp(c) values('{"id":"1", "name":"Fred"}');
select c from jemp where c->'$.id' = 1;

2. The result can not be queried in this SQL:
drop table jemp;
create table jemp(c json, g int generated always as (c->"$.id")) ;
insert into jemp(c) values('{"id":"1", "name":"Fred"}');
select c from jemp where c->'$.id' = 1;

Suggested fix:
The query results should not differ based on whether there is an index or not.
[5 May 2022 9:55] MySQL Verification Team
Hello Chen WeiXin,

Thank you for the report and test case.

regards,
Umesh
[5 May 2022 11:12] Chen WeiXin
1. with index
mysql> create table jemp(c json, g int generated always as (c->"$.id")) ;
Query OK, 0 rows affected (0.13 sec)

mysql> create index idx1 on jemp(g);
Query OK, 0 rows affected (0.05 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> insert into jemp(c) values('{"id":"1", "name":"Fred"}');
Query OK, 1 row affected (0.01 sec)

mysql> select c from jemp where c->'$.id' = 1;
+-----------------------------+
| c                           |
+-----------------------------+
| {"id": "1", "name": "Fred"} |
+-----------------------------+
1 row in set (0.01 sec)

2. without index
mysql> create table jemp(c json, g int generated always as (c->"$.id")) ;
Query OK, 0 rows affected (0.04 sec)

mysql> insert into jemp(c) values('{"id":"1", "name":"Fred"}');
Query OK, 1 row affected (0.03 sec)

mysql> select c from jemp where c->'$.id' = 1;
Empty set (0.00 sec)
[11 May 2022 14:09] huahua xu
The bug is produced when substituting the expressions in the WHERE condition that match generated columns (GC) expressions with GC fields, and GC fields are a part of a key

I think the bug is very difficult to fix in the query optimizer. But, it will be easy that type conversion is not allowed in the json function, which maybe avoids this problem.