Bug #76834 Different result when using index for jsn_extract
Submitted: 25 Apr 2015 11:07 Modified: 7 Oct 2015 9:41
Reporter: Daniël van Eeden (OCA) Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: JSON Severity:S3 (Non-critical)
Version:5.7.7-labs-json, 5.7.9 OS:Any
Assigned to: Assigned Account CPU Architecture:Any
Tags: json

[25 Apr 2015 11:07] Daniël van Eeden
Description:
The resultset is different when an index is used.

How to repeat:
mysql [localhost] {msandbox} (test) > select * from t1 where jsn_extract(j, '$.foo')='test';
Empty set (0.00 sec)

mysql [localhost] {msandbox} (test) > show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `j` json DEFAULT NULL,
  `foo` varchar(30) GENERATED ALWAYS AS (jsn_extract(j, '$.foo')) VIRTUAL,
  UNIQUE KEY `id` (`id`),
  KEY `foo` (`foo`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > select * from t1 ignore index(foo) where jsn_extract(j, '$.foo')='test';
+----+-----------------+--------+
| id | j               | foo    |
+----+-----------------+--------+
|  2 | {"foo": "test"} | "test" |
+----+-----------------+--------+
1 row in set (0.00 sec)
[25 Apr 2015 11:10] Daniël van Eeden
And the behaviour with quotes:

mysql [localhost] {msandbox} (test) > select * from t1 where jsn_extract(j, '$.foo')='"test"';
+----+-----------------+--------+
| id | j               | foo    |
+----+-----------------+--------+
|  2 | {"foo": "test"} | "test" |
+----+-----------------+--------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > select * from t1 ignore index(foo) where jsn_extract(j, '$.foo')='"test"';
Empty set (0.00 sec)
[25 Apr 2015 13:39] Umesh Shastry
Hello Daniël,

Thank you for the report.

Thanks,
Umesh
[25 Apr 2015 13:40] Umesh Shastry
//

mysql> show variables like "%version%";
+-------------------------+------------------------------+
| Variable_name           | Value                        |
+-------------------------+------------------------------+
| innodb_version          | 5.7.7                        |
| protocol_version        | 10                           |
| slave_type_conversions  |                              |
| version                 | 5.7.7-labs-json              |
| version_comment         | MySQL Community Server (GPL) |
| version_compile_machine | x86_64                       |
| version_compile_os      | linux-el6                    |
+-------------------------+------------------------------+
7 rows in set (0.01 sec)

mysql> create database test; use test;
Query OK, 1 row affected (0.00 sec)

Database changed
mysql> CREATE TABLE `t1` (
    ->   `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
    ->   `j` json DEFAULT NULL,
    ->   `foo` varchar(30) GENERATED ALWAYS AS (jsn_extract(j, '$.foo')) VIRTUAL,
    ->   UNIQUE KEY `id` (`id`),
    ->   KEY `foo` (`foo`)
    -> ) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.06 sec)

mysql> insert into t1(id,j) values(NULL,'{"foo": "test"}');
Query OK, 1 row affected (0.01 sec)

mysql> select * from t1;
+----+-----------------+--------+
| id | j               | foo    |
+----+-----------------+--------+
|  4 | {"foo": "test"} | "test" |
+----+-----------------+--------+
1 row in set (0.00 sec)

mysql> select * from t1 ignore index(foo) where jsn_extract(j, '$.foo')='test';
+----+-----------------+--------+
| id | j               | foo    |
+----+-----------------+--------+
|  4 | {"foo": "test"} | "test" |
+----+-----------------+--------+
1 row in set (0.00 sec)

mysql> select * from t1 where jsn_extract(j, '$.foo')='test';
Empty set (0.02 sec)

mysql> select * from t1 where jsn_extract(j, '$.foo')='"test"';
+----+-----------------+--------+
| id | j               | foo    |
+----+-----------------+--------+
|  4 | {"foo": "test"} | "test" |
+----+-----------------+--------+
1 row in set (0.00 sec)

mysql> select * from t1 ignore index(foo) where jsn_extract(j, '$.foo')='"test"';
Empty set (0.00 sec)
[21 May 2015 4:46] Erlend Dahl
[19 May 2015 0:48] Evgeny Potemkin

Not a bug, but a feature. In JSON strings have special treatment, as they
could represent a scalar string and a JSON string.
Server can't tell one from another, so user have to quote scalar strings. For
consistency reasons JSON valued jsn_* funcs also return quoted strings.
When no matching index is available, unquoting is handled by JSON comparator
internally and invisible to user. When index is used, JSON comparator can't
be used and unquoting have to be handled by user. All this leads to reported behavior.

In short - user has to create string indexes with jsn_unquote function, e.g
... gc varchar(...) as (jsn_unquote(jsn_extract(...))) ...
[6 Oct 2015 13:38] Daniël van Eeden
Also happens on 5.7.9

mysql [localhost] {msandbox} (test) > SHOW CREATE TABLE t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `j` json DEFAULT NULL,
  `foo` varchar(30) GENERATED ALWAYS AS (json_extract(j, '$.foo')) VIRTUAL,
  UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > SELECT * FROM t1;
+----+-----------------+--------+
| id | j               | foo    |
+----+-----------------+--------+
|  2 | {"foo": "test"} | "test" |
+----+-----------------+--------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > select * from t1 where json_extract(j, '$.foo')='test';
+----+-----------------+--------+
| id | j               | foo    |
+----+-----------------+--------+
|  2 | {"foo": "test"} | "test" |
+----+-----------------+--------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > select * from t1 where json_extract(j, '$.foo')='"test"';
Empty set (0.00 sec)

mysql [localhost] {msandbox} (test) > ALTER TABLE t1 ADD INDEX(foo);
Query OK, 0 rows affected (0.04 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql [localhost] {msandbox} (test) > select * from t1 where json_extract(j, '$.foo')='test';
Empty set (0.00 sec)

mysql [localhost] {msandbox} (test) > select * from t1 where json_extract(j, '$.foo')='"test"';
+----+-----------------+--------+
| id | j               | foo    |
+----+-----------------+--------+
|  2 | {"foo": "test"} | "test" |
+----+-----------------+--------+
1 row in set (0.00 sec)
[6 Oct 2015 14:37] Daniël van Eeden
mysql [localhost] {msandbox} (test) > SHOW CREATE TABLE t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `j` json DEFAULT NULL,
  `foo` varchar(30) GENERATED ALWAYS AS (json_extract(j, '$.foo')) VIRTUAL,
  UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=10 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > select *, json_extract(j, '$.foo')='test' as j2 from t1;
+----+----------------------------+-------------------+------+
| id | j                          | foo               | j2   |
+----+----------------------------+-------------------+------+
|  2 | {"foo": "test"}            | "test"            |    1 |
|  3 | {"foo": 5}                 | 5                 |    0 |
|  4 | {"foo": {"bar": "barfoo"}} | {"bar": "barfoo"} |    0 |
|  5 | {"foo": true}              | true              |    0 |
|  7 | {"bar": 7.2}               | NULL              | NULL |
|  8 | {"foo": 7.2}               | 7.2               |    0 |
|  9 | {"foo": null}              | null              |    0 |
+----+----------------------------+-------------------+------+
7 rows in set (0.00 sec)

mysql [localhost] {msandbox} (test) > select *, json_extract(j, '$.foo')='true' as j2 from t1;
+----+----------------------------+-------------------+------+
| id | j                          | foo               | j2   |
+----+----------------------------+-------------------+------+
|  2 | {"foo": "test"}            | "test"            |    0 |
|  3 | {"foo": 5}                 | 5                 |    0 |
|  4 | {"foo": {"bar": "barfoo"}} | {"bar": "barfoo"} |    0 |
|  5 | {"foo": true}              | true              |    0 |
|  7 | {"bar": 7.2}               | NULL              | NULL |
|  8 | {"foo": 7.2}               | 7.2               |    0 |
|  9 | {"foo": null}              | null              |    0 |
+----+----------------------------+-------------------+------+
7 rows in set (0.00 sec)

mysql [localhost] {msandbox} (test) > ALTER TABLE t1 ADD INDEX(foo);
Query OK, 0 rows affected (0.04 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql [localhost] {msandbox} (test) > select *, json_extract(j, '$.foo')='test' as j2 from t1;
+----+----------------------------+-------------------+------+
| id | j                          | foo               | j2   |
+----+----------------------------+-------------------+------+
|  2 | {"foo": "test"}            | "test"            |    1 |
|  3 | {"foo": 5}                 | 5                 |    0 |
|  4 | {"foo": {"bar": "barfoo"}} | {"bar": "barfoo"} |    0 |
|  5 | {"foo": true}              | true              |    0 |
|  7 | {"bar": 7.2}               | NULL              | NULL |
|  8 | {"foo": 7.2}               | 7.2               |    0 |
|  9 | {"foo": null}              | null              |    0 |
+----+----------------------------+-------------------+------+
7 rows in set (0.00 sec)

mysql [localhost] {msandbox} (test) > select *, json_extract(j, '$.foo')='true' as j2 from t1;
+----+----------------------------+-------------------+------+
| id | j                          | foo               | j2   |
+----+----------------------------+-------------------+------+
|  2 | {"foo": "test"}            | "test"            |    0 |
|  3 | {"foo": 5}                 | 5                 |    0 |
|  4 | {"foo": {"bar": "barfoo"}} | {"bar": "barfoo"} |    0 |
|  5 | {"foo": true}              | true              |    0 |
|  7 | {"bar": 7.2}               | NULL              | NULL |
|  8 | {"foo": 7.2}               | 7.2               |    0 |
|  9 | {"foo": null}              | null              |    0 |
+----+----------------------------+-------------------+------+
7 rows in set (0.00 sec)

mysql [localhost] {msandbox} (test) > SELECT * FROM t1 WHERE json_extract(j, '$.foo')='test';
Empty set (0.00 sec)

mysql [localhost] {msandbox} (test) > SELECT * FROM t1 WHERE json_extract(j, '$.foo')='true';
+----+---------------+------+
| id | j             | foo  |
+----+---------------+------+
|  5 | {"foo": true} | true |
+----+---------------+------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > ALTER TABLE t1 DROP INDEX foo;
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql [localhost] {msandbox} (test) > SELECT * FROM t1 WHERE json_extract(j, '$.foo')='true';
Empty set (0.00 sec)

mysql [localhost] {msandbox} (test) > SELECT * FROM t1 WHERE json_extract(j, '$.foo')='test';
+----+-----------------+--------+
| id | j               | foo    |
+----+-----------------+--------+
|  2 | {"foo": "test"} | "test" |
+----+-----------------+--------+
1 row in set (0.00 sec)
[7 Oct 2015 7:21] Daniël van Eeden
This is how PostgreSQL 9.4 behaves with a similar table/index:

postgres=# SELECT VERSION();
                                            version                                            
-----------------------------------------------------------------------------------------------
 PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit
(1 row)

postgres=# CREATE SCHEMA jtest;
CREATE SCHEMA
postgres=# CREATE TABLE jtest.t1 (
postgres(#   id SERIAL,
postgres(#   j json DEFAULT NULL
postgres(# );
CREATE TABLE
postgres=# INSERT INTO jtest.t1(j) VALUES('{"foo": "test"}');
INSERT 0 1
postgres=# SELECT *, json_extract_path(j, 'foo') FROM jtest.t1;
 id |        j        | json_extract_path 
----+-----------------+-------------------
  1 | {"foo": "test"} | "test"
(1 row)

postgres=# SELECT *, j->>'foo' FROM jtest.t1;
 id |        j        | ?column? 
----+-----------------+----------
  1 | {"foo": "test"} | test
(1 row)

postgres=# SELECT *, j->>'foo' FROM jtest.t1 WHERE j->>'foo' = 'test';
 id |        j        | ?column? 
----+-----------------+----------
  1 | {"foo": "test"} | test
(1 row)

postgres=# EXPLAIN SELECT *, j->>'foo' FROM jtest.t1 WHERE j->>'foo' = 'test';
                     QUERY PLAN                     
----------------------------------------------------
 Seq Scan on t1  (cost=0.00..28.46 rows=6 width=36)
   Filter: ((j ->> 'foo'::text) = 'test'::text)
(2 rows)

postgres=# CREATE INDEX idxjson1 ON jtest.t1 ((j->>'foo'));
CREATE INDEX
postgres=# SET enable_seqscan = OFF;
SET
postgres=# EXPLAIN SELECT *, j->>'foo' FROM jtest.t1 WHERE j->>'foo' = 'test';
                             QUERY PLAN                             
--------------------------------------------------------------------
 Index Scan using idxjson1 on t1  (cost=0.13..8.15 rows=1 width=36)
   Index Cond: ((j ->> 'foo'::text) = 'test'::text)
(2 rows)

postgres=# SELECT *, j->>'foo' FROM jtest.t1 WHERE j->>'foo' = 'test';
 id |        j        | ?column? 
----+-----------------+----------
  1 | {"foo": "test"} | test
(1 row)

postgres=# 
postgres=# CREATE TABLE jtest.t2 (
postgres(#   id SERIAL,
postgres(#   j jsonb DEFAULT NULL
postgres(# );
CREATE TABLE
postgres=# INSERT INTO jtest.t2(j) SELECT j::jsonb FROM jtest.t1;
INSERT 0 1
postgres=# SELECT *, jsonb_extract_path(j, 'foo') FROM jtest.t2;
 id |        j        | jsonb_extract_path 
----+-----------------+--------------------
  1 | {"foo": "test"} | "test"
(1 row)

postgres=# SELECT *, j->>'foo' FROM jtest.t2;
 id |        j        | ?column? 
----+-----------------+----------
  1 | {"foo": "test"} | test
(1 row)

postgres=# SELECT *, j->>'foo' FROM jtest.t2 WHERE j->>'foo' = 'test';
 id |        j        | ?column? 
----+-----------------+----------
  1 | {"foo": "test"} | test
(1 row)

postgres=# SELECT *, j->>'foo' FROM jtest.t2 WHERE j @> '{"foo": "test"}';
 id |        j        | ?column? 
----+-----------------+----------
  1 | {"foo": "test"} | test
(1 row)

postgres=# EXPLAIN SELECT *, j->>'foo' FROM jtest.t2 WHERE j @> '{"foo": "test"}';
                              QUERY PLAN                               
-----------------------------------------------------------------------
 Seq Scan on t2  (cost=10000000000.00..10000000025.38 rows=1 width=36)
   Filter: (j @> '{"foo": "test"}'::jsonb)
(2 rows)

postgres=# CREATE INDEX idxjsonops ON jtest.t2 USING gin(j jsonb_path_ops);
CREATE INDEX
postgres=# EXPLAIN SELECT *, j->>'foo' FROM jtest.t2 WHERE j @> '{"foo": "test"}';
                               QUERY PLAN                                
-------------------------------------------------------------------------
 Bitmap Heap Scan on t2  (cost=8.00..12.02 rows=1 width=36)
   Recheck Cond: (j @> '{"foo": "test"}'::jsonb)
   ->  Bitmap Index Scan on idxjsonops  (cost=0.00..8.00 rows=1 width=0)
         Index Cond: (j @> '{"foo": "test"}'::jsonb)
(4 rows)

postgres=# SELECT *, j->>'foo' FROM jtest.t2 WHERE j @> '{"foo": "test"}';
 id |        j        | ?column? 
----+-----------------+----------
  1 | {"foo": "test"} | test
(1 row)

postgres=# SET enable_seqscan = ON;
SET
postgres=# DROP SCHEMA jtest CASCADE;
NOTICE:  drop cascades to 2 other objects
DETAIL:  drop cascades to table jtest.t1
drop cascades to table jtest.t2
DROP SCHEMA
[7 Oct 2015 9:41] Knut Anders Hatlen
Looks like we were a bit too quick to close this bug. More or less the same problem was later reported in Bug #78464, and that bug has a patch in the pipeline, which should also fix the problem reported here.

The planned fix will disable the use of the index in the query, so that it returns the same results regardless of the presence of the index. If using an index is desired, one could use JSON_UNQUOTE in the generated column, as suggested in an earlier comment.