Bug #78464 Query using JSON_EXTRACT() returns wrong result after adding virtual index
Submitted: 17 Sep 2015 7:30 Modified: 30 Oct 2015 12:19
Reporter: Olav Sandstå Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: JSON Severity:S3 (Non-critical)
Version:5.7.10 OS:Any
Assigned to: CPU Architecture:Any

[17 Sep 2015 7:30] Olav Sandstå
Description:
I have the following table:

CREATE TABLE employees (
  data JSON,
  name VARCHAR(30) AS (JSON_EXTRACT(data, "$.name")) VIRTUAL
);

that stores the following data:

INSERT INTO employees (data) VALUES('{"id": 1, "name": "Jane"}');
INSERT INTO employees (data) VALUES('{"id": 2, "name": "Joe"}');

When running a query to find information about "Jane":

SELECT * FROM employees WHERE JSON_EXTRACT(data, "$.name") = "Jane";
data    name
{"id": 1, "name": "Jane"}       "Jane"

This result is correct.

But if I add an index on the table to make this query run faster:

ALTER TABLE employees ADD INDEX name_idx(name);

then the result is no longer correct (but I guess it runs faster):

SELECT * FROM employees WHERE JSON_EXTRACT(data, "$.name") = "Jane";
data    name

The result is empty. It should have been the same as before adding the index.

How to repeat:
CREATE TABLE employees (
  data JSON,
  name VARCHAR(30) AS (JSON_EXTRACT(data, "$.name")) VIRTUAL
);

INSERT INTO employees (data) VALUES('{"id": 1, "name": "Jane"}');
INSERT INTO employees (data) VALUES('{"id": 2, "name": "Joe"}');

SELECT * FROM employees WHERE JSON_EXTRACT(data, "$.name") = "Jane";

ALTER TABLE employees ADD INDEX name_idx(name);

SELECT * FROM employees WHERE JSON_EXTRACT(data, "$.name") = "Jane";

DROP TABLE employees;
[18 Sep 2015 7:55] Olav Sandstå
Posted by developer:
 
Adding a comment about our documentation (from https://dev.mysql.com/doc/refman/5.7/en/generated-column-index-optimizations.html):

"For example, instead of writing a column definition like this:

doc_name TEXT AS (JSON_EXTRACT(jdoc, '$.name')) STORED

Write it like this:

doc_name TEXT AS (JSON_UNQUOTE(JSON_EXTRACT(jdoc, '$.name'))) STORED

With the latter definition, the optimizer can detect a match for both of these comparisons:

... WHERE JSON_EXTRACT(jdoc, '$.name') = 'some_string' ...
... WHERE JSON_UNQUOTE(JSON_EXTRACT(jdoc, '$.name')) = 'some_string' ...

Without JSON_UNQUOTE() in the column definition, the optimizer detects a match only for the first of those comparisons."

The last sentence seems to me to indicate that even without JSON_UNQUOTE() in the column definition, the optimizer should be able to use this index for the first of the above WHERE conditions? If this is not the case, what does it mean?

Is this documentation in correspondence with our implementation?
[18 Sep 2015 8:14] Olav Sandstå
Posted by developer:
 
Another small (and less serious) comment to the documentation. As the last bullet on https://dev.mysql.com/doc/refman/5.7/en/generated-column-index-optimizations.html:

  "If the optimizer picks the wrong index, an index hint can be used to disable it and force the optimizer to make a different choice."

This makes it almost sound like this bug is "not a bug" but documented behavior. In this bug, the optimizer selects a wrong index for the query, and it almost sounds like it is the user's responsibility to handle this by using an index hint :-)
[9 Oct 2015 8:30] Knut Anders Hatlen
To Olav's first question: "Is this documentation in correspondence with our implementation?" Yes, the documentation describes the implementation correctly. Unfortunately, the implementation is wrong here, since the index won't give the correct answer if the generated column doesn't use JSON_UNQUOTE. The implementation should stop using an index in this case, and the documentation should say: "Without JSON_UNQUOTE() in the column definition, the optimizer will not use the index on the generated column for any of those comparisons."

I also agree with Olav that the wording in the last bullet could be improved. Maybe this instead: "If the optimizer does not pick the desired index, an index hint can be used to force the optimizer to make a different choice."
[30 Oct 2015 12:18] Jon Stephens
Documented fix as follows in the 5.7.10 changelog:

    A query using JSON_EXTRACT() returned the wrong result after a
    virtual index was added to the table.

Also updated the Optimization docs in the manner suggested.

Closed.