| Bug #78464 | Query using JSON_EXTRACT() returns wrong result after adding virtual index | ||
|---|---|---|---|
| Submitted: | 17 Sep 2015 7:30 | Modified: | 30 Oct 2015 12:19 | 
| Reporter: | Olav Sandstå | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: JSON | Severity: | S3 (Non-critical) | 
| Version: | 5.7.10 | OS: | Any | 
| Assigned to: | CPU Architecture: | Any | |
   [18 Sep 2015 7:55]
   Olav Sandstå        
  Posted by developer: Adding a comment about our documentation (from https://dev.mysql.com/doc/refman/5.7/en/generated-column-index-optimizations.html): "For example, instead of writing a column definition like this: doc_name TEXT AS (JSON_EXTRACT(jdoc, '$.name')) STORED Write it like this: doc_name TEXT AS (JSON_UNQUOTE(JSON_EXTRACT(jdoc, '$.name'))) STORED With the latter definition, the optimizer can detect a match for both of these comparisons: ... WHERE JSON_EXTRACT(jdoc, '$.name') = 'some_string' ... ... WHERE JSON_UNQUOTE(JSON_EXTRACT(jdoc, '$.name')) = 'some_string' ... Without JSON_UNQUOTE() in the column definition, the optimizer detects a match only for the first of those comparisons." The last sentence seems to me to indicate that even without JSON_UNQUOTE() in the column definition, the optimizer should be able to use this index for the first of the above WHERE conditions? If this is not the case, what does it mean? Is this documentation in correspondence with our implementation?
   [18 Sep 2015 8:14]
   Olav Sandstå        
  Posted by developer: Another small (and less serious) comment to the documentation. As the last bullet on https://dev.mysql.com/doc/refman/5.7/en/generated-column-index-optimizations.html: "If the optimizer picks the wrong index, an index hint can be used to disable it and force the optimizer to make a different choice." This makes it almost sound like this bug is "not a bug" but documented behavior. In this bug, the optimizer selects a wrong index for the query, and it almost sounds like it is the user's responsibility to handle this by using an index hint :-)
   [9 Oct 2015 8:30]
   Knut Anders Hatlen        
  To Olav's first question: "Is this documentation in correspondence with our implementation?" Yes, the documentation describes the implementation correctly. Unfortunately, the implementation is wrong here, since the index won't give the correct answer if the generated column doesn't use JSON_UNQUOTE. The implementation should stop using an index in this case, and the documentation should say: "Without JSON_UNQUOTE() in the column definition, the optimizer will not use the index on the generated column for any of those comparisons." I also agree with Olav that the wording in the last bullet could be improved. Maybe this instead: "If the optimizer does not pick the desired index, an index hint can be used to force the optimizer to make a different choice."
   [30 Oct 2015 12:18]
   Jon Stephens        
  Documented fix as follows in the 5.7.10 changelog:
    A query using JSON_EXTRACT() returned the wrong result after a
    virtual index was added to the table.
Also updated the Optimization docs in the manner suggested.
Closed.
 

Description: I have the following table: CREATE TABLE employees ( data JSON, name VARCHAR(30) AS (JSON_EXTRACT(data, "$.name")) VIRTUAL ); that stores the following data: INSERT INTO employees (data) VALUES('{"id": 1, "name": "Jane"}'); INSERT INTO employees (data) VALUES('{"id": 2, "name": "Joe"}'); When running a query to find information about "Jane": SELECT * FROM employees WHERE JSON_EXTRACT(data, "$.name") = "Jane"; data name {"id": 1, "name": "Jane"} "Jane" This result is correct. But if I add an index on the table to make this query run faster: ALTER TABLE employees ADD INDEX name_idx(name); then the result is no longer correct (but I guess it runs faster): SELECT * FROM employees WHERE JSON_EXTRACT(data, "$.name") = "Jane"; data name The result is empty. It should have been the same as before adding the index. How to repeat: CREATE TABLE employees ( data JSON, name VARCHAR(30) AS (JSON_EXTRACT(data, "$.name")) VIRTUAL ); INSERT INTO employees (data) VALUES('{"id": 1, "name": "Jane"}'); INSERT INTO employees (data) VALUES('{"id": 2, "name": "Joe"}'); SELECT * FROM employees WHERE JSON_EXTRACT(data, "$.name") = "Jane"; ALTER TABLE employees ADD INDEX name_idx(name); SELECT * FROM employees WHERE JSON_EXTRACT(data, "$.name") = "Jane"; DROP TABLE employees;