Bug #78787 Wrong result when grouping on value from subquery and using COUNT DISTINCT
Submitted: 9 Oct 2015 14:12 Modified: 3 Jan 2018 20:28
Reporter: Knut Anders Hatlen Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.6.28 OS:Any
Assigned to: CPU Architecture:Any

[9 Oct 2015 14:12] Knut Anders Hatlen
Description:
A query that performs grouping on a JSON value returned from a subquery returns incorrect results.

How to repeat:
CREATE TABLE t1 (j JSON, i INT);
INSERT INTO t1 VALUES ('1', 10), ('1', 11), ('2', 20), ('2', 20), ('3', 30);
SELECT (SELECT t1.j) jj, COUNT(DISTINCT i) FROM t1 GROUP BY -jj;

returns

3	1
2	1
2	3

The last row should have been (1, 2) instead of (2, 3).
[13 Oct 2015 9:05] Knut Anders Hatlen
I was too quick to classify this as a JSON bug. I got the expected result when I used INT instead of JSON for the "j" column, and thought that meant it was caused by the JSON type. However, if I change the data type to TEXT, I see that wrong results are returned, also on 5.6 before the JSON data type was introduced.

Changing category to optimizer and updating the bug synopsis.
[3 Jan 2018 20:28] Jon Stephens
Documented fix in the MySQL 8.0.4 changelog as follows:

    A query that grouped results on a subquery which returned a BLOB or
    BLOB-based type such as JSON sometimes failed to find the group
    boundaries and so returned incorrect results.

Closed.