Description:
When using JSON_OBJECTAGG together with GROUP BY on a nullable column, adding DISTINCT to the outer SELECT incorrectly removes non-duplicate rows.
Specifically, the query:
SELECT DISTINCT JSON_OBJECTAGG('key', t1.c0)
FROM t1
GROUP BY t1.c0;
returns fewer rows than expected, even though the aggregated JSON results are semantically distinct.
This indicates that MySQL incorrectly treats distinct JSON objects as duplicates during DISTINCT processing, likely due to improper comparison or internal normalization of JSON values.
How to repeat:
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (
c0 DECIMAL
);
INSERT INTO t1 (c0) VALUES (0);
INSERT INTO t1 (c0) VALUES (NULL);
-- Query 1 (baseline)
SELECT JSON_OBJECTAGG('key', t1.c0)
FROM t1
GROUP BY t1.c0;
-- Query 2 (with DISTINCT)
SELECT DISTINCT JSON_OBJECTAGG('key', t1.c0)
FROM t1
GROUP BY t1.c0;
Expected result:
Query 1:
{"key": null}
{"key": 0}
Query 2 (DISTINCT should not remove any rows):
{"key": null}
{"key": 0}
Actual result:
Query 1:
{"key": null}
{"key": 0}
Query 2:
{"key": null}
Suggested fix:
Ensure that DISTINCT comparison over JSON values correctly distinguishes semantically different JSON objects, especially when values differ (e.g., null vs 0).
The issue may stem from:
• Improper JSON comparison logic
• Incorrect hashing or canonicalization of JSON values
• Or unintended interaction between DISTINCT and GROUP BY during optimization
A fix should ensure that JSON values are compared based on their full semantic content rather than simplified or lossy internal representations.
Description: When using JSON_OBJECTAGG together with GROUP BY on a nullable column, adding DISTINCT to the outer SELECT incorrectly removes non-duplicate rows. Specifically, the query: SELECT DISTINCT JSON_OBJECTAGG('key', t1.c0) FROM t1 GROUP BY t1.c0; returns fewer rows than expected, even though the aggregated JSON results are semantically distinct. This indicates that MySQL incorrectly treats distinct JSON objects as duplicates during DISTINCT processing, likely due to improper comparison or internal normalization of JSON values. How to repeat: DROP TABLE IF EXISTS t1; CREATE TABLE t1 ( c0 DECIMAL ); INSERT INTO t1 (c0) VALUES (0); INSERT INTO t1 (c0) VALUES (NULL); -- Query 1 (baseline) SELECT JSON_OBJECTAGG('key', t1.c0) FROM t1 GROUP BY t1.c0; -- Query 2 (with DISTINCT) SELECT DISTINCT JSON_OBJECTAGG('key', t1.c0) FROM t1 GROUP BY t1.c0; Expected result: Query 1: {"key": null} {"key": 0} Query 2 (DISTINCT should not remove any rows): {"key": null} {"key": 0} Actual result: Query 1: {"key": null} {"key": 0} Query 2: {"key": null} Suggested fix: Ensure that DISTINCT comparison over JSON values correctly distinguishes semantically different JSON objects, especially when values differ (e.g., null vs 0). The issue may stem from: • Improper JSON comparison logic • Incorrect hashing or canonicalization of JSON values • Or unintended interaction between DISTINCT and GROUP BY during optimization A fix should ensure that JSON values are compared based on their full semantic content rather than simplified or lossy internal representations.