Bug #120102 Incorrect DISTINCT behavior with JSON_ARRAYAGG when aggregating NULL-only groups
Submitted: 19 Mar 1:34 Modified: 19 Mar 8:22
Reporter: Wang Ojiken Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: JSON Severity:S3 (Non-critical)
Version:8.0.45 OS:Any
Assigned to: CPU Architecture:Any
Tags: distinct, group-by, JSON_ARRAYAGG, null, wrong-result

[19 Mar 1:34] Wang Ojiken
Description:
When using JSON_ARRAYAGG() together with GROUP BY and DISTINCT, the result set may lose rows when one of the groups contains only NULL values.

Specifically, JSON_ARRAYAGG() over a group containing only NULL values appears to be treated as SQL NULL instead of a JSON array (e.g., [null, null, ...]). As a result, applying DISTINCT collapses such rows, leading to fewer rows than expected.

This behavior is inconsistent with the expectation that JSON_ARRAYAGG() always returns a JSON array and that DISTINCT operates on the resulting JSON values.

How to repeat:
DROP TABLE IF EXISTS t10;
CREATE TABLE t10 (
    c0 DECIMAL NULL
);

INSERT INTO t10 (c0) VALUES (0);
INSERT INTO t10 (c0) VALUES (NULL);
INSERT INTO t10 (c0) VALUES (0);
INSERT INTO t10 (c0) VALUES (0);
INSERT INTO t10 (c0) VALUES (NULL);
INSERT INTO t10 (c0) VALUES (NULL);

-- Query 1
SELECT JSON_ARRAYAGG(t10.c0)
FROM t10
GROUP BY t10.c0;

-- Query 2
SELECT DISTINCT JSON_ARRAYAGG(t10.c0)
FROM t10
GROUP BY t10.c0;

Expected result:
Query 1:
[0, 0, 0]
[null, null, null]
Query 2 (DISTINCT should not change result since rows are different):
[0, 0, 0]
[null, null, null]

Actual result:
Query 1:
[0, 0, 0]
[null, null, null]
Query 2:
[null, null, null]

Suggested fix:
Ensure that JSON_ARRAYAGG() consistently returns a JSON value (e.g., a JSON array) even when all aggregated values are NULL, instead of collapsing to SQL NULL.

Additionally, DISTINCT should compare JSON values semantically rather than treating certain JSON results as SQL NULL.
[19 Mar 8:22] Roy Lyseng
Thank you for the bug report.
Verified that DISTINCT should not filter out rows from the GROUP BY operation.