Description:
When using JSON_ARRAYAGG() together with GROUP BY and DISTINCT, the result set may lose rows when one of the groups contains only NULL values.
Specifically, JSON_ARRAYAGG() over a group containing only NULL values appears to be treated as SQL NULL instead of a JSON array (e.g., [null, null, ...]). As a result, applying DISTINCT collapses such rows, leading to fewer rows than expected.
This behavior is inconsistent with the expectation that JSON_ARRAYAGG() always returns a JSON array and that DISTINCT operates on the resulting JSON values.
How to repeat:
DROP TABLE IF EXISTS t10;
CREATE TABLE t10 (
c0 DECIMAL NULL
);
INSERT INTO t10 (c0) VALUES (0);
INSERT INTO t10 (c0) VALUES (NULL);
INSERT INTO t10 (c0) VALUES (0);
INSERT INTO t10 (c0) VALUES (0);
INSERT INTO t10 (c0) VALUES (NULL);
INSERT INTO t10 (c0) VALUES (NULL);
-- Query 1
SELECT JSON_ARRAYAGG(t10.c0)
FROM t10
GROUP BY t10.c0;
-- Query 2
SELECT DISTINCT JSON_ARRAYAGG(t10.c0)
FROM t10
GROUP BY t10.c0;
Expected result:
Query 1:
[0, 0, 0]
[null, null, null]
Query 2 (DISTINCT should not change result since rows are different):
[0, 0, 0]
[null, null, null]
Actual result:
Query 1:
[0, 0, 0]
[null, null, null]
Query 2:
[null, null, null]
Suggested fix:
Ensure that JSON_ARRAYAGG() consistently returns a JSON value (e.g., a JSON array) even when all aggregated values are NULL, instead of collapsing to SQL NULL.
Additionally, DISTINCT should compare JSON values semantically rather than treating certain JSON results as SQL NULL.
Description: When using JSON_ARRAYAGG() together with GROUP BY and DISTINCT, the result set may lose rows when one of the groups contains only NULL values. Specifically, JSON_ARRAYAGG() over a group containing only NULL values appears to be treated as SQL NULL instead of a JSON array (e.g., [null, null, ...]). As a result, applying DISTINCT collapses such rows, leading to fewer rows than expected. This behavior is inconsistent with the expectation that JSON_ARRAYAGG() always returns a JSON array and that DISTINCT operates on the resulting JSON values. How to repeat: DROP TABLE IF EXISTS t10; CREATE TABLE t10 ( c0 DECIMAL NULL ); INSERT INTO t10 (c0) VALUES (0); INSERT INTO t10 (c0) VALUES (NULL); INSERT INTO t10 (c0) VALUES (0); INSERT INTO t10 (c0) VALUES (0); INSERT INTO t10 (c0) VALUES (NULL); INSERT INTO t10 (c0) VALUES (NULL); -- Query 1 SELECT JSON_ARRAYAGG(t10.c0) FROM t10 GROUP BY t10.c0; -- Query 2 SELECT DISTINCT JSON_ARRAYAGG(t10.c0) FROM t10 GROUP BY t10.c0; Expected result: Query 1: [0, 0, 0] [null, null, null] Query 2 (DISTINCT should not change result since rows are different): [0, 0, 0] [null, null, null] Actual result: Query 1: [0, 0, 0] [null, null, null] Query 2: [null, null, null] Suggested fix: Ensure that JSON_ARRAYAGG() consistently returns a JSON value (e.g., a JSON array) even when all aggregated values are NULL, instead of collapsing to SQL NULL. Additionally, DISTINCT should compare JSON values semantically rather than treating certain JSON results as SQL NULL.