Bug #120215 Inconsistent window aggregate results with GROUP BY ... WITH ROLLUP
Submitted: 4 Apr 16:59 Modified: 6 Apr 5:38
Reporter: mu mu Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:9.6.0 OS:Ubuntu (22.04)
Assigned to: CPU Architecture:Any

[4 Apr 16:59] mu mu
Description:
Inconsistent window aggregate results with GROUP BY ... WITH ROLLUP:
SUM(<execution-time-constant expression>) OVER() does not match the row
multiplicity implied by COUNT(*) OVER() in the same SELECT for some grouping
forms (e.g. GROUP BY UTC_DATE() or GROUP BY literal 1), while other grouping
forms (e.g. GROUP BY a table column) yield self-consistent SUM(1) and COUNT(*).

How to repeat:
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (f1 INT NOT NULL);
INSERT INTO t1 VALUES (1), (2);
SET timestamp = UNIX_TIMESTAMP('2009-04-16 12:46:00');
-- Query A
SELECT UTC_DATE() AS g,
       SUM(UNIX_TIMESTAMP(UTC_DATE())) OVER () AS sum_ts,
       COUNT(*) OVER () AS cnt
FROM t1
GROUP BY UTC_DATE() WITH ROLLUP;
-- Query B
SELECT 1 AS g,
       SUM(UNIX_TIMESTAMP(UTC_DATE())) OVER () AS sum_ts,
       COUNT(*) OVER () AS cnt
FROM t1
GROUP BY 1 WITH ROLLUP;
SET timestamp = DEFAULT;
-- Query C
SELECT 1 AS g,
       SUM(1) OVER () AS sum_one,
       COUNT(*) OVER () AS cnt
FROM t1
GROUP BY 1 WITH ROLLUP;
-- Query D
SELECT f1 AS g,
       SUM(1) OVER () AS sum_one,
       COUNT(*) OVER () AS cnt
FROM t1
GROUP BY f1 WITH ROLLUP;

ACTUAL RESULTS (as executed on 9.6.0 with the sql_mode above):
Query A — GROUP BY UTC_DATE() WITH ROLLUP:
+------------+------------+-----+
| g          | sum_ts     | cnt |
+------------+------------+-----+
| 2009-04-16 | 1239840000 |   2 |
| NULL       | 1239840000 |   2 |
+------------+------------+-----+
Query B — GROUP BY 1 WITH ROLLUP (same window expression as A, only GROUP BY changed):
+------+------------+-----+
| g    | sum_ts     | cnt |
+------+------------+-----+
|    1 | 2479680000 |   2 |
| NULL | 2479680000 |   2 |
+------+------------+-----+
Note: 2479680000 = 2 * 1239840000. So Query B matches "sum once per row in
the window" for two rows, while Query A leaves sum_ts at a single-row value
though cnt is 2.
Query C — GROUP BY 1 WITH ROLLUP:
+------+---------+-----+
| g    | sum_one | cnt |
+------+---------+-----+
|    1 |       1 |   2 |
| NULL |       1 |   2 |
+------+---------+-----+
Query D — GROUP BY f1 WITH ROLLUP:
+------+---------+-----+
| g    | sum_one | cnt |
+------+---------+-----+
|    1 |       3 |   3 |
|    2 |       3 |   3 |
| NULL |       3 |   3 |
+------+---------+-----+
EXPECTED BEHAVIOR:
For a given window partition, SUM of a per-row constant should equal that
constant times the number of rows in the partition, and should be consistent
with COUNT(*) OVER() in the same way as in Query B and Query D. In particular,
for Query A, since cnt is 2 on each result row, sum_ts should be 2479680000
(2 * UNIX_TIMESTAMP(UTC_DATE()) for the fixed session timestamp), not 1239840000.
Similarly, for Query C, sum_one should be 2 when cnt is 2, not 1.
[6 Apr 5:38] Chaithra Marsur Gopala Reddy
Hi mu mu,

Thank you for the test case. Verified as described.