Bug #119569 any_value within window function returns wrong result
Submitted: 17 Dec 7:40 Modified: 18 Dec 9:32
Reporter: chang wanli (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: any_value, window function

[17 Dec 7:40] chang wanli
Description:
When using ANY_VALUE within a window function, it may return incorrect results. See:
SQL1: SELECT   LEAD(c2) OVER ( ORDER BY c3) FROM t1 ORDER BY 1;
SQL2: SELECT   LEAD(ANY_VALUE(c2)) OVER ( ORDER BY ANY_VALUE(c3)) FROM t1 ORDER BY 1;

mysql> SELECT   LEAD(c2) OVER ( ORDER BY c3) FROM t1 ORDER BY 1;
+------------------------------+
| LEAD(c2) OVER ( ORDER BY c3) |
+------------------------------+
|                         NULL |
|                            1 |
|                            3 |
+------------------------------+
3 rows in set (0.00 sec)

mysql> SELECT   LEAD(ANY_VALUE(c2)) OVER ( ORDER BY ANY_VALUE(c3)) FROM t1 ORDER BY 1;
+----------------------------------------------------+
| LEAD(ANY_VALUE(c2)) OVER ( ORDER BY ANY_VALUE(c3)) |
+----------------------------------------------------+
|                                               NULL |
|                                                  1 |
|                                                  3 |
+----------------------------------------------------+
3 rows in set (0.00 sec)

There is no PARTITION BY or GROUP BY clause in the query SQL2, so the entire table is treated as a single partition. Due to the presence of ANY_VALUE, the query should return only one row of data.

Since no explicit definition is provided for ANY_VALUE in the context of both aggregate functions (https://dev.mysql.com/doc/refman/8.4/en/miscellaneous-functions.html#function_any-value) and window functions (https://dev.mysql.com/doc/refman/8.4/en/window-function-descriptions.html), it is unclear whether this behavior is as expected.

We have also tested this on other databases, such as Oracle and DuckDB, and they all returned only one row of data.
Oracle:
> SELECT   LEAD(c2) OVER ( ORDER BY c3) FROM t1 ORDER BY 1;
+------------------------------+
| LEAD(c2) OVER ( ORDER BY c3) |
+------------------------------+
|                         NULL |
|                            1 |
|                            3 |
+------------------------------+

> SELECT LEAD(ANY_VALUE(c2)) OVER ( ORDER BY ANY_VALUE(c3)) FROM t1 ORDER BY 1;
+----------------------------------------------------+
| LEAD(ANY_VALUE(c2)) OVER ( ORDER BY ANY_VALUE(c3)) |
+----------------------------------------------------+
|                                               NULL |
+----------------------------------------------------+

DuckDB:
D SELECT   LEAD(c2) OVER ( ORDER BY c3) FROM t1 ORDER BY 1;
┌─────────────────────────────┐
│ lead(c2) OVER (ORDER BY c3) │
│            int32            │
├─────────────────────────────┤
│                           1 │
│                           3 │
│                        NULL │
└─────────────────────────────┘

D SELECT LEAD(ANY_VALUE(c2)) OVER ( ORDER BY ANY_VALUE(c3)) FROM t1 ORDER BY 1;
┌───────────────────────────────────────────────────┐
│ lead(any_value(c2)) OVER (ORDER BY any_value(c3)) │
│                       int32                       │
├───────────────────────────────────────────────────┤
│                       NULL                        │
└───────────────────────────────────────────────────┘

How to repeat:
create table t1 (c1 int, c2 int, c3 int, primary key (c1));
INSERT INTO t1 VALUES (1, 1, 10), (2, 1, 20), (3, 3, 30);

SELECT   LEAD(c2) OVER ( ORDER BY c3) FROM t1 ORDER BY 1;
SELECT   LEAD(ANY_VALUE(c2)) OVER ( ORDER BY ANY_VALUE(c3)) FROM t1 ORDER BY 1;
[18 Dec 9:32] Chaithra Marsur Gopala Reddy
Hi chang wanli,

Thank you for the test case. Verified as described.