Description:
When using ANY_VALUE within a window function, it may return incorrect results. See:
SQL1: SELECT LEAD(c2) OVER ( ORDER BY c3) FROM t1 ORDER BY 1;
SQL2: SELECT LEAD(ANY_VALUE(c2)) OVER ( ORDER BY ANY_VALUE(c3)) FROM t1 ORDER BY 1;
mysql> SELECT LEAD(c2) OVER ( ORDER BY c3) FROM t1 ORDER BY 1;
+------------------------------+
| LEAD(c2) OVER ( ORDER BY c3) |
+------------------------------+
| NULL |
| 1 |
| 3 |
+------------------------------+
3 rows in set (0.00 sec)
mysql> SELECT LEAD(ANY_VALUE(c2)) OVER ( ORDER BY ANY_VALUE(c3)) FROM t1 ORDER BY 1;
+----------------------------------------------------+
| LEAD(ANY_VALUE(c2)) OVER ( ORDER BY ANY_VALUE(c3)) |
+----------------------------------------------------+
| NULL |
| 1 |
| 3 |
+----------------------------------------------------+
3 rows in set (0.00 sec)
There is no PARTITION BY or GROUP BY clause in the query SQL2, so the entire table is treated as a single partition. Due to the presence of ANY_VALUE, the query should return only one row of data.
Since no explicit definition is provided for ANY_VALUE in the context of both aggregate functions (https://dev.mysql.com/doc/refman/8.4/en/miscellaneous-functions.html#function_any-value) and window functions (https://dev.mysql.com/doc/refman/8.4/en/window-function-descriptions.html), it is unclear whether this behavior is as expected.
We have also tested this on other databases, such as Oracle and DuckDB, and they all returned only one row of data.
Oracle:
> SELECT LEAD(c2) OVER ( ORDER BY c3) FROM t1 ORDER BY 1;
+------------------------------+
| LEAD(c2) OVER ( ORDER BY c3) |
+------------------------------+
| NULL |
| 1 |
| 3 |
+------------------------------+
> SELECT LEAD(ANY_VALUE(c2)) OVER ( ORDER BY ANY_VALUE(c3)) FROM t1 ORDER BY 1;
+----------------------------------------------------+
| LEAD(ANY_VALUE(c2)) OVER ( ORDER BY ANY_VALUE(c3)) |
+----------------------------------------------------+
| NULL |
+----------------------------------------------------+
DuckDB:
D SELECT LEAD(c2) OVER ( ORDER BY c3) FROM t1 ORDER BY 1;
┌─────────────────────────────┐
│ lead(c2) OVER (ORDER BY c3) │
│ int32 │
├─────────────────────────────┤
│ 1 │
│ 3 │
│ NULL │
└─────────────────────────────┘
D SELECT LEAD(ANY_VALUE(c2)) OVER ( ORDER BY ANY_VALUE(c3)) FROM t1 ORDER BY 1;
┌───────────────────────────────────────────────────┐
│ lead(any_value(c2)) OVER (ORDER BY any_value(c3)) │
│ int32 │
├───────────────────────────────────────────────────┤
│ NULL │
└───────────────────────────────────────────────────┘
How to repeat:
create table t1 (c1 int, c2 int, c3 int, primary key (c1));
INSERT INTO t1 VALUES (1, 1, 10), (2, 1, 20), (3, 3, 30);
SELECT LEAD(c2) OVER ( ORDER BY c3) FROM t1 ORDER BY 1;
SELECT LEAD(ANY_VALUE(c2)) OVER ( ORDER BY ANY_VALUE(c3)) FROM t1 ORDER BY 1;