Bug #117013 MAX() as a Window Function Blindly Returns NULL for the First Row under Specific Conditions
Submitted: 18 Dec 2024 14:19 Modified: 20 Mar 15:14
Reporter: Team QueryHouse Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: Logic bug, Optimizer bug, Window functions

[18 Dec 2024 14:19] Team QueryHouse
Description:
Hi,

When using the max() as a window function, an unexpected behavior occurs under certain conditions.

The MAX() function returns NULL for the first row within the window frame, despite data existing in the first row.

This bug triggers when the following conditions are satisfied:

1. The max() function is used as a window function (e.g., MAX(column) OVER ( ... )).
2. The window frame’s start is defined using N FOLLOWING (e.g., ROWS BETWEEN 1 FOLLOWING AND …).
3. The window frame is ordered by the same expression with the argument of max(), differing only in syntax (e.g., aliases or table references).
4. The window frame is ordered in descending order. (e.g., ORDER BY column DESC).

How to repeat:
**Minimized Query to Reproduce**:

```sql
CREATE TABLE t0 (c0 INT);
INSERT INTO t0 VALUES (1), (2);
SELECT c0, MAX(C0) OVER (ORDER BY c0 DESC ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)
FROM t0;
```

**Expected Result**:

```
mysql> SELECT c0, MAX(c0+0) OVER (ORDER BY c0 DESC ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) FROM t0;
+------+----------------------------------------------------------------------------+
| c0   | MAX(c0+0) OVER (ORDER BY c0 DESC ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) |
+------+----------------------------------------------------------------------------+
|    2 |                                                                          1 |
|    1 |                                                                       NULL |
+------+----------------------------------------------------------------------------+
2 rows in set (0.00 sec)
```

The window frame for row 2 consists of 1, so MAX() returns 1. (Note c0 + 0 to force normal results.)

**Actual result** (MySQL, version 8.4 and 8.0.32):

```
mysql> SELECT c0, MAX(C0) OVER (ORDER BY c0 DESC ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) FROM t0;
+------+--------------------------------------------------------------------------+
| c0   | MIN(C0) OVER (ORDER BY c0 DESC ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) |
+------+--------------------------------------------------------------------------+
|    2 |                                                                     NULL |
|    1 |                                                                     NULL |
+------+--------------------------------------------------------------------------+
2 rows in set (0.00 sec)
```

However, NULL is returned instead of 1.

Suggested fix:
We believe this bug is likely caused by the optimization logic handling the above conditions.

We suggest reviewing the MySQL code related to the MAX() function as a window function to ensure it returns the correct results.

Thank you for your time and attention to this matter. We look forward to your response.

Best regards,
[18 Dec 2024 14:50] MySQL Verification Team
Hi Team QueryHouse,

Thank you for your bug report.

We get very consistent results with latest 8.0, 8.4 and 9.1.

Regardless if we use MAX(c0) or MAX(c0 + 0) we always get NULL in the second column, with latest releases of all supported versions. We also consider that this is a correct result, because that is how your table and query are designed.

We see no reason why should the results be otherwise, unless you come up with a very compelling explanation of why is that result wrong.

Not a bug.
[19 Dec 2024 4:19] Team QueryHouse
Thank you for your response. However, the actual result is indeed wrong, as the second column of the first row should **NOT** return `NULL`.

> Regardless if we use MAX(c0) or MAX(c0 + 0) we **always get NULL** in the second column, with latest releases of all supported versions.

`MAX()` returning `NULL` despite having one or more rows to evaluate is evidently a bug.

The expected result for the first row (with `c0` of value `2`) is `1`, not `NULL`, since its window frame consists of one row, with the value `1`.

To clarify, by using `c0 + 0` to force the correct result, we meant `MAX(c0 + 0)` for the `MAX()` function, while keeping `c0` for the `ORDER BY` clause. This ensures the issue is not reproduced and returns the correct results, as it avoids satisfying the third condition:

> 3. The window frame is ordered by the same expression with the argument of max(), differing only in syntax (e.g., aliases or table references).

For example, `MAX(c0 + 0) OVER (ORDER BY c0 DESC ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)` returns the correct result `1`, whereas `MAX(c0 + 0) OVER (ORDER BY c0 + 0 DESC ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)` does **not**, satisfying all 4 conditions we've outlined to reproduce the issue.

You can verify this in other ways, e.g., by checking the number of rows within window frames for each row:
```sql
mysql> SELECT c0, COUNT(*) OVER (ORDER BY c0 DESC ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS cnt FROM
t0;
+------+-----+
| c0   | cnt |
+------+-----+
|    2 |   1 |
|    1 |   0 |
+------+-----+
2 rows in set (0.00 sec)
```

Or by using `MIN()` instead of `MAX()` (only in this case, since there is only one row is in the frame):
```sql
mysql> SELECT c0, MIN(c0) OVER (ORDER BY c0 DESC ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS min FROM t
0;
+------+------+
| c0   | min  |
+------+------+
|    2 |    1 |
|    1 | NULL |
+------+------+
2 rows in set (0.00 sec)
```

Or by using other window functions to confirm the expected behavior:
```sql
mysql> SELECT c0, AVG(c0) OVER (ORDER BY c0 DESC ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS avg FROM t
0;
+------+--------+
| c0   | avg    |
+------+--------+
|    2 | 1.0000 |
|    1 |   NULL |
+------+--------+
2 rows in set (0.01 sec)

mysql> SELECT c0, SUM(c0) OVER (ORDER BY c0 DESC ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS sum FROM t
0;
+------+------+
| c0   | sum  |
+------+------+
|    2 |    1 |
|    1 | NULL |
+------+------+
2 rows in set (0.00 sec)

mysql> SELECT c0, FIRST_VALUE(c0) OVER (ORDER BY c0 DESC ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS firstvalue FROM t0;
+------+------------+
| c0   | firstvalue |
+------+------------+
|    2 |          1 |
|    1 |       NULL |
+------+------------+
2 rows in set (0.00 sec)

mysql> SELECT c0, LAST_VALUE(c0) OVER (ORDER BY c0 DESC ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS lastvalue FROM t0;
+------+-----------+
| c0   | lastvalue |
+------+-----------+
|    2 |         1 |
|    1 |      NULL |
+------+-----------+
2 rows in set (0.00 sec)

mysql> SELECT c0, NTH_VALUE(c0,1) OVER (ORDER BY c0 DESC ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS nth FROM t0;
+------+------+
| c0   | nth  |
+------+------+
|    2 |    1 |
|    1 | NULL |
+------+------+
2 rows in set (0.00 sec)
```

Please investigate this issue. We look forward to your response.

Best regards,
[10 Jan 11:13] MySQL Verification Team
Hi Team QueryHouse,

We have re-run your query and it is truly a bug.

Verified as reported.

Thank you for your contribution.
[20 Mar 15:14] Jon Stephens
Documented fix as follows in the MySQL 9.3.0 changelog:

    When using MAX() as a window function, it returned NULL for the
    first row within the window frame, despite data existing in the
    first row. This happened when the start of the window frame was
    defined using N FOLLOWING, and the frame was ordered by the same
    expression as the argument of MAX(), possibly differing only in
    syntax such as aliases or table references, in descending order.
    A sequence of statements demonstrating the issue is shown here:

CREATE TABLE t0 (c0 INT);

INSERT INTO t0 VALUES (1), (2);

SELECT 
c0, MAX(c0) OVER (ORDER BY c0 DESC ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)
FROM t0;

    We fix this by making sure that the first row number in the
    frame is set in the appropriate place in the program logic.

Closed.