Bug #120032 Inconsistent Aggregation Results in Derived Tables: PRIMARY KEY Affects AVG(NOT(...)) Evaluation and Optimizer Execution
Submitted: 11 Mar 14:56 Modified: 11 Mar 20:56
Reporter: b a Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:8.4.8 OS:Any
Assigned to: CPU Architecture:Any

[11 Mar 14:56] b a
Description:
Two tables containing identical data produce different query results depending on whether a `PRIMARY KEY` is defined.

When the table contains a `PRIMARY KEY`, the query returns one row. However, when the `PRIMARY KEY` is removed (while keeping the table data identical), the same query returns an empty set.

The issue appears to be related to how the optimizer handles the `AVG(NOT(tom3.c0))` aggregate expression within a correlated subquery in the derived table. The presence of `PRIMARY KEY` on column `c1` appears to enable an optimization path that incorrectly evaluates the `NOT` operator or the `AVG` aggregation, leading to wrong results.

Since the table contents are identical in both cases, the query result should be the same regardless of index structure. The difference suggests that the optimizer may incorrectly evaluate the `AVG(NOT(...))` expression or apply invalid transformations when a `PRIMARY KEY` is present.

How to repeat:
Case 1: Table with `PRIMARY KEY`

```sql
DROP DATABASE IF EXISTS test;
CREATE DATABASE test;
USE test;

CREATE TABLE t0(
  c0 FLOAT,
  c1 DOUBLE PRIMARY KEY
);

INSERT INTO t0(c1) VALUES(-1);
CREATE INDEX i0 ON t0(c0, c1);
INSERT INTO t0(c1, c0) VALUES(-2, 1);

SELECT 1 AS c4
FROM (
    SELECT (
        SELECT AVG(NOT(tom3.c0))
        FROM t0 AS tom3
        GROUP BY tom3.c1
        LIMIT 1
    ) AS c0
) AS tom4
WHERE TIME_FORMAT(tom4.c0, 1);
```

Result:

```
+----+
| c4 |
+----+
|  1 |
+----+
1 row in set
```

Case 2: Table without `PRIMARY KEY` (same data)

```sql
DROP DATABASE IF EXISTS test;
CREATE DATABASE test;
USE test;

CREATE TABLE t0(
  c0 FLOAT,
  c1 DOUBLE
);

INSERT INTO t0(c1) VALUES(-1);
CREATE INDEX i0 ON t0(c0, c1);
INSERT INTO t0(c1, c0) VALUES(-2, 1);

SELECT 1 AS c4
FROM (
    SELECT (
        SELECT AVG(NOT(tom3.c0))
        FROM t0 AS tom3
        GROUP BY tom3.c1
        LIMIT 1
    ) AS c0
) AS tom4
WHERE TIME_FORMAT(tom4.c0, 1);
```

Result:

```
Empty set
```

The two tables contain identical data, so the query result should be consistent.
[11 Mar 20:56] Roy Lyseng
Thank you for the bug report.
Verified as described.