MySQL Bugs: #120145: Incorrect empty result after relocating a HAVING predicate into a derived-table boolean column with WHERE ref1

Bug #120145	Incorrect empty result after relocating a HAVING predicate into a derived-table boolean column with WHERE ref1
Submitted:	25 Mar 6:05	Modified:	25 Mar 19:22
Reporter:	策吕	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S2 (Serious)
Version:	8.0.45	OS:	Linux (ubuntu 20.04)
Assigned to:		CPU Architecture:	x86

Description:
Summary

MySQL returns different results for two logically equivalent queries. The original grouped query returns one row (NULL), while the transformed query—obtained by relocating the HAVING predicate into the projection list of a derived table and filtering with WHERE ref1—returns an empty set. This appears to be a wrong-result bug caused by incorrect evaluation or rewriting of the relocated aggregate predicate.

Actual Result
Query 1 returns one row: NULL

Query 2 returns: Empty Set

Expected Result
Both queries should return the same result (NULL). The second query is only a structural rewriting of the first one: the original HAVING predicate is projected as a derived boolean column ref1, and the outer query filters with WHERE ref1. Since HAVING is the standard way to filter grouped rows by aggregate conditions in MySQL, relocating that condition into a derived table should preserve semantics.

Why this looks wrong
MAX([DISTINCT] expr) is documented to produce the same result with or without DISTINCT, so the aggregate predicate itself is unchanged by this rewriting.

Also, MySQL comparison expressions yield 1, 0, or NULL, and XOR is a defined operator in MySQL expressions, so the predicate in both queries is a normal boolean-valued expression under MySQL semantics.

Therefore, the following two forms should be equivalent:

... GROUP BY g HAVING P

and

SELECT * FROM (
  SELECT ..., P AS ref1
  FROM ...
  GROUP BY g
) AS s
WHERE ref1

However, MySQL returns different results for these two forms, which indicates a wrong-result bug.

Execution Plans

Query 1:

EXPLAIN SELECT DISTINCT CONCAT(t1.c0) FROM t1
WHERE (- (((t1.c1) XOR (0.5609188265100435)) >= ((0.25130445811584556) != (1342228223))))
GROUP BY t1.c0
HAVING (MAX(DISTINCT t1.c1)) != (-1599144071);

Plan:

-> Sort with duplicate removal: `CONCAT(t1.c0)`
    -> Filter: (max(t1.c1) <> <cache>(-(1599144071)))
        -> Table scan on <temporary>
            -> Aggregate using temporary table
                -> Filter: (0 <> -((((0 <> t1.c1) xor <cache>((0 <> 0.5609188265100435))) >= <cache>((0.25130445811584556 <> 1342228223)))))  (cost=0.65 rows=4)
                    -> Covering index scan on t1 using i1  (cost=0.65 rows=4)

Query 2:

EXPLAIN SELECT DISTINCT ref0
FROM (
    SELECT CONCAT(t1.c0) AS ref0,
           ((MAX(DISTINCT t1.c1)) != (-1599144071)) AS ref1
    FROM t1
    WHERE (- (((t1.c1) XOR (0.5609188265100435)) >= ((0.25130445811584556) != (1342228223))))
    GROUP BY t1.c0
) AS s
WHERE ref1;

Plan:

-> Table scan on <temporary>  (cost=5.46..6.72 rows=2)
    -> Temporary table with deduplication  (cost=4.2..4.2 rows=2)
        -> Table scan on s  (cost=2.47..3.74 rows=2)
            -> Materialize  (cost=1.21..1.21 rows=2)
                -> Filter: (0 <> (max(t1.c1) <> <cache>(-(1599144071))))  (cost=0.75 rows=2)
                    -> Filter: (0 <> -((((0 <> t1.c1) xor <cache>((0 <> 0.5609188265100435))) >= <cache>((0.25130445811584556 <> 1342228223)))))  (cost=0.75 rows=2)
                        -> Covering index skip scan for grouping on t1 using i1  (cost=0.75 rows=2)

How to repeat:
CREATE TABLE t1(c0 DECIMAL , c1 FLOAT PRIMARY KEY UNIQUE KEY) ;
INSERT INTO t1(c0, c1) VALUES(NULL, 0.41472381328161134);
REPLACE INTO t1(c0, c1) VALUES(-1.689413262E9, 1037036926);
INSERT IGNORE INTO t1(c0, c1) VALUES(-547152157, 0.8763334995483548);
UPDATE t1 SET c0=DEFAULT;
UPDATE t1 SET c1=NULL WHERE (- (((-482097596) IN (NULL, '', -1312523974)) AND (NULL)));
INSERT IGNORE INTO t1(c0) VALUES(NULL), (NULL), ('-1123800899');

CREATE INDEX i1 USING BTREE ON t1(c0, c1 DESC) ALGORITHM INPLACE;
-- cardinality: 1: NULL
SELECT DISTINCT CONCAT(t1.c0) FROM t1 WHERE (- (((t1.c1) XOR (0.5609188265100435)) >= ((0.25130445811584556) != (1342228223)))) GROUP BY t1.c0 HAVING (MAX(DISTINCT t1.c1)) != (-1599144071);
-- cardinality: 0
SELECT DISTINCT ref0 FROM (SELECT CONCAT(t1.c0) AS ref0, ((MAX(DISTINCT t1.c1)) != (-1599144071)) AS ref1 FROM t1 WHERE (- (((t1.c1) XOR (0.5609188265100435)) >= ((0.25130445811584556) != (1342228223)))) GROUP BY t1.c0) AS s WHERE ref1;

Suggested fix:
Possible Cause
The issue seems related to optimizer rewriting of the relocated aggregate predicate after derived-table materialization. In the original query, the aggregate predicate remains a post-group filter. In the rewritten query, WHERE ref1 is transformed into a truth test over the derived boolean expression, and this transformation appears to change the result incorrectly. MySQL has had optimizer issues around XOR-related rewrites before, so this may be another wrong-result case triggered by expression simplification plus grouping/index access.

Thank you for the bug report.
Verified as described.