Bug #120641 Derived condition pushdown causes wrong row filtering due to incorrect constant folding with any_value() aggregation
Submitted: 9 Jun 10:23 Modified: 9 Jun 10:50
Reporter: Annie liu Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S1 (Critical)
Version:9.7.0 OS:Any (Storage Engine: InnoDB)
Assigned to: CPU Architecture:Any

[9 Jun 10:23] Annie liu
Description:
Summary

With subquery_to_derived, derived_merge and derived_condition_pushdown enabled, a query containing a scalar subquery and a NOT IN condition is transformed into a derived table with an anti-join. The outer WHERE clause is pushed down into the materialization step, replacing the original column s.c with the aggregate function any_value(r.c). During compile-time constant folding, the optimizer incorrectly determines that any_value(r.c) is equal to CAST(IFNULL(any_value(r.c), ...) AS DOUBLE), making the filter always false. As a result, the row that should survive is dropped, leading to a count of 0. The identical logic against the base table returns the correct count of 1.

Result

Single query: reads the source table src directly — COUNT(*) = 1 (correct)
Split query: reconstructs the same data from vertically partitioned tables l/r via a scalar subquery and applies the same predicate — COUNT(*) = 0 (incorrect)

Explain Split query (incorrect)

-> Aggregate: count(0)
    -> Nested loop inner join
        -> Index scan on l using PRIMARY
        -> Filter: reject_if((derived_2_3.Name_exp_3 > 1))
            -> Index lookup on derived_2_3 using <auto_key0> (id=l.id)
                -> Materialize
                    -> Filter: (any_value(r.c) <> cast(ifnull(any_value(r.c),'-762030035') as double))
                        -> Group aggregate: count(0)
                            -> Index scan on r using PRIMARY

Analysis:

subquery_to_derived converts the scalar subquery into a derived table with GROUP BY, wrapping r.c inside any_value().

derived_condition_pushdown pushes the outer WHERE s.c NOT IN (...) into the materialization, substituting s.c with any_value(r.c).

The pushed-down filter becomes any_value(r.c) <> CAST(IFNULL(any_value(r.c), '-762030035') AS DOUBLE).

The FLOAT column c stores the value 790583485 with precision loss (e.g., 790583488). In the direct scan, the approximate FLOAT value does not equal the exact decimal produced by CAST(... AS DOUBLE), so the row passes.

In the pushdown path, the optimizer’s compile-time constant folding erroneously treats any_value(r.c) as identical to CAST(IFNULL(any_value(r.c), ...) AS DOUBLE), making the <> comparison always false. The row is filtered out during materialization.

Root cause

any_value() introduces a different semantic path for type inference and constant folding, and derived_condition_pushdown does not handle this disparity correctly.

How to repeat:
DROP DATABASE IF EXISTS repro;
CREATE DATABASE repro;
USE repro;

SET SESSION optimizer_switch =
  'subquery_to_derived=on,derived_merge=on,derived_condition_pushdown=on';

CREATE TABLE src (
  id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  c FLOAT NULL
) ENGINE=InnoDB;
INSERT INTO src (c) VALUES (790583485);

CREATE TABLE l (
  id BIGINT NOT NULL PRIMARY KEY
) ENGINE=InnoDB;
CREATE TABLE r (
  id BIGINT NOT NULL PRIMARY KEY,
  c FLOAT NULL
) ENGINE=InnoDB;

INSERT INTO l (id) SELECT id FROM src;
INSERT INTO r (id, c) SELECT id, c FROM src;

-- Single query: correct, returns 1
SELECT 'single' AS query_type, COUNT(*) AS row_count
FROM src AS s
WHERE s.c NOT IN (+(IFNULL(s.c, '-762030035')));

-- Split query: wrong, returns 0
SELECT 'split' AS query_type, COUNT(*) AS row_count
FROM (
  SELECT
    l.id,
    (SELECT r.c FROM r WHERE r.id = l.id) AS c
  FROM l
) AS s
WHERE s.c NOT IN (+(IFNULL(s.c, '-762030035')));
[9 Jun 10:50] Roy Lyseng
Thank you for the bug report.
Verified as described.