Description:
Summary
With subquery_to_derived, derived_merge and derived_condition_pushdown enabled, a query containing a scalar subquery and a NOT IN condition is transformed into a derived table with an anti-join. The outer WHERE clause is pushed down into the materialization step, replacing the original column s.c with the aggregate function any_value(r.c). During compile-time constant folding, the optimizer incorrectly determines that any_value(r.c) is equal to CAST(IFNULL(any_value(r.c), ...) AS DOUBLE), making the filter always false. As a result, the row that should survive is dropped, leading to a count of 0. The identical logic against the base table returns the correct count of 1.
Result
Single query: reads the source table src directly — COUNT(*) = 1 (correct)
Split query: reconstructs the same data from vertically partitioned tables l/r via a scalar subquery and applies the same predicate — COUNT(*) = 0 (incorrect)
Explain Split query (incorrect)
-> Aggregate: count(0)
-> Nested loop inner join
-> Index scan on l using PRIMARY
-> Filter: reject_if((derived_2_3.Name_exp_3 > 1))
-> Index lookup on derived_2_3 using <auto_key0> (id=l.id)
-> Materialize
-> Filter: (any_value(r.c) <> cast(ifnull(any_value(r.c),'-762030035') as double))
-> Group aggregate: count(0)
-> Index scan on r using PRIMARY
Analysis:
subquery_to_derived converts the scalar subquery into a derived table with GROUP BY, wrapping r.c inside any_value().
derived_condition_pushdown pushes the outer WHERE s.c NOT IN (...) into the materialization, substituting s.c with any_value(r.c).
The pushed-down filter becomes any_value(r.c) <> CAST(IFNULL(any_value(r.c), '-762030035') AS DOUBLE).
The FLOAT column c stores the value 790583485 with precision loss (e.g., 790583488). In the direct scan, the approximate FLOAT value does not equal the exact decimal produced by CAST(... AS DOUBLE), so the row passes.
In the pushdown path, the optimizer’s compile-time constant folding erroneously treats any_value(r.c) as identical to CAST(IFNULL(any_value(r.c), ...) AS DOUBLE), making the <> comparison always false. The row is filtered out during materialization.
Root cause
any_value() introduces a different semantic path for type inference and constant folding, and derived_condition_pushdown does not handle this disparity correctly.
How to repeat:
DROP DATABASE IF EXISTS repro;
CREATE DATABASE repro;
USE repro;
SET SESSION optimizer_switch =
'subquery_to_derived=on,derived_merge=on,derived_condition_pushdown=on';
CREATE TABLE src (
id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
c FLOAT NULL
) ENGINE=InnoDB;
INSERT INTO src (c) VALUES (790583485);
CREATE TABLE l (
id BIGINT NOT NULL PRIMARY KEY
) ENGINE=InnoDB;
CREATE TABLE r (
id BIGINT NOT NULL PRIMARY KEY,
c FLOAT NULL
) ENGINE=InnoDB;
INSERT INTO l (id) SELECT id FROM src;
INSERT INTO r (id, c) SELECT id, c FROM src;
-- Single query: correct, returns 1
SELECT 'single' AS query_type, COUNT(*) AS row_count
FROM src AS s
WHERE s.c NOT IN (+(IFNULL(s.c, '-762030035')));
-- Split query: wrong, returns 0
SELECT 'split' AS query_type, COUNT(*) AS row_count
FROM (
SELECT
l.id,
(SELECT r.c FROM r WHERE r.id = l.id) AS c
FROM l
) AS s
WHERE s.c NOT IN (+(IFNULL(s.c, '-762030035')));
Description: Summary With subquery_to_derived, derived_merge and derived_condition_pushdown enabled, a query containing a scalar subquery and a NOT IN condition is transformed into a derived table with an anti-join. The outer WHERE clause is pushed down into the materialization step, replacing the original column s.c with the aggregate function any_value(r.c). During compile-time constant folding, the optimizer incorrectly determines that any_value(r.c) is equal to CAST(IFNULL(any_value(r.c), ...) AS DOUBLE), making the filter always false. As a result, the row that should survive is dropped, leading to a count of 0. The identical logic against the base table returns the correct count of 1. Result Single query: reads the source table src directly — COUNT(*) = 1 (correct) Split query: reconstructs the same data from vertically partitioned tables l/r via a scalar subquery and applies the same predicate — COUNT(*) = 0 (incorrect) Explain Split query (incorrect) -> Aggregate: count(0) -> Nested loop inner join -> Index scan on l using PRIMARY -> Filter: reject_if((derived_2_3.Name_exp_3 > 1)) -> Index lookup on derived_2_3 using <auto_key0> (id=l.id) -> Materialize -> Filter: (any_value(r.c) <> cast(ifnull(any_value(r.c),'-762030035') as double)) -> Group aggregate: count(0) -> Index scan on r using PRIMARY Analysis: subquery_to_derived converts the scalar subquery into a derived table with GROUP BY, wrapping r.c inside any_value(). derived_condition_pushdown pushes the outer WHERE s.c NOT IN (...) into the materialization, substituting s.c with any_value(r.c). The pushed-down filter becomes any_value(r.c) <> CAST(IFNULL(any_value(r.c), '-762030035') AS DOUBLE). The FLOAT column c stores the value 790583485 with precision loss (e.g., 790583488). In the direct scan, the approximate FLOAT value does not equal the exact decimal produced by CAST(... AS DOUBLE), so the row passes. In the pushdown path, the optimizer’s compile-time constant folding erroneously treats any_value(r.c) as identical to CAST(IFNULL(any_value(r.c), ...) AS DOUBLE), making the <> comparison always false. The row is filtered out during materialization. Root cause any_value() introduces a different semantic path for type inference and constant folding, and derived_condition_pushdown does not handle this disparity correctly. How to repeat: DROP DATABASE IF EXISTS repro; CREATE DATABASE repro; USE repro; SET SESSION optimizer_switch = 'subquery_to_derived=on,derived_merge=on,derived_condition_pushdown=on'; CREATE TABLE src ( id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY, c FLOAT NULL ) ENGINE=InnoDB; INSERT INTO src (c) VALUES (790583485); CREATE TABLE l ( id BIGINT NOT NULL PRIMARY KEY ) ENGINE=InnoDB; CREATE TABLE r ( id BIGINT NOT NULL PRIMARY KEY, c FLOAT NULL ) ENGINE=InnoDB; INSERT INTO l (id) SELECT id FROM src; INSERT INTO r (id, c) SELECT id, c FROM src; -- Single query: correct, returns 1 SELECT 'single' AS query_type, COUNT(*) AS row_count FROM src AS s WHERE s.c NOT IN (+(IFNULL(s.c, '-762030035'))); -- Split query: wrong, returns 0 SELECT 'split' AS query_type, COUNT(*) AS row_count FROM ( SELECT l.id, (SELECT r.c FROM r WHERE r.id = l.id) AS c FROM l ) AS s WHERE s.c NOT IN (+(IFNULL(s.c, '-762030035')));