| Bug #120642 | Derived table merging loses FLOAT display formatting context, causing inconsistent aggregated output | ||
|---|---|---|---|
| Submitted: | 9 Jun 10:44 | Modified: | 10 Jun 5:11 |
| Reporter: | Annie liu | Email Updates: | |
| Status: | Not a Bug | Impact on me: | |
| Category: | MySQL Server: Optimizer | Severity: | S1 (Critical) |
| Version: | MySQL 9.6 | OS: | Any (Storage Engine: InnoDB) |
| Assigned to: | CPU Architecture: | Any | |
[9 Jun 11:27]
Roy Lyseng
Thank you for the bug report. However, this is not a bug. When materializing the derived table, the FLOAT value is converted to DOUBLE, which causes the value to become slightly different. But small deviations must be expected when operating on floating point values. The queries described in Case 2 are incomplete and have not been considered.
[10 Jun 5:11]
Annie liu
Thank you for your reply. I sincerely apologize for not providing a complete reproduction for Case 2; I will add it at the end.
While minor differences in numerical calculations when handling FLOAT/DOUBLE in MySQL (such as 1.00000001 vs 1.0) are indeed within the tolerance range for floating-point operations, we want to emphasize the difference in display format. Users using two logically equivalent queries (one from the base table, and the other from a derived table rebuilt via a vertical partitioning JOIN) to retrieve the exact same data receive different text representations. Furthermore, the difference reaches the millions (-2121630000 vs -2121634176), far exceeding the scope of "rounding error."
We think the problem is that the optimizer/executor loses column display metadata, causing the same value to be “printed” differently.
Case 1: MIN(c0) GROUP BY display difference
-- Control query c0 column displays: -2121630000, CAST(c0 AS SIGNED) displays: -2121634176
SELECT c0, CAST(c0 AS SIGNED) FROM s;
-- Direct base table scan: MIN(c0) displays -2121630000
SELECT MIN(c0) FROM s GROUP BY CAST(c0 AS SIGNED);
-- Derived table via vertical partition JOIN: MIN(c0) displays -2121634176
SELECT MIN(c0) FROM (
SELECT r.c0 FROM l JOIN r ON l.vp_rowid = r.vp_rowid
) AS x GROUP BY CAST(c0 AS SIGNED);
Case 2: MAX(DISTINCT c0) display difference
-- Control query c0 displays 1474950000, c0+0 and CAST return 1474948480
SELECT c0, c0+0, CAST(c0 AS DECIMAL(65,30)) FROM src;
-- Base table: MAX(DISTINCT c0) displays 1474950000
SELECT MAX(DISTINCT c0) FROM src GROUP BY (((src.vp_rowid) IS NOT FALSE) AND (('r0䋴popZ]') XOR (src.c0))) IS NOT FALSE;;
-- Complex JOIN derived table: MAX(DISTINCT c0) displays 1474948480
SELECT MAX(DISTINCT c0) FROM (
SELECT l.vp_rowid, r.c0
FROM l
JOIN (
SELECT vp_rowid, CONCAT('vp:', vp_rowid) AS payload
FROM r
) AS sp
ON CONCAT('vp:', l.vp_rowid) = sp.payload
JOIN r
ON r.vp_rowid = sp.vp_rowid
) AS s GROUP BY (((s.vp_rowid) IS NOT FALSE) AND (('r0䋴popZ]') XOR (s.c0))) IS NOT FALSE;
In both cases the binary representation never changes. The difference occurs solely at result set output: a base-table column reference like s.c0 carries field metadata that triggers MySQL’s FLOAT display formatting (~7 significant digits). When the same value passes through a derived table or complex join, the column reference becomes a generic Item and the display metadata is lost.
Why we consider this a bug?
1. Logically equivalent queries, when the underlying data has not changed, should return the same result set. But the string representation of the same column in the same row differs in this two cases.
2. After derived table merging, the column's original Field type (including display width, precision, etc.) is lost or replaced by a generic Item. This is a failure of the optimizer/executor to correctly preserve column attributes.
3. The user has not used any explicit conversions like CAST or FORMAT, nor have they changed sql_mode, yet the output changes. This means users cannot reliably obtain consistent results by rewriting queries, unless they forcibly add something like CAST(c0 AS DECIMAL(…)) to bypass the display formatting—which is clearly not a reasonable workaround.
How to repeat:
Case 1:
DROP DATABASE IF EXISTS vp_db12_min;
CREATE DATABASE vp_db12_min;
USE vp_db12_min;
CREATE TABLE s (
vp_rowid BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
c0 FLOAT NULL,
c1 FLOAT NULL
) ENGINE=InnoDB;
INSERT INTO s(c0, c1) VALUES (-2121634170, NULL);
CREATE TABLE l (
vp_rowid BIGINT NOT NULL PRIMARY KEY,
c1 FLOAT NULL
) ENGINE=InnoDB;
CREATE TABLE r (
vp_rowid BIGINT NOT NULL PRIMARY KEY,
c0 FLOAT NULL
) ENGINE=InnoDB;
INSERT INTO l SELECT vp_rowid, c1 FROM s;
INSERT INTO r SELECT vp_rowid, c0 FROM s;
-- Control query: observe FLOAT display formatting vs. exact value
SELECT c0, CAST(c0 AS SIGNED) AS signed_c0 FROM s;
-- Returns: c0 = -2121630000 (display), signed_c0 = -2121634176 (exact binary)
-- Base table query
SELECT 'base' AS src, MIN(c0) AS ref
FROM s
GROUP BY CAST(c0 AS SIGNED);
-- Returns: ref = -2121630000
-- Derived table query (vertical partition join)
SELECT 'vp_split' AS src, MIN(c0) AS ref
FROM (
SELECT l.vp_rowid, r.c0, l.c1
FROM l
JOIN r ON l.vp_rowid = r.vp_rowid
) AS x
GROUP BY CAST(c0 AS SIGNED);
-- Returns: ref = -2121634176
Case 2:
DROP DATABASE IF EXISTS vp_min_float;
CREATE DATABASE vp_min_float;
USE vp_min_float;
CREATE TABLE src (
vp_rowid BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
c0 FLOAT NULL
) ENGINE=InnoDB;
INSERT INTO src(c0) VALUES (1474948533);
CREATE TABLE l (
vp_rowid BIGINT NOT NULL PRIMARY KEY
) ENGINE=InnoDB;
CREATE TABLE r (
vp_rowid BIGINT NOT NULL PRIMARY KEY,
c0 FLOAT NULL
) ENGINE=InnoDB;
INSERT INTO l SELECT vp_rowid FROM src;
INSERT INTO r SELECT vp_rowid, c0 FROM src;
-- Control query
SELECT
c0,
c0 + 0 AS c0_numeric_context,
CAST(c0 AS DECIMAL(65,30)) AS c0_decimal
FROM src;
-- Returns: c0 = 1474950000, c0_numeric_context = 1474948480,c0_decimal = 1474948480
-- Base table query
SELECT
'base_group' AS q,
SUM(s.c0) AS sum_c0,
MAX(DISTINCT s.c0) AS max_distinct_c0,
CAST(MAX(DISTINCT s.c0) AS DECIMAL(65,30)) AS max_distinct_c0_decimal,
(((s.vp_rowid) IS NOT FALSE) AND (('r0䋴popZ]') XOR (s.c0))) IS NOT FALSE AS g
FROM src AS s
GROUP BY (((s.vp_rowid) IS NOT FALSE) AND (('r0䋴popZ]') XOR (s.c0))) IS NOT FALSE;
-- Returns: max_distinct_c0 = 1474950000, max_distinct_c0_decimal = 1474948480
-- Derived table query (complex join reconstruction)
SELECT
'derived_group' AS q,
SUM(s.c0) AS sum_c0,
MAX(DISTINCT s.c0) AS max_distinct_c0,
CAST(MAX(DISTINCT s.c0) AS DECIMAL(65,30)) AS max_distinct_c0_decimal,
(((s.vp_rowid) IS NOT FALSE) AND (('r0䋴popZ]') XOR (s.c0))) IS NOT FALSE AS g
FROM (
SELECT l.vp_rowid, r.c0
FROM l
JOIN (
SELECT vp_rowid, CONCAT('vp:', vp_rowid) AS payload
FROM r
) AS sp
ON CONCAT('vp:', l.vp_rowid) = sp.payload
JOIN r
ON r.vp_rowid = sp.vp_rowid
) AS s
GROUP BY (((s.vp_rowid) IS NOT FALSE) AND (('r0䋴popZ]') XOR (s.c0))) IS NOT FALSE;
-- Returns: max_distinct_c0 = 1474948480, max_distinct_c0_decimal = 1474948480
Thank you for your time. I look forward to further discussion.
[10 Jun 7:44]
Roy Lyseng
Thank you for replying. You say that minor differences in numerical calculations when handling FLOAT/DOUBLE are within the tolerance range for floating-point operations, and that is indeed what is happening here. The difference is in the 7th digit and later, which coincides with the precision for FLOAT data. It would be more visible with an exponential display format, but (unfortunately), this is not the default display format in MySQL. A workaround for the apparent problem would be to use DECIMAL (if the range is acceptable), or DOUBLE, which is less prone to automatic type conversions.

Description: Summary When a query reconstructs a base table through JOINs or derived tables, the optimizer’s merging of derived tables causes the original FLOAT column to lose its display formatting metadata. Direct scans apply rounding for display (~7 significant digits), while the derived-table path exposes the raw IEEE 754 single-precision binary representable value. This leads to inconsistent results from aggregation functions like MIN, MAX, SUM, and GROUP BY. The issue is reproduced in two variants: simple MIN(c0) and complex MAX(DISTINCT c0). Actual results (Case 1): Control: SELECT c0, CAST(c0 AS SIGNED) AS signed_c0 FROM s; c0: -2121630000 (display value) signed_c0: -2121634176 (exact binary representable value) base MIN(c0): -2121630000 vp_split MIN(c0): -2121634176 Expected results The display format of aggregated FLOAT values should be consistent regardless of whether a derived table is used. EXPLAIN Analysis Case 1 — Baseline -> Table scan on <temporary> -> Aggregate using temporary table -> Table scan on s Case 1 — VP Split -> Table scan on <temporary> -> Aggregate using temporary table -> Nested loop inner join -> Covering index scan on l using PRIMARY -> Single-row index lookup on r using PRIMARY (vp_rowid = l.vp_rowid) The two plans are topologically almost identical (both use temporary-table aggregation). The only difference is whether the data source is a direct table or a derived column from a JOIN. This confirms the bug is not in the execution plan topology, but in the column type/formatting metadata. Root cause: Direct table scans preserve the column’s field metadata, so MySQL applies FLOAT display formatting (rounding to ~7 significant digits) when outputting results. When the value flows through a derived table or complex join, the column reference r.c0 becomes a general expression result. The optimizer may lose or overwrite the display attributes, causing the output layer to skip formatting and return the raw binary representable value. Essentially, derived table merging fails to propagate the Field display properties, leading to semantic inconsistency in aggregated results. How to repeat: Case 1: MIN aggregate DROP DATABASE IF EXISTS vp_db12_min; CREATE DATABASE vp_db12_min; USE vp_db12_min; CREATE TABLE s ( vp_rowid BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY, c0 FLOAT NULL, c1 FLOAT NULL ) ENGINE=InnoDB; INSERT INTO s(c0, c1) VALUES (-2121634170, NULL); CREATE TABLE l ( vp_rowid BIGINT NOT NULL PRIMARY KEY, c1 FLOAT NULL ) ENGINE=InnoDB; CREATE TABLE r ( vp_rowid BIGINT NOT NULL PRIMARY KEY, c0 FLOAT NULL ) ENGINE=InnoDB; INSERT INTO l SELECT vp_rowid, c1 FROM s; INSERT INTO r SELECT vp_rowid, c0 FROM s; -- Direct base table: displays -2121630000 SELECT 'base' AS src, MIN(c0) AS ref FROM s GROUP BY CAST(c0 AS SIGNED); -- Derived table reconstruction: wrongly displays -2121634176 SELECT 'vp_split' AS src, MIN(c0) AS ref FROM ( SELECT l.vp_rowid, r.c0, l.c1 FROM l JOIN r ON l.vp_rowid = r.vp_rowid ) AS x GROUP BY CAST(c0 AS SIGNED); Case 2: MAX(DISTINCT) aggregate DROP DATABASE IF EXISTS vp_min_float; CREATE DATABASE vp_min_float; USE vp_min_float; CREATE TABLE src ( vp_rowid BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY, c0 FLOAT NULL ) ENGINE=InnoDB; INSERT INTO src(c0) VALUES (1474948533); CREATE TABLE l ( vp_rowid BIGINT NOT NULL PRIMARY KEY ) ENGINE=InnoDB; CREATE TABLE r ( vp_rowid BIGINT NOT NULL PRIMARY KEY, c0 FLOAT NULL ) ENGINE=InnoDB; INSERT INTO l SELECT vp_rowid FROM src; INSERT INTO r SELECT vp_rowid, c0 FROM src; -- Base table: MAX(DISTINCT c0) displays 1474950000 SELECT 'base_group' AS q, MAX(DISTINCT s.c0) AS max_distinct_c0 FROM src AS s GROUP BY ...; -- Derived table: wrongly displays 1474948480 SELECT 'derived_group' AS q, MAX(DISTINCT s.c0) AS max_distinct_c0 FROM ( SELECT l.vp_rowid, r.c0 FROM l JOIN (...) AS sp ON ... JOIN r ON ... ) AS s GROUP BY ...;