Description:
When a query is rewritten using ROW_NUMBER() window functions into a CTE derived table, and the outer query uses both DISTINCTROW and GROUP BY, the optimizer generates a duplicate‑removal key (Sort with duplicate removal) that includes only some SELECT columns (ref0 and ref2), completely omitting another non‑aggregate column ref1.
Since ref1 evaluates to NULL in one row and 0 in another (not equal in SQL), both rows should be kept. However, the duplicate removal, not comparing ref1, incorrectly treats them as duplicates and keeps only one.
The original single‑table query returns two rows, while the ROW_NUMBER‑rewritten version returns only one, indicating the issue is specific to the window‑function / CTE materialization path.
Actual vs Expected Results
Query Type Result Comparison (Actual → Expected)
Original single table 2 rows: (1, NULL, 1), (1, 0, 1) → Correct
ROW_NUMBER rewrite 1 row: (1, NULL, 1) → Incorrect (missing (1, 0, 1))
EXPLAIN Analysis (Derived Table Version – Key Part)
Sort: ref0, ((ll.id = ll.c3) and (NULL = 691925563))
Sort with duplicate removal: ref0, ref2
Stream results
Group aggregate: count(distinct '伴ꪎ鲫K')
Nested loop inner join
Materialize CTE ll
Window aggregate: row_number() OVER (ORDER BY l.id)
Materialize CTE rr
Window aggregate: row_number() OVER (ORDER BY r.id)
Key observation:
Sort with duplicate removal: ref0, ref2 explicitly shows that duplicate removal is performed only on ref0 and ref2, without listing ref1.
Although GROUP BY includes the full expression for ref1, the removal step ignores it.
Unlike the other case (where the key had an extra aggregate column), here the key is missing a non‑aggregate column.
How to repeat:
DROP DATABASE IF EXISTS repro_m616_31_direct;
CREATE DATABASE repro_m616_31_direct;
USE repro_m616_31_direct;
CREATE TABLE src(
id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
c0 DECIMAL(65,30), c1 TEXT, c2 FLOAT, c3 DECIMAL(65,30),
c4 FLOAT, c5 DECIMAL(65,30), c6 DECIMAL(65,30), c7 BIGINT
) ENGINE=InnoDB;
INSERT INTO src(c0,c1,c2,c3,c4,c5,c6,c7) VALUES
(0,NULL,-1,2092634463,NULL,NULL,NULL,NULL),
(1299791121,NULL,NULL,NULL,0.786613,NULL,NULL,1);
CREATE TABLE l(
id BIGINT NOT NULL PRIMARY KEY,
c2 FLOAT, c3 DECIMAL(65,30), c6 DECIMAL(65,30), c4 FLOAT
) ENGINE=InnoDB;
CREATE TABLE r(
id BIGINT NOT NULL PRIMARY KEY,
c5 DECIMAL(65,30), c1 TEXT, c0 DECIMAL(65,30), c7 BIGINT, c6 DECIMAL(65,30)
) ENGINE=InnoDB;
INSERT INTO l SELECT id,c2,c3,c6,c4 FROM src;
INSERT INTO r SELECT id,c5,c1,c0,c7,c6 FROM src;
-- Original query (single table): returns 2 rows (correct)
SELECT DISTINCTROW
(NOT ((src.c2) IS FALSE)) AS ref0,
(((+ (src.id))) IN (src.c3)) AND ((+ ((NULL) IN (691925563)))) AS ref1,
COUNT(DISTINCT '伴ꪎ鲫K') AS ref2
FROM src
WHERE ((NOT ((CASE 476522779
WHEN 0.6412434890036196 THEN 0.09819858593173747
WHEN src.id THEN src.c6
WHEN -200221192 THEN src.id
ELSE 0.5051427622194795
END)))) <= (((! ('a'))) >= (CAST(633536363 AS SIGNED)))
GROUP BY
(! ((src.c2) IS FALSE)),
(((+ (src.id))) IN (src.c3)) AND ((+ ((NULL) IN (691925563))))
ORDER BY 'I' DESC;
-- ROW_NUMBER derived table rewrite: returns only 1 row (incorrect)
SELECT DISTINCTROW
(NOT ((s.c2) IS FALSE)) AS ref0,
(((+ (s.id))) IN (s.c3)) AND ((+ ((NULL) IN (691925563)))) AS ref1,
COUNT(DISTINCT '伴ꪎ鲫K') AS ref2
FROM (
WITH
ll AS (
SELECT ROW_NUMBER() OVER (ORDER BY id) AS rn, id, c2,c3,c6,c4 FROM l
),
rr AS (
SELECT ROW_NUMBER() OVER (ORDER BY id) AS rn, id, c5,c1,c0,c7,c6 FROM r
)
SELECT ll.id, rr.c0,rr.c1,ll.c2,ll.c3,ll.c4,rr.c5,ll.c6,rr.c7
FROM ll JOIN rr ON ll.rn=rr.rn
) s
WHERE ((NOT ((CASE 476522779
WHEN 0.6412434890036196 THEN 0.09819858593173747
WHEN s.id THEN s.c6
WHEN -200221192 THEN s.id
ELSE 0.5051427622194795
END)))) <= (((! ('a'))) >= (CAST(633536363 AS SIGNED)))
GROUP BY
(! ((s.c2) IS FALSE)),
(((+ (s.id))) IN (s.c3)) AND ((+ ((NULL) IN (691925563))))
ORDER BY 'I' DESC;
Description: When a query is rewritten using ROW_NUMBER() window functions into a CTE derived table, and the outer query uses both DISTINCTROW and GROUP BY, the optimizer generates a duplicate‑removal key (Sort with duplicate removal) that includes only some SELECT columns (ref0 and ref2), completely omitting another non‑aggregate column ref1. Since ref1 evaluates to NULL in one row and 0 in another (not equal in SQL), both rows should be kept. However, the duplicate removal, not comparing ref1, incorrectly treats them as duplicates and keeps only one. The original single‑table query returns two rows, while the ROW_NUMBER‑rewritten version returns only one, indicating the issue is specific to the window‑function / CTE materialization path. Actual vs Expected Results Query Type Result Comparison (Actual → Expected) Original single table 2 rows: (1, NULL, 1), (1, 0, 1) → Correct ROW_NUMBER rewrite 1 row: (1, NULL, 1) → Incorrect (missing (1, 0, 1)) EXPLAIN Analysis (Derived Table Version – Key Part) Sort: ref0, ((ll.id = ll.c3) and (NULL = 691925563)) Sort with duplicate removal: ref0, ref2 Stream results Group aggregate: count(distinct '伴ꪎ鲫K') Nested loop inner join Materialize CTE ll Window aggregate: row_number() OVER (ORDER BY l.id) Materialize CTE rr Window aggregate: row_number() OVER (ORDER BY r.id) Key observation: Sort with duplicate removal: ref0, ref2 explicitly shows that duplicate removal is performed only on ref0 and ref2, without listing ref1. Although GROUP BY includes the full expression for ref1, the removal step ignores it. Unlike the other case (where the key had an extra aggregate column), here the key is missing a non‑aggregate column. How to repeat: DROP DATABASE IF EXISTS repro_m616_31_direct; CREATE DATABASE repro_m616_31_direct; USE repro_m616_31_direct; CREATE TABLE src( id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY, c0 DECIMAL(65,30), c1 TEXT, c2 FLOAT, c3 DECIMAL(65,30), c4 FLOAT, c5 DECIMAL(65,30), c6 DECIMAL(65,30), c7 BIGINT ) ENGINE=InnoDB; INSERT INTO src(c0,c1,c2,c3,c4,c5,c6,c7) VALUES (0,NULL,-1,2092634463,NULL,NULL,NULL,NULL), (1299791121,NULL,NULL,NULL,0.786613,NULL,NULL,1); CREATE TABLE l( id BIGINT NOT NULL PRIMARY KEY, c2 FLOAT, c3 DECIMAL(65,30), c6 DECIMAL(65,30), c4 FLOAT ) ENGINE=InnoDB; CREATE TABLE r( id BIGINT NOT NULL PRIMARY KEY, c5 DECIMAL(65,30), c1 TEXT, c0 DECIMAL(65,30), c7 BIGINT, c6 DECIMAL(65,30) ) ENGINE=InnoDB; INSERT INTO l SELECT id,c2,c3,c6,c4 FROM src; INSERT INTO r SELECT id,c5,c1,c0,c7,c6 FROM src; -- Original query (single table): returns 2 rows (correct) SELECT DISTINCTROW (NOT ((src.c2) IS FALSE)) AS ref0, (((+ (src.id))) IN (src.c3)) AND ((+ ((NULL) IN (691925563)))) AS ref1, COUNT(DISTINCT '伴ꪎ鲫K') AS ref2 FROM src WHERE ((NOT ((CASE 476522779 WHEN 0.6412434890036196 THEN 0.09819858593173747 WHEN src.id THEN src.c6 WHEN -200221192 THEN src.id ELSE 0.5051427622194795 END)))) <= (((! ('a'))) >= (CAST(633536363 AS SIGNED))) GROUP BY (! ((src.c2) IS FALSE)), (((+ (src.id))) IN (src.c3)) AND ((+ ((NULL) IN (691925563)))) ORDER BY 'I' DESC; -- ROW_NUMBER derived table rewrite: returns only 1 row (incorrect) SELECT DISTINCTROW (NOT ((s.c2) IS FALSE)) AS ref0, (((+ (s.id))) IN (s.c3)) AND ((+ ((NULL) IN (691925563)))) AS ref1, COUNT(DISTINCT '伴ꪎ鲫K') AS ref2 FROM ( WITH ll AS ( SELECT ROW_NUMBER() OVER (ORDER BY id) AS rn, id, c2,c3,c6,c4 FROM l ), rr AS ( SELECT ROW_NUMBER() OVER (ORDER BY id) AS rn, id, c5,c1,c0,c7,c6 FROM r ) SELECT ll.id, rr.c0,rr.c1,ll.c2,ll.c3,ll.c4,rr.c5,ll.c6,rr.c7 FROM ll JOIN rr ON ll.rn=rr.rn ) s WHERE ((NOT ((CASE 476522779 WHEN 0.6412434890036196 THEN 0.09819858593173747 WHEN s.id THEN s.c6 WHEN -200221192 THEN s.id ELSE 0.5051427622194795 END)))) <= (((! ('a'))) >= (CAST(633536363 AS SIGNED))) GROUP BY (! ((s.c2) IS FALSE)), (((+ (s.id))) IN (s.c3)) AND ((+ ((NULL) IN (691925563)))) ORDER BY 'I' DESC;