Bug #120733 DISTINCTROW with ROW_NUMBER() Derived Table and GROUP BY Causes Duplicate Removal Key to Miss a Non‑Aggregate Column, Me
Submitted: 20 Jun 13:38
Reporter: Annie liu Email Updates:
Status: Open Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S1 (Critical)
Version:MySQL 9.6 OS:Any
Assigned to: CPU Architecture:Any

[20 Jun 13:38] Annie liu
Description:
When a query is rewritten using ROW_NUMBER() window functions into a CTE derived table, and the outer query uses both DISTINCTROW and GROUP BY, the optimizer generates a duplicate‑removal key (Sort with duplicate removal) that includes only some SELECT columns (ref0 and ref2), completely omitting another non‑aggregate column ref1.

Since ref1 evaluates to NULL in one row and 0 in another (not equal in SQL), both rows should be kept. However, the duplicate removal, not comparing ref1, incorrectly treats them as duplicates and keeps only one.

The original single‑table query returns two rows, while the ROW_NUMBER‑rewritten version returns only one, indicating the issue is specific to the window‑function / CTE materialization path.

Actual vs Expected Results

Query Type	Result Comparison (Actual → Expected)
Original single table	2 rows: (1, NULL, 1), (1, 0, 1) → Correct
ROW_NUMBER rewrite	1 row: (1, NULL, 1) → Incorrect (missing (1, 0, 1))

EXPLAIN Analysis (Derived Table Version – Key Part)

Sort: ref0, ((ll.id = ll.c3) and (NULL = 691925563))
  Sort with duplicate removal: ref0, ref2
    Stream results
      Group aggregate: count(distinct '伴ꪎ鲫K')
        Nested loop inner join
          Materialize CTE ll
            Window aggregate: row_number() OVER (ORDER BY l.id)
          Materialize CTE rr
            Window aggregate: row_number() OVER (ORDER BY r.id)
Key observation:

Sort with duplicate removal: ref0, ref2 explicitly shows that duplicate removal is performed only on ref0 and ref2, without listing ref1.

Although GROUP BY includes the full expression for ref1, the removal step ignores it.

Unlike the other case (where the key had an extra aggregate column), here the key is missing a non‑aggregate column.

How to repeat:
DROP DATABASE IF EXISTS repro_m616_31_direct;
CREATE DATABASE repro_m616_31_direct;
USE repro_m616_31_direct;

CREATE TABLE src(
  id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  c0 DECIMAL(65,30), c1 TEXT, c2 FLOAT, c3 DECIMAL(65,30),
  c4 FLOAT, c5 DECIMAL(65,30), c6 DECIMAL(65,30), c7 BIGINT
) ENGINE=InnoDB;

INSERT INTO src(c0,c1,c2,c3,c4,c5,c6,c7) VALUES
  (0,NULL,-1,2092634463,NULL,NULL,NULL,NULL),
  (1299791121,NULL,NULL,NULL,0.786613,NULL,NULL,1);

CREATE TABLE l(
  id BIGINT NOT NULL PRIMARY KEY,
  c2 FLOAT, c3 DECIMAL(65,30), c6 DECIMAL(65,30), c4 FLOAT
) ENGINE=InnoDB;

CREATE TABLE r(
  id BIGINT NOT NULL PRIMARY KEY,
  c5 DECIMAL(65,30), c1 TEXT, c0 DECIMAL(65,30), c7 BIGINT, c6 DECIMAL(65,30)
) ENGINE=InnoDB;

INSERT INTO l SELECT id,c2,c3,c6,c4 FROM src;
INSERT INTO r SELECT id,c5,c1,c0,c7,c6 FROM src;

-- Original query (single table): returns 2 rows (correct)
SELECT DISTINCTROW
  (NOT ((src.c2) IS FALSE)) AS ref0,
  (((+ (src.id))) IN (src.c3)) AND ((+ ((NULL) IN (691925563)))) AS ref1,
  COUNT(DISTINCT '伴ꪎ鲫K') AS ref2
FROM src
WHERE ((NOT ((CASE 476522779
  WHEN 0.6412434890036196 THEN 0.09819858593173747
  WHEN src.id THEN src.c6
  WHEN -200221192 THEN src.id
  ELSE 0.5051427622194795
END)))) <= (((! ('a'))) >= (CAST(633536363 AS SIGNED)))
GROUP BY
  (! ((src.c2) IS FALSE)),
  (((+ (src.id))) IN (src.c3)) AND ((+ ((NULL) IN (691925563))))
ORDER BY 'I' DESC;

-- ROW_NUMBER derived table rewrite: returns only 1 row (incorrect)
SELECT DISTINCTROW
  (NOT ((s.c2) IS FALSE)) AS ref0,
  (((+ (s.id))) IN (s.c3)) AND ((+ ((NULL) IN (691925563)))) AS ref1,
  COUNT(DISTINCT '伴ꪎ鲫K') AS ref2
FROM (
  WITH
  ll AS (
    SELECT ROW_NUMBER() OVER (ORDER BY id) AS rn, id, c2,c3,c6,c4 FROM l
  ),
  rr AS (
    SELECT ROW_NUMBER() OVER (ORDER BY id) AS rn, id, c5,c1,c0,c7,c6 FROM r
  )
  SELECT ll.id, rr.c0,rr.c1,ll.c2,ll.c3,ll.c4,rr.c5,ll.c6,rr.c7
  FROM ll JOIN rr ON ll.rn=rr.rn
) s
WHERE ((NOT ((CASE 476522779
  WHEN 0.6412434890036196 THEN 0.09819858593173747
  WHEN s.id THEN s.c6
  WHEN -200221192 THEN s.id
  ELSE 0.5051427622194795
END)))) <= (((! ('a'))) >= (CAST(633536363 AS SIGNED)))
GROUP BY
  (! ((s.c2) IS FALSE)),
  (((+ (s.id))) IN (s.c3)) AND ((+ ((NULL) IN (691925563))))
ORDER BY 'I' DESC;