Bug #120377 Incorrect row count in MySQL when using LATERAL derived table with DISTINCT and CASE
Submitted: 29 Apr 8:45 Modified: 29 Apr 10:35
Reporter: Jasper Andrew Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:8.0.43 (MySQL Community Server - GPL) OS:Any (Docker container)
Assigned to: CPU Architecture:Any (x86_64)
Tags: case ... when, distinct, LATERAL

[29 Apr 8:45] Jasper Andrew
Description:
This query computes a COUNT(*) over a cross join between orders and a LATERAL subquery derived from a Cartesian product of orders and posts, filtered by a scalar value taken from comments.

Logically, the scalar subquery:
SELECT id FROM comments LIMIT 1 OFFSET 3
returns the value 4 based on the provided dataset.

Therefore the inner filter becomes:
ref_1.user_id != 4

Since none of the orders rows have user_id = 4, all 5 rows in orders are retained.

The inner FROM clause forms a Cartesian product between orders and posts, producing 5 × 5 = 25 rows. However, the CASE expression:
CASE WHEN ref_0.id <= ref_0.id THEN ref_2.content ELSE ref_2.title END
is always true, so it always evaluates to ref_2.content.

Applying DISTINCT over posts.content yields the set:
"First post", NULL, "Content", "Empty", "Last"
After removing NULL, 4 distinct values remain.

Under correct LATERAL semantics (as implemented by PostgreSQL and DuckDB), the subquery is executed once per row of orders (5 rows), producing 5 × 4 = 20 rows in total. Therefore the expected result is COUNT(*) = 20.

However, MySQL returns COUNT(*) = 25. This indicates that MySQL likely performs an incorrect optimization or rewrite of the LATERAL subquery when combined with DISTINCT and a constant-foldable CASE expression. The optimizer appears to partially decorrelate or reorder joins in a way that violates LATERAL semantics, leading to an inflated row count.

Expected result: 20
Actual result (MySQL): 25

How to repeat:
```SQL
-- Schema

CREATE TABLE users (
    id           INT,
    username     VARCHAR(100),
    email        VARCHAR(255),
    age          INT,
    status       VARCHAR(20),
    created_at   TIMESTAMP NULL,
    score        DOUBLE
);

CREATE TABLE posts (
    id          INT,
    user_id     INT,
    title       VARCHAR(255),
    content     VARCHAR(1000),
    views       INT,
    likes       INT,
    created_at  TIMESTAMP NULL,
    rating      DOUBLE
);

CREATE TABLE comments (
    id          INT,
    post_id     INT,
    user_id     INT,
    content     VARCHAR(1000),
    is_spam     INT,
    created_at  TIMESTAMP NULL
);

CREATE TABLE orders (
    id          INT,
    user_id     INT,
    amount      DOUBLE,
    status      VARCHAR(20),
    created_at  TIMESTAMP NULL
);

INSERT INTO users VALUES
(1, 'alice', 'alice@test.com', 20, 'active',  '2022-01-01 10:00:00', 88.5),
(2, 'bob',   'bob@test.com',   30, 'active',  '2022-01-02 11:00:00', 92.3),
(3, 'carol', NULL,             NULL, 'banned','2022-01-03 12:00:00', NULL),
(4, 'dave',  'dave@test.com',  45, 'active',  '2022-01-04 13:00:00', 65.2),
(5, NULL,    'null@test.com',  18, 'inactive','2022-01-05 14:00:00', 70.0);

INSERT INTO posts VALUES
(1, 1, 'Hello World', 'First post', 100, 10, '2022-01-10 10:00:00', 4.5),
(2, 1, 'Another Post', NULL,        150, 20, '2022-01-11 11:00:00', 3.0),
(3, 2, 'Bob Post',     'Content',   NULL,  5, '2022-01-12 12:00:00', NULL),
(4, 3, NULL,           'Empty',     50,   2, '2022-01-13 13:00:00', 5.0),
(5, 4, 'Last Post',    'Last',      300,  30,'2022-01-14 14:00:00', 4.9);

INSERT INTO comments VALUES
(1, 1, 2, 'Nice post', 0, '2022-01-20 10:00:00'),
(2, 1, 3, 'Spam here', 1,  '2022-01-21 11:00:00'),
(3, 2, 1, 'Thanks',    0, '2022-01-22 12:00:00'),
(4, 4, 5, NULL,        0, '2022-01-23 13:00:00');

INSERT INTO orders VALUES
(1, 1, 100.00, 'paid',    '2022-02-01 09:00:00'),
(2, 1, 200.50, 'shipped', '2022-02-02 10:00:00'),
(3, 2, NULL,   'failed',  '2022-02-03 11:00:00'),
(4, 3, 50.00,  'paid',    '2022-02-04 12:00:00'),
(5, 5, 999.99, 'paid',    '2022-02-05 13:00:00');

-- Trigger SQL

SELECT COUNT(*)
FROM orders AS ref_0,
     LATERAL (
         SELECT DISTINCT
             CASE
                 WHEN ref_0.id <= ref_0.id
                     THEN ref_2.content
                 ELSE ref_2.title
             END AS c0
         FROM orders AS ref_1, posts  AS ref_2
         WHERE ref_1.user_id != (
             SELECT id
             FROM comments
             LIMIT 1 OFFSET 3
         )
     ) AS subq_0
WHERE subq_0.c0 IS NOT NULL;

-- Result: {25} wrong!
```
[29 Apr 10:35] Chaithra Marsur Gopala Reddy
Hi Jasper Andrew,

We are unable to reproduce the problem in latest versions of MySQL-8.4, MySQL-9.7. We see the problem only on MySQL-8.0.