Bug #114762 invalid result set with join lateral and group by on a generated field
Submitted: 24 Apr 2024 10:05 Modified: 24 Apr 2024 13:22
Reporter: soheil rahsaz Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any

[24 Apr 2024 10:05] soheil rahsaz
Description:
Running this query which has a lateral join which includes a group by on a generated field, returns insufficient result, missing many rows.

Expected result has 2074 rows, but the query returns 120 rows.

My guess is that the problem occurs because of `subDate` field and because it's generated. If I group by another field, there's no problem.

(This is the sql_mode, not sure if it's related)
SELECT @@sql_mode;
STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION

How to repeat:
Sample code to generate the bug:

CREATE TABLE tbl1
(
    id   INT,
    name VARCHAR(32)
);

CREATE TABLE tbl2
(
    tbl1_id  INT,
    date_str VARCHAR(10),
    score    INT
);

#filling tbl1 with some random data
INSERT INTO tbl1 (id, name)
WITH RECURSIVE cte AS (SELECT 1 AS n
                       UNION
                       SELECT n + 1
                       FROM cte
                       WHERE n < 100)
SELECT n, MD5(UUID())
FROM cte;

#filling tbl2 with random data for each id in tbl1
INSERT INTO tbl2 (tbl1_id, date_str, score)
WITH RECURSIVE cte AS (SELECT 1 AS n
                       UNION
                       SELECT n + 1
                       FROM cte
                       WHERE n < 100)
SELECT tbl1.id, SUBSTRING(DATE_ADD(NOW(), INTERVAL ROUND(RAND() * 20) DAY), 1, 10), ROUND(RAND() * 100)
FROM tbl1
         JOIN cte;

#the query
SELECT tbl1.id, tbl1.name, a.maxScore, a.subDate
FROM tbl1
         JOIN LATERAL (SELECT SUBSTRING(tbl2.date_str, 6, 10) AS subDate, MAX(tbl2.score) AS maxScore
                        FROM tbl2
                        WHERE tbl2.tbl1_id = tbl1.id
                        GROUP BY subDate) a;
[24 Apr 2024 10:27] MySQL Verification Team
Hi Mr. rahsaz,

Thank you for your bug report.

However, this is not a bug.

Simply, your inner query is aggregated query, so the outer query is not an aggregated query.

The fact that you get more rows with other aggregated column is simply because the other columns have more distinct values.

Not a bug.
[24 Apr 2024 10:52] soheil rahsaz
I think I have failed to explain what is exactly happening.

Let's run the inner query with a where clause like this:
SELECT SUBSTRING(tbl2.date_str, 6, 10) AS subDate, MAX(tbl2.score) AS maxScore
FROM tbl2
WHERE tbl2.tbl1_id = 2
GROUP BY subDate.

Which returns 21 rows:
+---------+----------+
| subDate | maxScore |
+---------+----------+
| 05-03   |       91 |
| 04-25   |       75 |
| 05-09   |       94 |
| 04-28   |       66 |
| 05-06   |       74 |
| 05-14   |       69 |
| 05-04   |       77 |
| 05-08   |       76 |
| 04-29   |       83 |
| 05-02   |       99 |
| 05-07   |       99 |
| 05-13   |       93 |
| 05-05   |       70 |
| 05-10   |       89 |
| 05-01   |       56 |
| 04-26   |       96 |
| 05-11   |       76 |
| 04-24   |       91 |
| 04-27   |       59 |
| 05-12   |       57 |
| 04-30   |       65 |
+---------+----------+

But If I run the original query with an order by clause like this:

SELECT tbl1.id, tbl1.name, a.maxScore, a.subDate
FROM tbl1
         JOIN LATERAL (SELECT SUBSTRING(tbl2.date_str, 6, 10) AS subDate, MAX(tbl2.score) AS maxScore
                        FROM tbl2
                        WHERE tbl2.tbl1_id = tbl1.id
                        GROUP BY subDate) a
ORDER BY tbl1.id ASC ;
+------+----------------------------------+----------+---------+
| id   | name                             | maxScore | subDate |
+------+----------------------------------+----------+---------+
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       97 | 05-09   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       73 | 05-02   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       63 | 05-04   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       97 | 05-10   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       72 | 04-28   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       90 | 05-11   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       85 | 05-12   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       68 | 04-27   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       72 | 04-30   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       98 | 05-14   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       82 | 05-13   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       88 | 05-08   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       87 | 05-05   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       94 | 04-29   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       97 | 05-03   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       88 | 05-07   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       96 | 04-25   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       87 | 05-01   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       42 | 04-24   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       38 | 05-06   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       72 | 04-26   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       99 | 05-03   |
|    3 | 774d24c2db3ccd100b24395c0e83fd81 |       99 | 04-28   |
|    4 | 6b8268475f19bdfb79d6adf4aaca724e |       99 | 05-06   |
|    5 | a5d9a05bfc778f877b5e5e5046e5bb69 |      100 | 05-08   |
|    6 | 8885e8a2e810d62e8b52567595c6e7c4 |      100 | 05-11   |
|    7 | 73d17cff0c58be10dad0f7711b08116e |      100 | 04-27   |
|    8 | 1c51b13fa5fc9ff436af27ab81d3f093 |      100 | 04-30   |
...
.

You see that there's only 1 row for id of 2. 

But then if I run the query again with a where clause like this:
SELECT tbl1.id, tbl1.name, a.maxScore, a.subDate
FROM tbl1
         JOIN LATERAL (SELECT SUBSTRING(tbl2.date_str, 6, 10) AS subDate, MAX(tbl2.score) AS maxScore
                        FROM tbl2
                        WHERE tbl2.tbl1_id = tbl1.id
                        GROUP BY subDate) a
WHERE tbl1.id > 1
ORDER BY tbl1.id ASC ;
+------+----------------------------------+----------+---------+
| id   | name                             | maxScore | subDate |
+------+----------------------------------+----------+---------+
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       91 | 05-03   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       75 | 04-25   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       94 | 05-09   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       66 | 04-28   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       74 | 05-06   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       69 | 05-14   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       77 | 05-04   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       76 | 05-08   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       83 | 04-29   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       99 | 05-02   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       99 | 05-07   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       93 | 05-13   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       70 | 05-05   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       89 | 05-10   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       56 | 05-01   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       96 | 04-26   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       76 | 05-11   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       91 | 04-24   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       59 | 04-27   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       57 | 05-12   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       65 | 04-30   |
|    3 | 774d24c2db3ccd100b24395c0e83fd81 |       99 | 04-28   |
|    4 | 6b8268475f19bdfb79d6adf4aaca724e |       99 | 05-06   |
|    5 | a5d9a05bfc778f877b5e5e5046e5bb69 |      100 | 05-08   |
|    6 | 8885e8a2e810d62e8b52567595c6e7c4 |      100 | 05-11   |
|    7 | 73d17cff0c58be10dad0f7711b08116e |      100 | 04-27   |
|    8 | 1c51b13fa5fc9ff436af27ab81d3f093 |      100 | 04-30   |
|    9 | f18e38596bdaa4f37e7c4123bbc07a53 |       99 | 05-04   |
...
.

you see that now there are 21 rows for id of 2, and just 1 for id of 3...
and so on.

The original query without `lateral` join is like this:

SELECT tbl1.id, tbl1.name, a.maxScore, a.subDate
FROM tbl1
         JOIN (SELECT tbl2.tbl1_id, SUBSTRING(tbl2.date_str, 6, 10) AS subDate, MAX(tbl2.score) AS maxScore
                       FROM tbl2
                       GROUP BY tbl2.tbl1_id, subDate) a on a.tbl1_id = tbl1.id;

which returns the appropriate result.
[24 Apr 2024 10:53] soheil rahsaz
I think I have failed to explain what is exactly happening.

Let's run the inner query with a where clause like this:
SELECT SUBSTRING(tbl2.date_str, 6, 10) AS subDate, MAX(tbl2.score) AS maxScore
FROM tbl2
WHERE tbl2.tbl1_id = 2
GROUP BY subDate.

Which returns 21 rows:
+---------+----------+
| subDate | maxScore |
+---------+----------+
| 05-03   |       91 |
| 04-25   |       75 |
| 05-09   |       94 |
| 04-28   |       66 |
| 05-06   |       74 |
| 05-14   |       69 |
| 05-04   |       77 |
| 05-08   |       76 |
| 04-29   |       83 |
| 05-02   |       99 |
| 05-07   |       99 |
| 05-13   |       93 |
| 05-05   |       70 |
| 05-10   |       89 |
| 05-01   |       56 |
| 04-26   |       96 |
| 05-11   |       76 |
| 04-24   |       91 |
| 04-27   |       59 |
| 05-12   |       57 |
| 04-30   |       65 |
+---------+----------+

But If I run the original query with an order by clause like this:

SELECT tbl1.id, tbl1.name, a.maxScore, a.subDate
FROM tbl1
         JOIN LATERAL (SELECT SUBSTRING(tbl2.date_str, 6, 10) AS subDate, MAX(tbl2.score) AS maxScore
                        FROM tbl2
                        WHERE tbl2.tbl1_id = tbl1.id
                        GROUP BY subDate) a
ORDER BY tbl1.id ASC ;
+------+----------------------------------+----------+---------+
| id   | name                             | maxScore | subDate |
+------+----------------------------------+----------+---------+
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       97 | 05-09   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       73 | 05-02   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       63 | 05-04   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       97 | 05-10   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       72 | 04-28   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       90 | 05-11   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       85 | 05-12   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       68 | 04-27   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       72 | 04-30   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       98 | 05-14   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       82 | 05-13   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       88 | 05-08   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       87 | 05-05   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       94 | 04-29   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       97 | 05-03   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       88 | 05-07   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       96 | 04-25   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       87 | 05-01   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       42 | 04-24   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       38 | 05-06   |
|    1 | 446bd8f2acb57ed1dd65a56dd24c0b72 |       72 | 04-26   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       99 | 05-03   |
|    3 | 774d24c2db3ccd100b24395c0e83fd81 |       99 | 04-28   |
|    4 | 6b8268475f19bdfb79d6adf4aaca724e |       99 | 05-06   |
|    5 | a5d9a05bfc778f877b5e5e5046e5bb69 |      100 | 05-08   |
|    6 | 8885e8a2e810d62e8b52567595c6e7c4 |      100 | 05-11   |
|    7 | 73d17cff0c58be10dad0f7711b08116e |      100 | 04-27   |
|    8 | 1c51b13fa5fc9ff436af27ab81d3f093 |      100 | 04-30   |
...
.

You see that there's only 1 row for id of 2. 

But then if I run the query again with a where clause like this:
SELECT tbl1.id, tbl1.name, a.maxScore, a.subDate
FROM tbl1
         JOIN LATERAL (SELECT SUBSTRING(tbl2.date_str, 6, 10) AS subDate, MAX(tbl2.score) AS maxScore
                        FROM tbl2
                        WHERE tbl2.tbl1_id = tbl1.id
                        GROUP BY subDate) a
WHERE tbl1.id > 1
ORDER BY tbl1.id ASC ;
+------+----------------------------------+----------+---------+
| id   | name                             | maxScore | subDate |
+------+----------------------------------+----------+---------+
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       91 | 05-03   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       75 | 04-25   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       94 | 05-09   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       66 | 04-28   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       74 | 05-06   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       69 | 05-14   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       77 | 05-04   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       76 | 05-08   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       83 | 04-29   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       99 | 05-02   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       99 | 05-07   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       93 | 05-13   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       70 | 05-05   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       89 | 05-10   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       56 | 05-01   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       96 | 04-26   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       76 | 05-11   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       91 | 04-24   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       59 | 04-27   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       57 | 05-12   |
|    2 | 8e7f4d72273dd389d32b85ba70b04a28 |       65 | 04-30   |
|    3 | 774d24c2db3ccd100b24395c0e83fd81 |       99 | 04-28   |
|    4 | 6b8268475f19bdfb79d6adf4aaca724e |       99 | 05-06   |
|    5 | a5d9a05bfc778f877b5e5e5046e5bb69 |      100 | 05-08   |
|    6 | 8885e8a2e810d62e8b52567595c6e7c4 |      100 | 05-11   |
|    7 | 73d17cff0c58be10dad0f7711b08116e |      100 | 04-27   |
|    8 | 1c51b13fa5fc9ff436af27ab81d3f093 |      100 | 04-30   |
|    9 | f18e38596bdaa4f37e7c4123bbc07a53 |       99 | 05-04   |
...
.

you see that now there are 21 rows for id of 2, and just 1 for id of 3...
and so on.

The original query without `lateral` join is like this:

SELECT tbl1.id, tbl1.name, a.maxScore, a.subDate
FROM tbl1
         JOIN (SELECT tbl2.tbl1_id, SUBSTRING(tbl2.date_str, 6, 10) AS subDate, MAX(tbl2.score) AS maxScore
                       FROM tbl2
                       GROUP BY tbl2.tbl1_id, subDate) a on a.tbl1_id = tbl1.id;

which returns the appropriate result.
[24 Apr 2024 10:55] MySQL Verification Team
Hi Mr. rahsaz,

These are expected results.

When you change queries, the results change.

That is expected behaviour.
[24 Apr 2024 13:05] soheil rahsaz
I am not convinced that this is the expected result and here's proof:

ALTER TABLE tbl2  ADD COLUMN subDate VARCHAR(5);
UPDATE tbl2 SET tbl2.subDate = SUBSTRING(tbl2.date_str, 6, 10);

So now instead of generating the subDate in the query, I have saved the value, and I change the query to this:

SELECT tbl1.id, tbl1.name, a.maxScore, a.subDate
FROM tbl1
         JOIN LATERAL (SELECT tbl2.subDate, MAX(tbl2.score) AS maxScore
                       FROM tbl2
                       WHERE tbl2.tbl1_id = tbl1.id
                       GROUP BY subDate) a;

and now this returns 2074 rows, but If I run with generated value:

SELECT tbl1.id, tbl1.name, a.maxScore, a.gen_subDate
FROM tbl1
         JOIN LATERAL (SELECT SUBSTRING(tbl2.date_str, 6, 10) AS gen_subDate, MAX(tbl2.score) AS maxScore
                       FROM tbl2
                       WHERE tbl2.tbl1_id = tbl1.id
                       GROUP BY gen_subDate) a;

I get 120 rows.
[24 Apr 2024 13:22] MySQL Verification Team
Hi Mr. rahsaz,

You are correct.

We have finally reproduced the behaviour.

This is now a verified Optimiser bug.