Bug #114762 | invalid result set with join lateral and group by on a generated field | ||
---|---|---|---|
Submitted: | 24 Apr 2024 10:05 | Modified: | 24 Apr 2024 13:22 |
Reporter: | soheil rahsaz | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 8.0 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[24 Apr 2024 10:05]
soheil rahsaz
[24 Apr 2024 10:27]
MySQL Verification Team
Hi Mr. rahsaz, Thank you for your bug report. However, this is not a bug. Simply, your inner query is aggregated query, so the outer query is not an aggregated query. The fact that you get more rows with other aggregated column is simply because the other columns have more distinct values. Not a bug.
[24 Apr 2024 10:52]
soheil rahsaz
I think I have failed to explain what is exactly happening. Let's run the inner query with a where clause like this: SELECT SUBSTRING(tbl2.date_str, 6, 10) AS subDate, MAX(tbl2.score) AS maxScore FROM tbl2 WHERE tbl2.tbl1_id = 2 GROUP BY subDate. Which returns 21 rows: +---------+----------+ | subDate | maxScore | +---------+----------+ | 05-03 | 91 | | 04-25 | 75 | | 05-09 | 94 | | 04-28 | 66 | | 05-06 | 74 | | 05-14 | 69 | | 05-04 | 77 | | 05-08 | 76 | | 04-29 | 83 | | 05-02 | 99 | | 05-07 | 99 | | 05-13 | 93 | | 05-05 | 70 | | 05-10 | 89 | | 05-01 | 56 | | 04-26 | 96 | | 05-11 | 76 | | 04-24 | 91 | | 04-27 | 59 | | 05-12 | 57 | | 04-30 | 65 | +---------+----------+ But If I run the original query with an order by clause like this: SELECT tbl1.id, tbl1.name, a.maxScore, a.subDate FROM tbl1 JOIN LATERAL (SELECT SUBSTRING(tbl2.date_str, 6, 10) AS subDate, MAX(tbl2.score) AS maxScore FROM tbl2 WHERE tbl2.tbl1_id = tbl1.id GROUP BY subDate) a ORDER BY tbl1.id ASC ; +------+----------------------------------+----------+---------+ | id | name | maxScore | subDate | +------+----------------------------------+----------+---------+ | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 97 | 05-09 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 73 | 05-02 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 63 | 05-04 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 97 | 05-10 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 72 | 04-28 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 90 | 05-11 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 85 | 05-12 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 68 | 04-27 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 72 | 04-30 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 98 | 05-14 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 82 | 05-13 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 88 | 05-08 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 87 | 05-05 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 94 | 04-29 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 97 | 05-03 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 88 | 05-07 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 96 | 04-25 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 87 | 05-01 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 42 | 04-24 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 38 | 05-06 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 72 | 04-26 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 99 | 05-03 | | 3 | 774d24c2db3ccd100b24395c0e83fd81 | 99 | 04-28 | | 4 | 6b8268475f19bdfb79d6adf4aaca724e | 99 | 05-06 | | 5 | a5d9a05bfc778f877b5e5e5046e5bb69 | 100 | 05-08 | | 6 | 8885e8a2e810d62e8b52567595c6e7c4 | 100 | 05-11 | | 7 | 73d17cff0c58be10dad0f7711b08116e | 100 | 04-27 | | 8 | 1c51b13fa5fc9ff436af27ab81d3f093 | 100 | 04-30 | ... . You see that there's only 1 row for id of 2. But then if I run the query again with a where clause like this: SELECT tbl1.id, tbl1.name, a.maxScore, a.subDate FROM tbl1 JOIN LATERAL (SELECT SUBSTRING(tbl2.date_str, 6, 10) AS subDate, MAX(tbl2.score) AS maxScore FROM tbl2 WHERE tbl2.tbl1_id = tbl1.id GROUP BY subDate) a WHERE tbl1.id > 1 ORDER BY tbl1.id ASC ; +------+----------------------------------+----------+---------+ | id | name | maxScore | subDate | +------+----------------------------------+----------+---------+ | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 91 | 05-03 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 75 | 04-25 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 94 | 05-09 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 66 | 04-28 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 74 | 05-06 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 69 | 05-14 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 77 | 05-04 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 76 | 05-08 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 83 | 04-29 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 99 | 05-02 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 99 | 05-07 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 93 | 05-13 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 70 | 05-05 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 89 | 05-10 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 56 | 05-01 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 96 | 04-26 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 76 | 05-11 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 91 | 04-24 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 59 | 04-27 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 57 | 05-12 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 65 | 04-30 | | 3 | 774d24c2db3ccd100b24395c0e83fd81 | 99 | 04-28 | | 4 | 6b8268475f19bdfb79d6adf4aaca724e | 99 | 05-06 | | 5 | a5d9a05bfc778f877b5e5e5046e5bb69 | 100 | 05-08 | | 6 | 8885e8a2e810d62e8b52567595c6e7c4 | 100 | 05-11 | | 7 | 73d17cff0c58be10dad0f7711b08116e | 100 | 04-27 | | 8 | 1c51b13fa5fc9ff436af27ab81d3f093 | 100 | 04-30 | | 9 | f18e38596bdaa4f37e7c4123bbc07a53 | 99 | 05-04 | ... . you see that now there are 21 rows for id of 2, and just 1 for id of 3... and so on. The original query without `lateral` join is like this: SELECT tbl1.id, tbl1.name, a.maxScore, a.subDate FROM tbl1 JOIN (SELECT tbl2.tbl1_id, SUBSTRING(tbl2.date_str, 6, 10) AS subDate, MAX(tbl2.score) AS maxScore FROM tbl2 GROUP BY tbl2.tbl1_id, subDate) a on a.tbl1_id = tbl1.id; which returns the appropriate result.
[24 Apr 2024 10:53]
soheil rahsaz
I think I have failed to explain what is exactly happening. Let's run the inner query with a where clause like this: SELECT SUBSTRING(tbl2.date_str, 6, 10) AS subDate, MAX(tbl2.score) AS maxScore FROM tbl2 WHERE tbl2.tbl1_id = 2 GROUP BY subDate. Which returns 21 rows: +---------+----------+ | subDate | maxScore | +---------+----------+ | 05-03 | 91 | | 04-25 | 75 | | 05-09 | 94 | | 04-28 | 66 | | 05-06 | 74 | | 05-14 | 69 | | 05-04 | 77 | | 05-08 | 76 | | 04-29 | 83 | | 05-02 | 99 | | 05-07 | 99 | | 05-13 | 93 | | 05-05 | 70 | | 05-10 | 89 | | 05-01 | 56 | | 04-26 | 96 | | 05-11 | 76 | | 04-24 | 91 | | 04-27 | 59 | | 05-12 | 57 | | 04-30 | 65 | +---------+----------+ But If I run the original query with an order by clause like this: SELECT tbl1.id, tbl1.name, a.maxScore, a.subDate FROM tbl1 JOIN LATERAL (SELECT SUBSTRING(tbl2.date_str, 6, 10) AS subDate, MAX(tbl2.score) AS maxScore FROM tbl2 WHERE tbl2.tbl1_id = tbl1.id GROUP BY subDate) a ORDER BY tbl1.id ASC ; +------+----------------------------------+----------+---------+ | id | name | maxScore | subDate | +------+----------------------------------+----------+---------+ | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 97 | 05-09 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 73 | 05-02 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 63 | 05-04 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 97 | 05-10 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 72 | 04-28 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 90 | 05-11 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 85 | 05-12 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 68 | 04-27 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 72 | 04-30 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 98 | 05-14 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 82 | 05-13 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 88 | 05-08 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 87 | 05-05 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 94 | 04-29 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 97 | 05-03 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 88 | 05-07 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 96 | 04-25 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 87 | 05-01 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 42 | 04-24 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 38 | 05-06 | | 1 | 446bd8f2acb57ed1dd65a56dd24c0b72 | 72 | 04-26 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 99 | 05-03 | | 3 | 774d24c2db3ccd100b24395c0e83fd81 | 99 | 04-28 | | 4 | 6b8268475f19bdfb79d6adf4aaca724e | 99 | 05-06 | | 5 | a5d9a05bfc778f877b5e5e5046e5bb69 | 100 | 05-08 | | 6 | 8885e8a2e810d62e8b52567595c6e7c4 | 100 | 05-11 | | 7 | 73d17cff0c58be10dad0f7711b08116e | 100 | 04-27 | | 8 | 1c51b13fa5fc9ff436af27ab81d3f093 | 100 | 04-30 | ... . You see that there's only 1 row for id of 2. But then if I run the query again with a where clause like this: SELECT tbl1.id, tbl1.name, a.maxScore, a.subDate FROM tbl1 JOIN LATERAL (SELECT SUBSTRING(tbl2.date_str, 6, 10) AS subDate, MAX(tbl2.score) AS maxScore FROM tbl2 WHERE tbl2.tbl1_id = tbl1.id GROUP BY subDate) a WHERE tbl1.id > 1 ORDER BY tbl1.id ASC ; +------+----------------------------------+----------+---------+ | id | name | maxScore | subDate | +------+----------------------------------+----------+---------+ | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 91 | 05-03 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 75 | 04-25 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 94 | 05-09 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 66 | 04-28 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 74 | 05-06 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 69 | 05-14 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 77 | 05-04 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 76 | 05-08 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 83 | 04-29 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 99 | 05-02 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 99 | 05-07 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 93 | 05-13 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 70 | 05-05 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 89 | 05-10 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 56 | 05-01 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 96 | 04-26 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 76 | 05-11 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 91 | 04-24 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 59 | 04-27 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 57 | 05-12 | | 2 | 8e7f4d72273dd389d32b85ba70b04a28 | 65 | 04-30 | | 3 | 774d24c2db3ccd100b24395c0e83fd81 | 99 | 04-28 | | 4 | 6b8268475f19bdfb79d6adf4aaca724e | 99 | 05-06 | | 5 | a5d9a05bfc778f877b5e5e5046e5bb69 | 100 | 05-08 | | 6 | 8885e8a2e810d62e8b52567595c6e7c4 | 100 | 05-11 | | 7 | 73d17cff0c58be10dad0f7711b08116e | 100 | 04-27 | | 8 | 1c51b13fa5fc9ff436af27ab81d3f093 | 100 | 04-30 | | 9 | f18e38596bdaa4f37e7c4123bbc07a53 | 99 | 05-04 | ... . you see that now there are 21 rows for id of 2, and just 1 for id of 3... and so on. The original query without `lateral` join is like this: SELECT tbl1.id, tbl1.name, a.maxScore, a.subDate FROM tbl1 JOIN (SELECT tbl2.tbl1_id, SUBSTRING(tbl2.date_str, 6, 10) AS subDate, MAX(tbl2.score) AS maxScore FROM tbl2 GROUP BY tbl2.tbl1_id, subDate) a on a.tbl1_id = tbl1.id; which returns the appropriate result.
[24 Apr 2024 10:55]
MySQL Verification Team
Hi Mr. rahsaz, These are expected results. When you change queries, the results change. That is expected behaviour.
[24 Apr 2024 13:05]
soheil rahsaz
I am not convinced that this is the expected result and here's proof: ALTER TABLE tbl2 ADD COLUMN subDate VARCHAR(5); UPDATE tbl2 SET tbl2.subDate = SUBSTRING(tbl2.date_str, 6, 10); So now instead of generating the subDate in the query, I have saved the value, and I change the query to this: SELECT tbl1.id, tbl1.name, a.maxScore, a.subDate FROM tbl1 JOIN LATERAL (SELECT tbl2.subDate, MAX(tbl2.score) AS maxScore FROM tbl2 WHERE tbl2.tbl1_id = tbl1.id GROUP BY subDate) a; and now this returns 2074 rows, but If I run with generated value: SELECT tbl1.id, tbl1.name, a.maxScore, a.gen_subDate FROM tbl1 JOIN LATERAL (SELECT SUBSTRING(tbl2.date_str, 6, 10) AS gen_subDate, MAX(tbl2.score) AS maxScore FROM tbl2 WHERE tbl2.tbl1_id = tbl1.id GROUP BY gen_subDate) a; I get 120 rows.
[24 Apr 2024 13:22]
MySQL Verification Team
Hi Mr. rahsaz, You are correct. We have finally reproduced the behaviour. This is now a verified Optimiser bug.