Bug #115040 Incorrect result from lateral join with group by month/monthname function inside
Submitted: 16 May 2024 12:09 Modified: 16 May 2024 12:33
Reporter: Behnam Gogiev Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: GROUP BY, LATERAL, month, monthname

[16 May 2024 12:09] Behnam Gogiev
Description:
A lateral join consisting of a select grouped by function month/monthname returns the wrong result. Records with different months fall into the same group, with month 1.
For example, we have 2 tables:
user_accounts
id|user_id|
--+-------+
10|     10|
11|     10|
12|     11|
13|     11|
14|     12|
15|     12|

transactions
id|account|amount|trdate    |
--+-------+------+----------+
 1|     10|100.00|2024-01-01|
 2|     11| 50.00|2024-01-05|
 3|     12|100.00|2024-01-10|
 4|     13|100.00|2024-01-15|
 5|     14|100.00|2024-01-20|
 6|     15|100.00|2024-01-25|
 7|     13|100.00|2024-01-30|
 8|     11| 50.00|2024-02-05|
 9|     12|100.00|2024-02-10|
10|     13|200.00|2024-02-15|
11|     14| 50.00|2024-02-20|
12|     11|100.00|2024-02-25|
13|     14|100.00|2024-03-05|
14|     12|100.00|2024-03-10|
15|     12|100.00|2024-03-15|
16|     11|100.00|2024-03-20|
17|     10| 50.00|2024-03-25|

If run select:
select
    a.id
    ,t.*
from
    user_accounts a
    cross join lateral (
    	select
        	month(dt.trdate)	mnth
        	,sum(dt.amount)		saldo
        	#,group_concat(dt.id,',') transaction_id
        	,count(*)		cnt
        from transactions dt
        where
    		dt.account = a.id
        group by
        	mnth
    ) t

We get:
id|mnth|saldo |cnt|
--+----+------+---+
10|   1|100.00|  1|
10|   3| 50.00|  1|
11|   1|300.00|  4|
12|   1|400.00|  4|
13|   1|400.00|  3|
14|   1|250.00|  3|
15|   1|100.00|  1|

Which is incorrect, because 11, 12, 13, 14 accounts have entries in 2 and 3 months, and they were added to 1 month. But if uncomment group_concat function we get correct results:
id|mnth|saldo |transaction_id|cnt|
--+----+------+--------------+---+
10|   1|100.00|1,            |  1|
10|   3| 50.00|17,           |  1|
11|   1| 50.00|2,            |  1|
11|   2|150.00|8,,12,        |  2|
11|   3|100.00|16,           |  1|
12|   1|100.00|3,            |  1|
12|   2|100.00|9,            |  1|
12|   3|200.00|14,,15,       |  2|
13|   1|200.00|4,,7,         |  2|
13|   2|200.00|10,           |  1|
14|   1|100.00|5,            |  1|
14|   2| 50.00|11,           |  1|
14|   3|100.00|13,           |  1|
15|   1|100.00|6,            |  1|

Using the monthname function instead of month does not change the situation.
Also tested on:
MySQL 8.0.37 - the error is reproduced;
PostgreSQL 14.1 - error not reproducible.

How to repeat:
CREATE TABLE IF NOT EXISTS `user_accounts` (
  `id` int unsigned NOT NULL,
  `user_id` int unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `user_id` (`user_id`)
);
 
CREATE TABLE IF NOT EXISTS `transactions` (
  `id` int unsigned NOT NULL,
  `account` int unsigned NOT NULL,
  `amount` decimal(5,2) NOT NULL,
  `trdate` date NOT NULL,
  PRIMARY KEY (`id`),
  KEY `account` (`account`)
);
 
INSERT INTO `user_accounts` (`id`,`user_id`)
VALUES
 (10, 10),
 (11, 10),
 (12, 11),
 (13, 11),
 (14, 12),
 (15, 12);
 
INSERT INTO `transactions` (`id`,`account`,`amount`,`trdate`)
VALUES
  (1, 10, 100.00, '2024-01-01'),
  (2, 11, 50.00, '2024-01-05'),
  (3, 12, 100.00, '2024-01-10'),
  (4, 13, 100.00, '2024-01-15'),
  (5, 14, 100.00, '2024-01-20'),
  (6, 15, 100.00, '2024-01-25'),
  (7, 13, 100.00, '2024-01-30'),
  (8, 11, 50.00, '2024-02-05'),
  (9, 12, 100.00, '2024-02-10'),
  (10, 13, 200.00, '2024-02-15'),
  (11, 14, 50.00, '2024-02-20'),
  (12, 11, 100.00, '2024-02-25'),
  (13, 14, 100.00, '2024-03-05'),
  (14, 12, 100.00, '2024-03-10'),
  (15, 12, 100.00, '2024-03-15'),
  (16, 11, 100.00, '2024-03-20'),
  (17, 10, 50.00, '2024-03-25');
[16 May 2024 12:33] MySQL Verification Team
HI Mr. Googley,

Thank you for your bug report.

We have managed to repeat your test results:

id	mnth	saldo	cnt
10	1	100.00	1
10	3	50.00	1
11	1	300.00	4
12	1	400.00	4
13	1	400.00	3
14	1	250.00	3
15	1	100.00	1
id	mnth	saldo	transaction_id	cnt
10	1	100.00	1,	1
10	3	50.00	17,	1
11	1	50.00	2,	1
11	2	150.00	8,,12,	2
11	3	100.00	16,	1
12	1	100.00	3,	1
12	2	100.00	9,	1
12	3	200.00	14,,15,	2
13	1	200.00	4,,7,	2
13	2	200.00	10,	1
14	1	100.00	5,	1
14	2	50.00	11,	1
14	3	100.00	13,	1
15	1	100.00	6,	1

They are indeed wrong.

This is now a verified bug in the Optimiser, for the version 8.0 and all higher versions.

Thank you ...........
[28 Oct 2024 19:12] Andres Ramirez Guillen
I also had this issue. I tried it in different versions, and it kept giving me incorrect information in all of them:

8.0.31
8.0.37
8.0.39
8.0.40

I used this other approach.

CREATE TABLE table_1 (
    id INT PRIMARY KEY
);

INSERT INTO table_1 (id) VALUES (1),(2);

CREATE TABLE table_2 (
    id INT PRIMARY KEY AUTO_INCREMENT,
    fk_table_1_id INT,
    column_1 INT,
    created_at TIMESTAMP,
    CONSTRAINT fk_table_1_id FOREIGN KEY (fk_table_1_id) REFERENCES table_1 (id)
);

INSERT INTO table_2 (fk_table_1_id, column_1, created_at) VALUES
(1, 1, '2024-01-01 10:00:00'),(1, 2, '2024-01-01 11:00:00'),
(1, 4, '2024-01-02 11:00:00'),(1, 5, '2024-01-02 12:00:00'),
(2, 6, '2024-01-03 10:00:00'),(2, 7, '2024-01-03 11:00:00'),
(2, 8, '2024-01-04 10:00:00');

SELECT
    t1.id,
    t2.created_date,
    t2.column_1
FROM table_1 AS t1
INNER JOIN LATERAL (
    SELECT
        DATE(t2.created_at) AS created_date,
        MIN(t2.column_1) AS column_1
    FROM table_2 AS t2
    WHERE t2.fk_table_1_id = t1.id
    GROUP BY created_date
) AS t2 ON TRUE
WHERE t1.id IN (1,2);

It returns this:

+----+--------------+----------+
| id | created_date | column_1 |
+----+--------------+----------+
|  1 | 2024-01-01   |        1 |
|  1 | 2024-01-02   |        4 |
|  2 | 2024-01-03   |        6 |
+----+--------------+----------+

However, the expected result is this:

+----+--------------+----------+
| id | created_date | column_1 |
+----+--------------+----------+
|  1 | 2024-01-01   |        1 |
|  1 | 2024-01-02   |        4 |
|  2 | 2024-01-03   |        6 |
|  2 | 2024-01-04   |        8 |
+----+--------------+----------+

if I run the same query but instead of "WHERE t1.id IN (1,2)" I use "WHERE t1.id IN (2)" this returns the correct data

+----+--------------+----------+
| id | created_date | column_1 |
+----+--------------+----------+
|  2 | 2024-01-03   |        6 |
|  2 | 2024-01-04   |        8 |
+----+--------------+----------+

also, If I add this to the JOIN LATERAL, "COUNT(DISTINCT 1) AS lateral_fix" it works as a forced way to return the correct information, but I don’t like the idea of having to rely on this.

SELECT
    t1.id,
    t2.created_date,
    t2.column_1
FROM table_1 AS t1
INNER JOIN LATERAL (
    SELECT
        DATE(t2.created_at) AS created_date,
        COUNT(DISTINCT 1) AS lateral_fix,
        MIN(t2.column_1) AS column_1
    FROM table_2 AS t2
    WHERE t2.fk_table_1_id = t1.id
    GROUP BY created_date
) AS t2 ON TRUE
WHERE t1.id IN (1,2);

Is there any update on this fix? Because even in the latest versions, this bug persists.
[29 Oct 2024 10:09] MySQL Verification Team
Hi Mr. Gogley,

No, this bug is not fixed yet.

When this bug is fixed, this page will be updated and you will get the info on the release where the patch is applied to fix this bug.

Nobody knows when this bug will be fixed. Each Development Team has it's own schedule, which changes on weekly basis.