Bug #93922 UNION ALL very slow with SUM(0)
Submitted: 14 Jan 2019 16:04 Modified: 7 May 2019 9:43
Reporter: Sergio Paternoster Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:8.0.13 OS:Linux
Assigned to: CPU Architecture:x86

[14 Jan 2019 16:04] Sergio Paternoster
Description:
In a UNION ALL of SQL queries with grouped records, the result is significantly slower if one of the queries has SUM(0) or SUM(<constant number>) instead of SUM(<numeric field>). 
Tested on v8.0.13 **it doesn't happen** in 5.7.x

How to repeat:
CREATE TABLE test.product_sales (
  `Year` smallint(5) NOT NULL,
  `Month` smallint(5) NOT NULL,
  `Product` varchar(30) COLLATE utf8mb4_unicode_ci NOT NULL,
  `Amount` double(15,5) NOT NULL DEFAULT '0.00000',
  PRIMARY KEY (`Year`,`Month`,`Product`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Fill the table with several records. I tested it with ~15,000 records. Then run these two sets of UNION ALL queries:

===== THIS EXECUTES VERY FAST =====
SELECT 
`Month`,`Product`,
SUM(`Amount`) AS `Amount`
FROM test.product_sales
-- add conditions here
GROUP BY `Month`,`Product`

UNION ALL

SELECT 
`Month`,`Product`,
SUM(`Amount`) AS `Amount`
FROM test.product_sales
-- add conditions here
GROUP BY `Month`,`Product`

===== THIS EXECUTES *VERY* SLOW =====
SELECT 
`Month`,`Product`,
SUM(`Amount`) AS `Amount`
FROM test.product_sales
-- add conditions here
GROUP BY `Month`,`Product`

UNION ALL

SELECT 
`Month`,`Product`,
SUM(0) AS `Amount`
FROM test.product_sales
-- add conditions here
GROUP BY `Month`,`Product`
[15 Jan 2019 13:06] Sinisa Milivojevic
Hi,

Thank you for your bug report.

However, do note that what you report is the expected behaviour.

First of all, this table does not have suitable index to be used in this aggregating query. Which is why first query is slow.

Second query does not have anything to do, in its second node, hence there is no need for the creation of the temporary table. That fact, plus the fact that number of rows is half of the first query, renders this all the expected behaviour.
[15 Jan 2019 14:17] Sergio Paternoster
Sorry Sinisa, I wasn't clear. You say that indexes are missing but, if you added the primary key fields in both SELECT and the behavior won't change. Please see the difference between the two examples: SUM(`Amount` * 0) in the first one and SUM(0) in the second one!!

This UNION will be very fast:

SELECT 
`Month`,`Product`,
SUM(`Amount`) AS `Amount`
FROM test.product_sales
WHERE `Year` =2018 and `Month` = 10 and `Product` = 'Cars'
GROUP BY `Month`,`Product`

UNION ALL

SELECT 
`Month`,`Product`,
SUM(`Amount` * 0) AS `Amount`
FROM test.product_sales
WHERE `Year`= 2018 and `Month` = 10 and `Product` = 'Motorbikes'
GROUP BY `Month`,`Product`

While this UNION will be ***very*** slow

SELECT 
`Month`,`Product`,
SUM(`Amount`) AS `Amount`
FROM test.product_sales
WHERE `Year` =2018 and `Month` = 10 and `Product` = 'Cars'
GROUP BY `Month`,`Product`

UNION ALL

SELECT 
`Month`,`Product`,
SUM(0) AS `Amount`
FROM test.product_sales
WHERE `Year`= 2018 and `Month` = 10 and `Product` = 'Motorbikes'
GROUP BY `Month`,`Product`
[15 Jan 2019 14:21] Sinisa Milivojevic
Hi,

I thought that I was clear. Index can not be used to resolve the aggregation, since your GROUP BY is not done by 'Year','Month','Product', but only by second and third segments.

The other observations of mine still stand and are not related in any manner with the available indices ...
[15 Jan 2019 14:37] Sergio Paternoster
Sinisa, I'll try to recap the issue regardless the indexes etc.

The issue is that, in those two UNIONs, there is a huge performances difference if you use SUM(Amount * 0) (which is very fast) vs SUM(0) (which is very slow).

That's all. I tested this on 5/7 version and both SUM(Amount * 0) and SUM(Amount * 0) are both equally very fast. The problem happens with 8.0.x
[15 Jan 2019 14:41] Sinisa Milivojevic
Hi,

I fully understand what you wrote so I am repeating my answer.

you report is the expected behaviour.

Second query is much faster, since it does not have anything to do, in its second node, hence there is no need for the creation of the temporary table. 

That fact, plus the fact that number of rows is half of the first query, renders this all the expected behaviour.
[15 Jan 2019 15:05] Sergio Paternoster
"Second query is much faster..." that's what I'm saying: the Second query is much *SLOWER* than the first one!!!
[15 Jan 2019 18:33] Sinisa Milivojevic
Sorry, I misread that.

Second query is much slower ??? With or without the WHERE clause. Or to ask in a different manner:

If you omit WHERE clause, which query is slower ?????/
[15 Jan 2019 19:11] Sergio Paternoster
Sinisa, I will put together a csv file for you to test this on a real table. Then I'll also provide you with two real examples (one fast and the second slow). I'll be back soon. Thanks for your help!
[16 Jan 2019 12:53] Sinisa Milivojevic
Hi Sergio,

Excellent idea. CSV file or a mysqldump file would enable it to check all possible causes for this behaviour.

Thanks in  advance !!!!!
[16 Jan 2019 13:17] Sinisa Milivojevic
Sergio,

Just when you are done with uploading, let me know about it.

Dump or CSV are both fine.

Use "Files" tab for the uploading. If a file is too big compressed, limit number of rows.
[16 Jan 2019 16:21] Sergio Paternoster
mysqldump of table with data plus two queries examples

Attachment: bug_93922.zip (application/x-zip-compressed, text), 262.99 KiB.

[16 Jan 2019 16:23] Sergio Paternoster
Hi Sinisa. I've added the mysqldump of the table and two SQL queries, one fast and the other slow. The result is exactly the same for both queries.
The only differences between the two queries is the SUM(amount * 0) vs sum(0)
Please, let me know if you need anything else to debug. Thanks!
[17 Jan 2019 14:16] Sinisa Milivojevic
Thank you for your test case.

I am analysing it right now ....
[17 Jan 2019 15:00] Sinisa Milivojevic
Hi,

I have actually used your first example, as column names with blanks are awkward to use from the command line. So, my queries look like this:

(SELECT Month,Product, SUM(Amount) as Amount  FROM test.product_sales GROUP BY Year, Month,Product ORDER BY NULL) UNION ALL (SELECT Month,Product, SUM(Amount) as Amount  FROM test.product_sales GROUP BY Year, Month,Product ORDER BY NULL)

and

(SELECT Month,Product, SUM(Amount) as Amount  FROM test.product_sales GROUP BY Year, Month,Product ORDER BY NULL) UNION ALL (SELECT Month,Product, SUM(0.00) as Amount  FROM test.product_sales GROUP BY Year, Month,Product ORDER BY NULL)

I tried both SUM(0) and SUM(0.0) and I get the same result, time-wise.

Both EXPLAINs are 100 % the same and each looks like this:

+----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
| id | select_type | table         | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra           |
+----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
|  1 | PRIMARY     | product_sales | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 24531 |   100.00 | Using temporary |
|  2 | UNION       | product_sales | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 24531 |   100.00 | Using temporary |
+----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+

This is a timing for the first query in your opening comment of this bug report:

real	0m3.709s
user	0m0.084s
sys	0m0.029s

This is a timing when SUM(0) AS Amount is used in the second node of the UNION:

real	4m35.681s
user	0m0.079s
sys	0m0.022s

That means that if you do SUM(0) AS Amount, you are actually 90 (ninety) times slower !!!!!!

Verified as a very serious optimiser bug.

Thank you very much for your contribution !!!!!!
[9 Apr 2019 14:32] Sergio Paternoster
Hi, is there any release scheduled with a fix for this bug? Many thanks!
[10 Apr 2019 12:27] Sinisa Milivojevic
Hi Sergio,

Scheduling of the bug fixes is an internal process to which even I do not have any access. Hence, I can't provide you with any info, since this is a very internal process in the company.

However, since I have set this bug to the severity of S2, which is high, I can only inform you that my evaluation was accepted.
[24 Apr 2019 22:58] Jon Stephens
Documented fix as follows in the MySQL 8.0.17 changelog:

    A UNION ALL query with SUM(constant) was processed very slowly
    compared to the same query using SUM(column) instead.

Closed.
[25 Apr 2019 13:04] Sinisa Milivojevic
Hi Jon,

Thank you very much for your feedback !!!!