Bug #113587 | SELECT DISTINCT makes ORDER BY ... DESC sort data to ascending | ||
---|---|---|---|
Submitted: | 9 Jan 3:31 | Modified: | 15 Jan 3:33 |
Reporter: | Aristotle Po | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S2 (Serious) |
Version: | 8.0 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | select distinct order by |
[9 Jan 3:31]
Aristotle Po
[9 Jan 10:26]
MySQL Verification Team
Hi Mr. Po, We have run your test case and we have repeated your test results. We got very long outputs, so we are just showing the end part of the result sets. With DISTINCT and DESC, this is what we get: | 555292672 | 2087670284334 | | 555292673 | 2111380717633 | | 555292674 | 2111382323265 | | 555292675 | 2111383633985 | | 555292676 | 2111385796673 | | 555292677 | 2111388385345 | | 555292678 | 2111390318657 | | 555292679 | 2056949858387 | | 555292680 | 2087688568878 | | 555292681 | 2087689486382 | | 555292682 | 2004724711459 | | 555292683 | NULL | | 555292684 | NULL | | 555292685 | 2087699611694 | | 555292686 | 2172137013332 | | 555292687 | 2087705149486 | | 555292688 | 2257057349707 | | 555292689 | 2087719239726 | | 555292690 | 2080836288563 | | 555292691 | 2156770525262 | | 555292692 | 2087764721710 | | 555292693 | 2087764721710 | | 555292694 | 2087765606446 | | 555292695 | 2208392708201 | | 555292696 | NULL | | 555292697 | NULL | | 555292698 | 2168547836040 | | 555292699 | 2087842381870 | | 555292700 | 2004823834659 | +-----------+----------------------------------------------------------+ While when we JUST remove the DISTINCT part, this is what we get: | 555000014 | 833707212879 | | 555000013 | 733486383201 | | 555000012 | 750106902639 | | 555000011 | 749302808687 | | 555000010 | 597697822820 | | 555000009 | 595949191268 | | 555000008 | 703040061482 | | 555000007 | 595388072036 | | 555000006 | 706901278774 | | 555000005 | 683061542982 | | 555000004 | 650890018872 | | 555000003 | 710472335433 | | 555000002 | 709949980745 | | 555000001 | 709922095177 | | 555000000 | 671263031366 | +-----------+----------------------------------------------------------+ This is now a verified bug. We also consider this a serious one, hence S2 severity. It is affecting 5.7, 8.0 and all higher versions.
[12 Jan 2:40]
Aristotle Po
As a workaround : Use GROUP BY to remove duplicates as equivalent to DISTINCT mysql [localhost:8035] {msandbox} (test) > SELECT t1.order_id, t1.external_order_id FROM t1 INNER JOIN t2 USING (order_revision_id) GROUP BY t1.order_id, t1.external_order_id ORDER BY t1.order_id DESC; +-----------+----------------------------------------------------------+ | order_id | external_order_id | +-----------+----------------------------------------------------------+ | 555292700 | 2004823834659 | | 555292699 | 2087842381870 | | 555292698 | 2168547836040 | | 555292697 | NULL | | 555292696 | NULL | -- | 555000004 | 650890018872 | | 555000003 | 710472335433 | | 555000002 | 709949980745 | | 555000001 | 709922095177 | | 555000000 | 671263031366 | +-----------+----------------------------------------------------------+ 292426 rows in set (1.18 sec)
[12 Jan 10:38]
MySQL Verification Team
Thank you, Mr. Po, for the workaround.
[15 Jan 2:38]
Aristotle Po
Second Test case
Attachment: bug113587-test2-1.txt (text/plain), 3.34 KiB.
[15 Jan 2:40]
Aristotle Po
Second Test case condition 1 result
Attachment: bug113587-test2-2-condition-1-result-tail5.txt (text/plain), 5.67 KiB.
[15 Jan 2:43]
Aristotle Po
Second Test case condition 2 result
Attachment: bug113587-test2-2-condition-2-result-tail5.txt (text/plain), 6.65 KiB.
[15 Jan 2:46]
Aristotle Po
Posting this test case as an additinal information about the bug behavior : I made further test using same data, the numbers of rows for the bug to take effect is determined thru trial and error method. The bug depends on two conditions : 1) Depends on number of rows OR 2) Depends on both number of rows and how the data is inserted. I have attached the test case and the result : bug113587-test2-1.txt bug113587-test2-2-condition-1-result-tail5.txt bug113587-test2-2-condition-2-result-tail5.txt
[15 Jan 3:33]
Aristotle Po
My apology for the slight confusion. Just slight correction about the condition 2 where table a2 needs 606397 rows to be affected by Bug113587. On the last part : -- from 606498 TRUNCATE TABLE a2; INSERT INTO a2(order_revision_id) SELECT NULL FROM t2 LIMIT 606498; -- Bug113587 -- to 606398 TRUNCATE TABLE a2; INSERT INTO a2(order_revision_id) SELECT NULL FROM t2 LIMIT 606398; -- Bug113587 The result is still same as it got affected after 606397 mysql [localhost:8035] {msandbox} (test) > TRUNCATE TABLE a2; INSERT INTO a2(order_revision_id) SELECT NULL FROM t2 LIMIT 606398; -- OK Query OK, 0 rows affected (0.21 sec) Query OK, 606398 rows affected (1.98 sec) Records: 606398 Duplicates: 0 Warnings: 0 mysql [localhost:8035] {msandbox} (test) > SELECT DISTINCT t1.order_id FROM t1 INNER JOIN a2 t2 USING (order_revision_id) ORDER BY t1.order_id DESC; | 555184513 | | 555184520 | | 555184521 | | 555184525 | +-----------+ 181895 rows in set (0.72 sec)
[15 Jan 10:26]
MySQL Verification Team
Thank you, Mr. Po, for the additional test cases and results. We have verified this bug with a serious severity and your additional test cases (and results) will, no doubt, help in diagnosing the cause of this bug.
[27 Feb 2:23]
Yicheng Wei
This is a fix for Bug #113587 by checking for some conditions before applying the optimization to skip sort. (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: Bug#113587.patch (application/octet-stream, text), 7.96 KiB.
[27 Feb 2:26]
Yicheng Wei
Hi, I just submitted a patch to fix this bug. Please let me know if any modification is required for it to be accepted. Thanks! Below is a summary of my work: Problem: MySQL builds a temporary table for deduplication. When there is only one column in the ORDER BY clause which has an index, the optimizer will use a backward index scan for sorting, and just perform a sequential scan on the temporary table. This is ok when the temporary table fits in memory, which will return elements in insertion order. When tmp_table_size is exceeded and the temporary table spills to disk to use the InnoDB storage engine, it forces the elements to be sorted based on the primary index in ascending order. This makes the final sequential scan produce data in ascending order. Fix: Before test_skip_sort() in query optimization, we check for the following two conditions: 1. The query has a DISTINCT clause 2. The ORDER BY clause specifies DESC order If they both hold, we will not apply the skip sort optimization, and a filesort will be applied after reading from the temporary table to sort items in the desired order. Potential improvements: There is a third condition we could check before skipping the optimization: the worst-case estimated size of the temporary table exceeds the tmp_table_size. I tried using (number of rows) *(row size) to estimate the size of the temporary table, but it doesn’t work well. The actual size is far larger than the estimate, and doesn’t grow linearly with the number of rows. We can apply this third condition check if we can find a way to get a good estimation, as this would allow the skip sort optimization when it won’t cause the order problem.
[27 Feb 12:25]
MySQL Verification Team
Thank you very much Mr. Po, We shall analyse your patch and we shall let you know .......