Bug #113587 SELECT DISTINCT makes ORDER BY ... DESC sort data to ascending
Submitted: 9 Jan 3:31 Modified: 15 Jan 3:33
Reporter: Aristotle Po Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: select distinct order by

[9 Jan 3:31] Aristotle Po
Description:
SELECT DISTINCT  makes ORDER BY DESC sort data to ascending.

Server version:   8.0.35 MySQL Community Server - GPL

How to repeat:
Might be related to https://bugs.mysql.com/bug.php?id=99257 that is a verified bug on 5.7. The query for testing is similar except for the LIMIT clause.

1) Load the test data (data_set.sql) from bug 99257.

2) Execute below MySQL commands :

TEE /tmp/o1.out
SELECT DISTINCT t1.order_id, t1.external_order_id FROM t1 INNER JOIN t2 USING (order_revision_id) ORDER BY t1.order_id DESC;
TEE /tmp/o2.out
SELECT          t1.order_id, t1.external_order_id FROM t1 INNER JOIN t2 USING (order_revision_id) ORDER BY t1.order_id DESC;
EXIT;

3) Check outputs :

3.1) With DISTINCT clause, we can see that ORDER BY ... DESC was not followed, instead it is in ascending order. 

$ cat /tmp/o1.out
mysql [localhost:8035] {msandbox} (test) > SELECT DISTINCT t1.order_id, t1.external_order_id FROM t1 INNER JOIN t2 USING (order_revision_id) ORDER BY t1.order_id DESC;
+-----------+----------------------------------------------------------+
| order_id  | external_order_id                                        |
+-----------+----------------------------------------------------------+
| 555000000 | 671263031366                                             |
| 555000001 | 709922095177                                             |
| 555000002 | 709949980745                                             |
| 555000003 | 710472335433                                             |
| 555000004 | 650890018872                                             |
[...]
| 555292696 | NULL                                                     |
| 555292697 | NULL                                                     |
| 555292698 | 2168547836040                                            |
| 555292699 | 2087842381870                                            |
| 555292700 | 2004823834659                                            |
+-----------+----------------------------------------------------------+
292426 rows in set (1.07 sec)

 
3.2) While without DISTINCT clause, the ORDER BY ... DESC is followed correctly.

$ cat /tmp/o2.out
mysql [localhost:8035] {msandbox} (test) > SELECT          t1.order_id, t1.external_order_id FROM t1 INNER JOIN t2 USING (order_revision_id) ORDER BY t1.order_id DESC;
+-----------+----------------------------------------------------------+
| order_id  | external_order_id                                        |
+-----------+----------------------------------------------------------+
| 555292700 | 2004823834659                                            |
| 555292699 | 2087842381870                                            |
| 555292698 | 2168547836040                                            |
| 555292697 | NULL                                                     |
| 555292696 | NULL                                                     |
[...]
| 555000004 | 650890018872                                             |
| 555000003 | 710472335433                                             |
| 555000002 | 709949980745                                             |
| 555000001 | 709922095177                                             |
| 555000000 | 671263031366                                             |
+-----------+----------------------------------------------------------+
292426 rows in set (0.43 sec)

Suggested fix:
DISTINCT should not make the ORDER BY ... DESC to ascending.
[9 Jan 10:26] MySQL Verification Team
Hi Mr. Po,

We have run your test case and we have repeated your test results.

We got very long outputs, so we are just showing the end part of the result sets.

With DISTINCT and DESC, this is what we get:

| 555292672 | 2087670284334                                            |
| 555292673 | 2111380717633                                            |
| 555292674 | 2111382323265                                            |
| 555292675 | 2111383633985                                            |
| 555292676 | 2111385796673                                            |
| 555292677 | 2111388385345                                            |
| 555292678 | 2111390318657                                            |
| 555292679 | 2056949858387                                            |
| 555292680 | 2087688568878                                            |
| 555292681 | 2087689486382                                            |
| 555292682 | 2004724711459                                            |
| 555292683 | NULL                                                     |
| 555292684 | NULL                                                     |
| 555292685 | 2087699611694                                            |
| 555292686 | 2172137013332                                            |
| 555292687 | 2087705149486                                            |
| 555292688 | 2257057349707                                            |
| 555292689 | 2087719239726                                            |
| 555292690 | 2080836288563                                            |
| 555292691 | 2156770525262                                            |
| 555292692 | 2087764721710                                            |
| 555292693 | 2087764721710                                            |
| 555292694 | 2087765606446                                            |
| 555292695 | 2208392708201                                            |
| 555292696 | NULL                                                     |
| 555292697 | NULL                                                     |
| 555292698 | 2168547836040                                            |
| 555292699 | 2087842381870                                            |
| 555292700 | 2004823834659                                            |
+-----------+----------------------------------------------------------+

While when we JUST remove the DISTINCT part, this is what we get:

| 555000014 | 833707212879                                             |
| 555000013 | 733486383201                                             |
| 555000012 | 750106902639                                             |
| 555000011 | 749302808687                                             |
| 555000010 | 597697822820                                             |
| 555000009 | 595949191268                                             |
| 555000008 | 703040061482                                             |
| 555000007 | 595388072036                                             |
| 555000006 | 706901278774                                             |
| 555000005 | 683061542982                                             |
| 555000004 | 650890018872                                             |
| 555000003 | 710472335433                                             |
| 555000002 | 709949980745                                             |
| 555000001 | 709922095177                                             |
| 555000000 | 671263031366                                             |
+-----------+----------------------------------------------------------+

This is now a verified bug. We also consider this a serious one, hence S2 severity.

It is affecting 5.7, 8.0 and all higher versions.
[12 Jan 2:40] Aristotle Po
As a workaround : Use GROUP BY to remove duplicates as equivalent to DISTINCT

mysql [localhost:8035] {msandbox} (test) > SELECT          t1.order_id, t1.external_order_id FROM t1 INNER JOIN t2 USING (order_revision_id) GROUP BY t1.order_id, t1.external_order_id ORDER BY t1.order_id DESC;
+-----------+----------------------------------------------------------+
| order_id  | external_order_id                                        |
+-----------+----------------------------------------------------------+
| 555292700 | 2004823834659                                            |
| 555292699 | 2087842381870                                            |
| 555292698 | 2168547836040                                            |
| 555292697 | NULL                                                     |
| 555292696 | NULL                                                     |
--
| 555000004 | 650890018872                                             |
| 555000003 | 710472335433                                             |
| 555000002 | 709949980745                                             |
| 555000001 | 709922095177                                             |
| 555000000 | 671263031366                                             |
+-----------+----------------------------------------------------------+
292426 rows in set (1.18 sec)
[12 Jan 10:38] MySQL Verification Team
Thank you, Mr. Po, for the workaround.
[15 Jan 2:38] Aristotle Po
Second Test case

Attachment: bug113587-test2-1.txt (text/plain), 3.34 KiB.

[15 Jan 2:40] Aristotle Po
Second Test case condition 1 result

Attachment: bug113587-test2-2-condition-1-result-tail5.txt (text/plain), 5.67 KiB.

[15 Jan 2:43] Aristotle Po
Second Test case condition 2 result

Attachment: bug113587-test2-2-condition-2-result-tail5.txt (text/plain), 6.65 KiB.

[15 Jan 2:46] Aristotle Po
Posting this test case as an additinal information about the bug behavior :
I made further test using same data, the numbers of rows for the bug to take effect is determined thru trial and error method.
The bug depends on two conditions :
1) Depends on number of rows
OR
2) Depends on both number of rows and how the data is inserted.

I have attached the test case and the result :
bug113587-test2-1.txt
bug113587-test2-2-condition-1-result-tail5.txt
bug113587-test2-2-condition-2-result-tail5.txt
[15 Jan 3:33] Aristotle Po
My apology for the slight confusion.

Just slight correction about the condition 2 where table a2 needs 606397 rows to be affected by Bug113587.
On the last part :
-- from 606498
TRUNCATE TABLE a2; INSERT INTO a2(order_revision_id) SELECT NULL FROM t2 LIMIT 606498; -- Bug113587 
-- to 606398
TRUNCATE TABLE a2; INSERT INTO a2(order_revision_id) SELECT NULL FROM t2 LIMIT 606398; -- Bug113587 

The result is still same as it got affected after 606397
mysql [localhost:8035] {msandbox} (test) > TRUNCATE TABLE a2; INSERT INTO a2(order_revision_id) SELECT NULL FROM t2 LIMIT 606398; -- OK
Query OK, 0 rows affected (0.21 sec)

Query OK, 606398 rows affected (1.98 sec)
Records: 606398  Duplicates: 0  Warnings: 0

mysql [localhost:8035] {msandbox} (test) >         SELECT DISTINCT t1.order_id FROM t1 INNER JOIN a2 t2 USING (order_revision_id) ORDER BY t1.order_id DESC;
| 555184513 |
| 555184520 |
| 555184521 |
| 555184525 |
+-----------+
181895 rows in set (0.72 sec)
[15 Jan 10:26] MySQL Verification Team
Thank you, Mr. Po, for the additional test cases and results.

We have verified this bug with a serious severity and your additional test cases (and results) will, no doubt, help in diagnosing the cause of this bug.
[27 Feb 2:23] Yicheng Wei
This is a fix for Bug #113587 by checking for some conditions before applying the optimization to skip sort.

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: Bug#113587.patch (application/octet-stream, text), 7.96 KiB.

[27 Feb 2:26] Yicheng Wei
Hi, I just submitted a patch to fix this bug. Please let me know if any modification is required for it to be accepted. Thanks!
Below is a summary of my work:

Problem:
MySQL builds a temporary table for deduplication. When there is only one column in the ORDER BY clause which has an index, the optimizer will use a backward index scan for sorting, and just perform a sequential scan on the temporary table. This is ok when the temporary table fits in memory, which will return elements in insertion order. When tmp_table_size is exceeded and the temporary table spills to disk to use the InnoDB storage engine, it forces the elements to be sorted based on the primary index in ascending order. This makes the final sequential scan produce data in ascending order.

Fix:
Before test_skip_sort() in query optimization, we check for the following two conditions:
1. The query has a DISTINCT clause
2. The ORDER BY clause specifies DESC order
If they both hold, we will not apply the skip sort optimization, and a filesort will be applied after reading from the temporary table to sort items in the desired order.

Potential improvements:
There is a third condition we could check before skipping the optimization: the worst-case estimated size of the temporary table exceeds the tmp_table_size.
I tried using (number of rows) *(row size) to estimate the size of the temporary table, but it doesn’t work well. The actual size is far larger than the estimate, and doesn’t grow linearly with the number of rows.
We can apply this third condition check if we can find a way to get a good estimation, as this would allow the skip sort optimization when it won’t cause the order problem.
[27 Feb 12:25] MySQL Verification Team
Thank you very much Mr. Po,

We shall analyse your patch and we shall let you know .......