Bug #72076 | unexpected results using limit in combination with none unique field sorting | ||
---|---|---|---|
Submitted: | 19 Mar 2014 13:09 | Modified: | 11 Jun 2014 15:27 |
Reporter: | Jeroen Zanten | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Documentation | Severity: | S2 (Serious) |
Version: | 5.6.16 | OS: | MacOS |
Assigned to: | Paul DuBois | CPU Architecture: | Any |
Tags: | limit, order by, regression |
[19 Mar 2014 13:09]
Jeroen Zanten
[19 Mar 2014 13:34]
MySQL Verification Team
Thank you for the bug report. mysql 5.1 > select * from Testcase2 order by aValue desc limit 4; +----+--------+----------------------+----------------+ | id | aValue | accuracyClassType_id | productType_id | +----+--------+----------------------+----------------+ | 10 | 20.00 | 2 | 3 | | 9 | 16.00 | 2 | 3 | | 14 | 12.00 | 2 | 4 | | 8 | 12.00 | 2 | 3 | +----+--------+----------------------+----------------+ 4 rows in set (0.00 sec) mysql 5.1 > select * from Testcase2 order by aValue desc limit 3; +----+--------+----------------------+----------------+ | id | aValue | accuracyClassType_id | productType_id | +----+--------+----------------------+----------------+ | 10 | 20.00 | 2 | 3 | | 9 | 16.00 | 2 | 3 | | 14 | 12.00 | 2 | 4 | +----+--------+----------------------+----------------+ 3 rows in set (0.00 sec) mysql 5.1 > show variables like "%version%"; +-------------------------+---------------------+ | Variable_name | Value | +-------------------------+---------------------+ | protocol_version | 10 | | version | 5.1.74-Win X64 | | version_comment | Source distribution | | version_compile_machine | unknown | | version_compile_os | Win64 | +-------------------------+---------------------+ 5 rows in set (0.02 sec) ---------------------------------------------------------------- mysql 5.6 > select * from Testcase2 order by aValue desc limit 4; +----+--------+----------------------+----------------+ | id | aValue | accuracyClassType_id | productType_id | +----+--------+----------------------+----------------+ | 10 | 20.00 | 2 | 3 | | 9 | 16.00 | 2 | 3 | | 14 | 12.00 | 2 | 4 | | 8 | 12.00 | 2 | 3 | +----+--------+----------------------+----------------+ 4 rows in set (0.00 sec) mysql 5.6 > select * from Testcase2 order by aValue desc limit 3; +----+--------+----------------------+----------------+ | id | aValue | accuracyClassType_id | productType_id | +----+--------+----------------------+----------------+ | 10 | 20.00 | 2 | 3 | | 9 | 16.00 | 2 | 3 | | 8 | 12.00 | 2 | 3 | +----+--------+----------------------+----------------+ 3 rows in set (0.00 sec) mysql 5.6 > show variables like "%version%"; +-------------------------+---------------------+ | Variable_name | Value | +-------------------------+---------------------+ | innodb_version | 5.6.17 | | protocol_version | 10 | | slave_type_conversions | | | version | 5.6.17 | | version_comment | Source distribution | | version_compile_machine | x86_64 | | version_compile_os | Win64 | +-------------------------+---------------------+ 7 rows in set (0.00 sec) mysql 5.6 >
[19 Mar 2014 15:00]
Tor Didriksen
Hello Jeroen I would argue that this is "not a bug" This is an equally valid answer: mysql 5.6 > select * from Testcase2 order by aValue desc limit 4; +----+--------+----------------------+----------------+ | id | aValue | accuracyClassType_id | productType_id | +----+--------+----------------------+----------------+ | 10 | 20.00 | 2 | 3 | | 9 | 16.00 | 2 | 3 | | 8 | 12.00 | 2 | 3 | | 14 | 12.00 | 2 | 4 | +----+--------+----------------------+----------------+ If you need this to be deterministic, you need to order by aValue, id Here's what the SQL standard says: 10.10 <sort specification list> General Rules 1) A <sort specification list> defines an ordering of rows, as follows: i) Two rows that are not distinct with respect to the <sort specification>s are said to be peers of each other. The relative ordering of peers is implementation-dependent.
[19 Mar 2014 15:20]
Jeroen Zanten
Hi Tor Didriksen, Yes you're right about an equally valid answer, but this is not the problem! The problem/bug is that limit 3 shows other results then a limit 4 (using the same query as mentioned). Note: the limit 4 results your showing are not in the same order on my MAC using MySQL 5.6.16 Kind regards, Jeroen
[19 Mar 2014 15:46]
Tor Didriksen
your query is non-deterministic. If you have several rows with aValue == 12.00 the result may depend on lots of things, like total data volume, storage engine, indexed/non-indexed column, etc. etc. It also depends on the value of 'limit'. If you want deterministic answers, you need to make the sort criterion disistinct.
[19 Mar 2014 16:04]
Jeroen Zanten
Hello Tor, Why is there a difference between mysql versions? See the useful submission of Godofredo Miguel Solorzano. We didn't have a problem with this using earlier versions of MySQL. Of course i understand the order by (in this case) is none deterministic, but it's still unexpectable behaviour when a limit 3 can (in any case) result in other results (as expected) then a limit 4 using the same query. Kind regards, Jeroen @Godofredo Miguel Solorzano: tnx for your submission!
[20 Mar 2014 6:07]
Tor Didriksen
Posted by developer: Here's an approximate explanation of what is going on. Assume you have 1000 rows in your table. Mysql 5.5 willl - allocate buffer space for 1000 rows - scan the table, inserting each row into the buffer - sort the buffer, this requires about 10.000 comparisons - return the first 3 (or 4) rows Mysql 5.6 will - allocate buffer space for 3 (or 4) rows - scan the table, maintaining a collection of 3 (or 4) "winners" this requires about 2000 comparisons - sort final set of "winners", this requires about 6 comparisons - return the result During the processing of your limit-3 query, the "maintain-three-winners" collection will see duplicates, and discard one of them. You can read more about this optimization here http://didrikdidrik.blogspot.co.uk/
[20 Mar 2014 8:23]
Jeroen Zanten
We understand your "approximate explanation". We understand it's none deterministic. But, in our opinion, it's still unexpected behaviour!
[20 Mar 2014 8:46]
Jeroen Zanten
Our conclusion is: optimisation as introduced for 5.6 (as mentioned on http://didrikdidrik.blogspot.co.uk/) has resulted in unexpected behaviour. It's unexpected behaviour when a limit 4 discards other results then a limit 3. In our opinion LIMIT should not affect the ordering of results. When the query is none deterministic we understand mysql needs to use it's own determined additional ordering, but this should not affect the result sequence when altering LIMIT. Kind regards, Jeroen
[20 Mar 2014 11:46]
Øystein Grøvlen
That LIMIT x may give a different ordering than LIMIT y is not something that is new in MySQL 5.6. Long before 5.6, the size of the limit have affected whether the query optimizer choose to do an index scan or table scan. Depending on which access method is used, the query may result in different order for rows where the ORDER BY column(s) are equal. The optimizer must have the freedom to execute the query the way that is most optimal given the query as specified. Otherwise, users which do not care about a certain (unspecified) behavior would have to pay the penalty. If you expect a deterministic order, you will have to add columns to the ORDER BY clause to make the order deterministic.
[11 Jun 2014 15:27]
Paul DuBois
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly. Added the following discussion to: http://dev.mysql.com/doc/refman/5.7/en/limit-optimization.html If multiple rows have identical values in the ORDER BY columns, the server is free to return those rows in any order, and may do so differently depending on the overall execution plan. In other words, the sort order of those rows is nondeterministic with respect to the nonordered columns. One factor that affects the execution plan is LIMIT, so an ORDER BY query with and without LIMIT may return rows in different orders. Consider this query, which is sorted by the category column but nondeterministic with respect to the id and rating columns: mysql> SELECT * FROM ratings ORDER BY category; +----+----------+--------+ | id | category | rating | +----+----------+--------+ | 1 | 1 | 4.5 | | 5 | 1 | 3.2 | | 3 | 2 | 3.7 | | 4 | 2 | 3.5 | | 6 | 2 | 3.5 | | 2 | 3 | 5.0 | | 7 | 3 | 2.7 | +----+----------+--------+ Including LIMIT may affect order of rows within each category value. For example, this is a valid query result: mysql> SELECT * FROM ratings ORDER BY category LIMIT 5; +----+----------+--------+ | id | category | rating | +----+----------+--------+ | 1 | 1 | 4.5 | | 5 | 1 | 3.2 | | 4 | 2 | 3.5 | | 3 | 2 | 3.7 | | 6 | 2 | 3.5 | +----+----------+--------+ In each case, the rows are sorted by the ORDER BY column, which is all that is required by the SQL standard. If it is important to ensure the same row order with and without LIMIT, include additional columns in the ORDER BY clause to make the order deterministic. For example, if id values are unique, you can sort to make rows for a given category appear in id order: mysql> SELECT * FROM ratings ORDER BY category, id; +----+----------+--------+ | id | category | rating | +----+----------+--------+ | 1 | 1 | 4.5 | | 5 | 1 | 3.2 | | 3 | 2 | 3.7 | | 4 | 2 | 3.5 | | 6 | 2 | 3.5 | | 2 | 3 | 5.0 | | 7 | 3 | 2.7 | +----+----------+--------+ mysql> SELECT * FROM ratings ORDER BY category, id LIMIT 5; +----+----------+--------+ | id | category | rating | +----+----------+--------+ | 1 | 1 | 4.5 | | 5 | 1 | 3.2 | | 3 | 2 | 3.7 | | 4 | 2 | 3.5 | | 6 | 2 | 3.5 | +----+----------+--------+