Bug #69732 LIMIT clause results in duplicate data across pages
Submitted: 12 Jul 2013 14:29 Modified: 12 Jul 2013 18:29
Reporter: chetan verma Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.6.11 MySQL Community Server (GPL) OS:Any
Assigned to: CPU Architecture:Any
Tags: limit, page, pagination, SELECT

[12 Jul 2013 14:29] chetan verma
Description:
Adding LIMIT clause with both offset and rowcount is returning duplicate data.
If a row is returned by limit clause for a given value of offset and rowcount, it should not be returned for a different value of row count. Because of this, not all data is getting returned for a pagination logic in my application; and in-fact duplicate data is being returned for same rowcount but different offset.

How to repeat:
create table test(id INT, col INT);

insert into test values(1,1);
insert into test values(2,1);
insert into test values(10,1);
insert into test values(3,1);

----------------------------------------------------------------------
Now,
mysql> select * from test order by col limit 0,2;
+------+------+
| id   | col  |
+------+------+
|    3 |    1 |
|    2 |    1 |
+------+------+

So now I should expect the remaining two rows for limit 2,2. But that is not what happens:
mysql> select * from test order by col limit 2,2;
+------+------+
| id   | col  |
+------+------+
|   10 |    1 |
|    3 |    1 |
+------+------+

So we see that we get id=3 again. And id=1 is missing. 

We need this to work correctly for pagination module in our application to work correctly.

Please note**: Please do suggest a work-around if available.

Suggested fix:
Not sure, but a default order should is what is needed to fix the issue. I am not sure but if Mysql maintains some meta data like timestamp when row was inserted ?
[12 Jul 2013 14:40] chetan verma
minor change to synopsis
[12 Jul 2013 15:02] Hartmut Holzgraefe
As you order by a constant value ("col" is always 1) the problem is the same as with a LIMIT without ORDER BY at all: the result order is not really defined/determined and may change without breaking the ORDER constraint ...

If you were using MySQL Cluster storage engine (ndbcluster) you'd even most likely see different results if just running the same query twice here without changing the LIMIT parameters ...

So yes, behavior changed between 5.5 and 5.6 here, but as the result order is undefined anyway => not a bug
[12 Jul 2013 15:05] Axel Schwenke
The behavior shown by the given example is formally correct IMHO. The ORDER BY on `col` does not yield a unique order of the rows (in fact any row order would satisfy the ORDER BY clause). Hence MySQL is also free to chose any row order.

Now this can just be seen as two executions of the query coming up with two different row orders. And then LIMIT is no longer guaranteed to pick disjuct sets of rows.

If we extend the ORDER BY to yield a unique row order, then anything works as expected:

select * from test order by col,id limit 0,2;
+------+------+
| id   | col  |
+------+------+
|    1 |    1 |
|    2 |    1 |
+------+------+

select * from test order by col limit 2,2;
+------+------+
| id   | col  |
+------+------+
|   10 |    1 |
|    3 |    1 |
+------+------+
[12 Jul 2013 15:08] Axel Schwenke
Bah, paste error

select * from test order by col,id limit 0,2;
+------+------+
| id   | col  |
+------+------+
|    1 |    1 |
|    2 |    1 |
+------+------+

select * from test order by col,id limit 2,2;
+------+------+
| id   | col  |
+------+------+
|    3 |    1 |
|   10 |    1 |
+------+------+
[12 Jul 2013 15:36] chetan verma
Well, IMHO LIMIT stands to serve a purpose. the LIMIT <ROWNUM, COUNT> in my opinion existed to return data separated into pages. What could be the use - case? The first thing that strikes is - Pagination. 
Now in order for pagination to work correctly, there has to be a fixed default behaviour. It doesnt matter what that is, but whats important is that it should be consistent across values of LIMIT ROWNUM, COUNT values.
Given the use-case I mentioned above for which LIMIT exists, I'd say this one is definitely a bug.
[12 Jul 2013 15:50] Hartmut Holzgraefe
Without a distinct ORDER BY the result order is undefined. Period.

Even if the same query, executed twice, results results in different order there is nothing wrong with that. You may be used to what looks like deterministic results but that is actually an illusion. It may be what you see in the usual case but there is *nothing* that guarantees this. Result order can change as statistics get updated, as index trees are reshuffled, as data is partitioned across different physical machines so that result order depends on network latencies ...

So if you want pagination then make sure you ORDER BY on something guaranteed to be UNIQUE, or live with the fact that the sort order of identical values in a non-unique sequence is not deterministic and can change at any time without prior notice.

Your example is equivalent to

  SELECT * FROM table ORDER BY 'x'; 

( not the same as "ORDER BY 1" as that would be 'order by the first result column )

and almost equivalent to 

  SELECT * FROM table ORDER BY RAND();

and you wouldn't expect that to return the same row order every time either, would you?
[12 Jul 2013 17:43] Justin Swanhart
If you want it to order by the order of insertions, add an auto_increment to capture the order of insertions and order by it.
[12 Jul 2013 18:29] Sinisa Milivojevic
I can only repeat what my colleagues Hartmut and Arjen have written.

Hence, not a bug.
[12 Jul 2013 18:29] Sinisa Milivojevic
Sorry, Hartmut and Axel.
[9 Oct 2017 8:42] ray allen
I'm sure it's a bug, the solution is modify syntax.

order by `rank` desc, `browse` desc
to
order by `rank` desc, `browse` desc, "primary" desc #(or asc)