MySQL Bugs: #69732: LIMIT clause results in duplicate data across pages

Bug #69732	LIMIT clause results in duplicate data across pages
Submitted:	12 Jul 2013 14:29	Modified:	12 Jul 2013 18:29
Reporter:	chetan verma	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	5.6.11 MySQL Community Server (GPL)	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	limit, page, pagination, SELECT

Description:
Adding LIMIT clause with both offset and rowcount is returning duplicate data.
If a row is returned by limit clause for a given value of offset and rowcount, it should not be returned for a different value of row count. Because of this, not all data is getting returned for a pagination logic in my application; and in-fact duplicate data is being returned for same rowcount but different offset.

How to repeat:
create table test(id INT, col INT);

insert into test values(1,1);
insert into test values(2,1);
insert into test values(10,1);
insert into test values(3,1);

----------------------------------------------------------------------
Now,
mysql> select * from test order by col limit 0,2;
+------+------+
| id   | col  |
+------+------+
|    3 |    1 |
|    2 |    1 |
+------+------+

So now I should expect the remaining two rows for limit 2,2. But that is not what happens:
mysql> select * from test order by col limit 2,2;
+------+------+
| id   | col  |
+------+------+
|   10 |    1 |
|    3 |    1 |
+------+------+

So we see that we get id=3 again. And id=1 is missing. 

We need this to work correctly for pagination module in our application to work correctly.

Please note**: Please do suggest a work-around if available.

Suggested fix:
Not sure, but a default order should is what is needed to fix the issue. I am not sure but if Mysql maintains some meta data like timestamp when row was inserted ?

minor change to synopsis

As you order by a constant value ("col" is always 1) the problem is the same as with a LIMIT without ORDER BY at all: the result order is not really defined/determined and may change without breaking the ORDER constraint ...

If you were using MySQL Cluster storage engine (ndbcluster) you'd even most likely see different results if just running the same query twice here without changing the LIMIT parameters ...

So yes, behavior changed between 5.5 and 5.6 here, but as the result order is undefined anyway => not a bug

The behavior shown by the given example is formally correct IMHO. The ORDER BY on `col` does not yield a unique order of the rows (in fact any row order would satisfy the ORDER BY clause). Hence MySQL is also free to chose any row order.

Now this can just be seen as two executions of the query coming up with two different row orders. And then LIMIT is no longer guaranteed to pick disjuct sets of rows.

If we extend the ORDER BY to yield a unique row order, then anything works as expected:

select * from test order by col,id limit 0,2;
+------+------+
| id   | col  |
+------+------+
|    1 |    1 |
|    2 |    1 |
+------+------+

select * from test order by col limit 2,2;
+------+------+
| id   | col  |
+------+------+
|   10 |    1 |
|    3 |    1 |
+------+------+

Bah, paste error

select * from test order by col,id limit 0,2;
+------+------+
| id   | col  |
+------+------+
|    1 |    1 |
|    2 |    1 |
+------+------+

select * from test order by col,id limit 2,2;
+------+------+
| id   | col  |
+------+------+
|    3 |    1 |
|   10 |    1 |
+------+------+

Well, IMHO LIMIT stands to serve a purpose. the LIMIT <ROWNUM, COUNT> in my opinion existed to return data separated into pages. What could be the use - case? The first thing that strikes is - Pagination. 
Now in order for pagination to work correctly, there has to be a fixed default behaviour. It doesnt matter what that is, but whats important is that it should be consistent across values of LIMIT ROWNUM, COUNT values.
Given the use-case I mentioned above for which LIMIT exists, I'd say this one is definitely a bug.

Without a distinct ORDER BY the result order is undefined. Period.

Even if the same query, executed twice, results results in different order there is nothing wrong with that. You may be used to what looks like deterministic results but that is actually an illusion. It may be what you see in the usual case but there is *nothing* that guarantees this. Result order can change as statistics get updated, as index trees are reshuffled, as data is partitioned across different physical machines so that result order depends on network latencies ...

So if you want pagination then make sure you ORDER BY on something guaranteed to be UNIQUE, or live with the fact that the sort order of identical values in a non-unique sequence is not deterministic and can change at any time without prior notice.

Your example is equivalent to

  SELECT * FROM table ORDER BY 'x'; 

( not the same as "ORDER BY 1" as that would be 'order by the first result column )

and almost equivalent to 

  SELECT * FROM table ORDER BY RAND();

and you wouldn't expect that to return the same row order every time either, would you?

If you want it to order by the order of insertions, add an auto_increment to capture the order of insertions and order by it.

I can only repeat what my colleagues Hartmut and Arjen have written.

Hence, not a bug.

Sorry, Hartmut and Axel.

I'm sure it's a bug, the solution is modify syntax.

order by `rank` desc, `browse` desc
to
order by `rank` desc, `browse` desc, "primary" desc #(or asc)