Bug #19058 SELECT with LIMIT (offset, maxrows) can yield incorrect results
Submitted: 12 Apr 2006 20:10 Modified: 12 Apr 2006 22:23
Reporter: Dave Cosgrove
Status: Not a Bug
Category:Server Severity:S2 (Serious)
Version:4.1.11 OS:Microsoft Windows (Windows XP Professional)
Assigned to: Target Version:

[12 Apr 2006 20:10] Dave Cosgrove
Description:
os version: Windows XP Pro Version 2002, SP2
MySQL DB version: 4.1.11-nt

The utilization of LIMIT ( offset, max rows) in conjuction with a SELECT query can lead
to incorrect results when the 'max rows' count referenced by the 'LIMIT clause' is
greater than the number of records required to satisfy the query.  In the aforementioned
scenario, it appears that a 'filesort' operation is performed which can lead to
unpredictable results.

How to repeat:
When issuing 'EXPLAIN SELECT...' for all queries in the reproduction scenario below, it
appears that all queries except for the last utilize the defined index (idx_load_time). 
The final SELECT statement opts for a 'filesort' instead.  As a result, this last SELECT
retrieves records that have been previously retrieved, and neglects to include the
remaining records with the '2006-12-04 00:00:10' timestamp.  We have temporarily worked
around this problem by utilizing a SQL hint ( FORCE INDEX idx_load_time ) in order to
avoid the filesort that results in unpredictable results.

-- <BEGIN REPRODUCTION TEST>

USE test;

DROP TABLE `test1`;

CREATE TABLE `test1` (
 `test_number` int(10) unsigned NOT NULL,
 `load_time` datetime NOT NULL default '0000-00-00 00:00:00'
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

ALTER TABLE test1 ADD INDEX idx_load_time(load_time);

TRUNCATE TABLE test1;

INSERT INTO test1 ( test_number, load_time )
VALUES ( 1, '2006-12-04 00:00:01' ),
       ( 2, '2006-12-04 00:00:02' ),
       ( 3, '2006-12-04 00:00:03' ),
       ( 4, '2006-12-04 00:00:04' ),
       ( 5, '2006-12-04 00:00:05' ),
       ( 6, '2006-12-04 00:00:06' ),
       ( 7, '2006-12-04 00:00:07' ),
       ( 8, '2006-12-04 00:00:08' ),
       ( 9, '2006-12-04 00:00:09' ),
       ( 10, '2006-12-04 00:00:10' ),
       ( 11, '2006-12-04 00:00:10' ),
       ( 12, '2006-12-04 00:00:10' ),
       ( 13, '2006-12-04 00:00:10' ),
       ( 14, '2006-12-04 00:00:10' ),
       ( 15, '2006-12-04 00:00:10' ),
       ( 16, '2006-12-04 00:00:10' ),
       ( 17, '2006-12-04 00:00:10' ),
       ( 18, '2006-12-04 00:00:10' ),
       ( 19, '2006-12-04 00:00:10' );

SELECT * FROM test1
WHERE load_time > 0
ORDER BY load_time
LIMIT 0,5;

SELECT * FROM test1
WHERE load_time > 0
ORDER BY load_time
LIMIT 5,5;

SELECT * FROM test1
WHERE load_time > 0
ORDER BY load_time
LIMIT 10,5;

SELECT * FROM test1
WHERE load_time > 0
ORDER BY load_time
LIMIT 15,5;
[12 Apr 2006 20:14] Dave Cosgrove
Sample output from test provided that reflects incorrect results.

Attachment: limit_bug.out (application/octet-stream, text), 3.49 KiB.

[12 Apr 2006 20:57] Miguel Solorzano
Thank you for the bug report. Please see the ORDER BY column:

mysql> SELECT * FROM test1
    -> WHERE load_time > 0
    -> ORDER BY load_time
    -> LIMIT 15,5;

+-------------+---------------------+
| test_number | load_time           |
+-------------+---------------------+
|          12 | 2006-12-04 00:00:10 |
|          11 | 2006-12-04 00:00:10 |
|          10 | 2006-12-04 00:00:10 |
|          19 | 2006-12-04 00:00:10 |
+-------------+---------------------+
4 rows in set (0.00 sec)

mysql> SELECT * FROM test1
    -> WHERE load_time > 0
    -> ORDER BY test_number
    -> LIMIT 15,5;
+-------------+---------------------+
| test_number | load_time           |
+-------------+---------------------+
|          16 | 2006-12-04 00:00:10 |
|          17 | 2006-12-04 00:00:10 |
|          18 | 2006-12-04 00:00:10 |
|          19 | 2006-12-04 00:00:10 |
+-------------+---------------------+
4 rows in set (0.00 sec)

select vmysql> select version();
+------------------+
| version()        |
+------------------+
| 4.1.19-debug-log |
+------------------+
1 row in set (0.00 sec)
[12 Apr 2006 21:38] Dave Cosgrove
My intention was to order by 'load_time' not 'test_number'. The test_number column was
added for reference purposes only.

Here's a revised reproduction scenario that might better illustrate the problem to you
(you'll notice that some test records never get returned!):

--<BEGIN CODE>
USE test;

DROP TABLE `test1`;

CREATE TABLE `test1` (
 `test_case` varchar(64) NOT NULL,
 `load_time` datetime NOT NULL default '0000-00-00 00:00:00'
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

ALTER TABLE test1 ADD INDEX idx_load_time(load_time);

TRUNCATE TABLE test1;

INSERT INTO test1 ( test_case, load_time )
VALUES ( 'a', '2006-12-04 00:00:01' ),
       ( 'b', '2006-12-04 00:00:02' ),
       ( 'd', '2006-12-04 00:00:03' ),
       ( 'b', '2006-12-04 00:00:04' ),
       ( 'a', '2006-12-04 00:00:05' ),
       ( 'f', '2006-12-04 00:00:06' ),
       ( 'g', '2006-12-04 00:00:07' ),
       ( 'g', '2006-12-04 00:00:08' ),
       ( 'i', '2006-12-04 00:00:09' ),
       ( 'a', '2006-12-04 00:00:10' ),
       ( 'c', '2006-12-04 00:00:10' ),
       ( 'd', '2006-12-04 00:00:10' ),
       ( 'q', '2006-12-04 00:00:10' ),
       ( 'r', '2006-12-04 00:00:10' ),
       ( 's', '2006-12-04 00:00:10' ),
       ( 't', '2006-12-04 00:00:10' ),
       ( 'u', '2006-12-04 00:00:10' ),
       ( 'v', '2006-12-04 00:00:10' ),
       ( 'v', '2006-12-04 00:00:10' );

SELECT * FROM test1
WHERE load_time > 0
ORDER BY load_time
LIMIT 0,5;

SELECT * FROM test1
WHERE load_time > 0
ORDER BY load_time
LIMIT 5,5;

SELECT * FROM test1
WHERE load_time > 0
ORDER BY load_time
LIMIT 10,5;

SELECT * FROM test1
WHERE load_time > 0
ORDER BY load_time
LIMIT 15,5;
--<END CODE>
[12 Apr 2006 21:41] Dave Cosgrove
Output from revised reproduction scenario to better illustrate problem.

Attachment: limit_bug2.out (application/octet-stream, text), 3.44 KiB.

[12 Apr 2006 21:46] Dave Cosgrove
Reopened
[12 Apr 2006 22:23] Dean Ellis
The core problem is that you sort by a single column for which there are several rows that
have identical values.  Without additional sorting criteria in your ORDER BY clause, the
precise order by which those rows (with identical values for that sort column) appear in
the result set can change from one execution to the next.

After loading the second test case, this should be visible with two queries:

1) SELECT * FROM test1 WHERE load_time > 0 ORDER BY load_time;

2) SELECT * FROM test1 FORCE INDEX (idx_load_time) WHERE load_time > 0 ORDER BY
load_time;

The access patterns are different, so a different order of results is produced by each
query.  The results are both valid, however, because there is no other additional sort
criterion specified.

Just as "SELECT * FROM test1", without any specified ORDER BY sorting criteria, may
produce different results from one invocation to the next, you can see a similar result
when you are providing insufficient sorting criteria to guarantee the sort order from one
execution to the next.