Bug #19058 SELECT with LIMIT (offset, maxrows) can yield incorrect results
Submitted: 12 Apr 2006 18:10 Modified: 12 Apr 2006 20:23
Reporter: Dave Cosgrove Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:4.1.11 OS:Windows (Windows XP Professional)
Assigned to: CPU Architecture:Any

[12 Apr 2006 18:10] Dave Cosgrove
Description:
os version: Windows XP Pro Version 2002, SP2
MySQL DB version: 4.1.11-nt

The utilization of LIMIT ( offset, max rows) in conjuction with a SELECT query can lead to incorrect results when the 'max rows' count referenced by the 'LIMIT clause' is greater than the number of records required to satisfy the query.  In the aforementioned scenario, it appears that a 'filesort' operation is performed which can lead to unpredictable results.

How to repeat:
When issuing 'EXPLAIN SELECT...' for all queries in the reproduction scenario below, it appears that all queries except for the last utilize the defined index (idx_load_time).  The final SELECT statement opts for a 'filesort' instead.  As a result, this last SELECT retrieves records that have been previously retrieved, and neglects to include the remaining records with the '2006-12-04 00:00:10' timestamp.  We have temporarily worked around this problem by utilizing a SQL hint ( FORCE INDEX idx_load_time ) in order to avoid the filesort that results in unpredictable results.

-- <BEGIN REPRODUCTION TEST>

USE test;

DROP TABLE `test1`;

CREATE TABLE `test1` (
 `test_number` int(10) unsigned NOT NULL,
 `load_time` datetime NOT NULL default '0000-00-00 00:00:00'
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

ALTER TABLE test1 ADD INDEX idx_load_time(load_time);

TRUNCATE TABLE test1;

INSERT INTO test1 ( test_number, load_time )
VALUES ( 1, '2006-12-04 00:00:01' ),
       ( 2, '2006-12-04 00:00:02' ),
       ( 3, '2006-12-04 00:00:03' ),
       ( 4, '2006-12-04 00:00:04' ),
       ( 5, '2006-12-04 00:00:05' ),
       ( 6, '2006-12-04 00:00:06' ),
       ( 7, '2006-12-04 00:00:07' ),
       ( 8, '2006-12-04 00:00:08' ),
       ( 9, '2006-12-04 00:00:09' ),
       ( 10, '2006-12-04 00:00:10' ),
       ( 11, '2006-12-04 00:00:10' ),
       ( 12, '2006-12-04 00:00:10' ),
       ( 13, '2006-12-04 00:00:10' ),
       ( 14, '2006-12-04 00:00:10' ),
       ( 15, '2006-12-04 00:00:10' ),
       ( 16, '2006-12-04 00:00:10' ),
       ( 17, '2006-12-04 00:00:10' ),
       ( 18, '2006-12-04 00:00:10' ),
       ( 19, '2006-12-04 00:00:10' );

SELECT * FROM test1
WHERE load_time > 0
ORDER BY load_time
LIMIT 0,5;

SELECT * FROM test1
WHERE load_time > 0
ORDER BY load_time
LIMIT 5,5;

SELECT * FROM test1
WHERE load_time > 0
ORDER BY load_time
LIMIT 10,5;

SELECT * FROM test1
WHERE load_time > 0
ORDER BY load_time
LIMIT 15,5;
[12 Apr 2006 18:14] Dave Cosgrove
Sample output from test provided that reflects incorrect results.

Attachment: limit_bug.out (application/octet-stream, text), 3.49 KiB.

[12 Apr 2006 18:57] MySQL Verification Team
Thank you for the bug report. Please see the ORDER BY column:

mysql> SELECT * FROM test1
    -> WHERE load_time > 0
    -> ORDER BY load_time
    -> LIMIT 15,5;

+-------------+---------------------+
| test_number | load_time           |
+-------------+---------------------+
|          12 | 2006-12-04 00:00:10 |
|          11 | 2006-12-04 00:00:10 |
|          10 | 2006-12-04 00:00:10 |
|          19 | 2006-12-04 00:00:10 |
+-------------+---------------------+
4 rows in set (0.00 sec)

mysql> SELECT * FROM test1
    -> WHERE load_time > 0
    -> ORDER BY test_number
    -> LIMIT 15,5;
+-------------+---------------------+
| test_number | load_time           |
+-------------+---------------------+
|          16 | 2006-12-04 00:00:10 |
|          17 | 2006-12-04 00:00:10 |
|          18 | 2006-12-04 00:00:10 |
|          19 | 2006-12-04 00:00:10 |
+-------------+---------------------+
4 rows in set (0.00 sec)

select vmysql> select version();
+------------------+
| version()        |
+------------------+
| 4.1.19-debug-log |
+------------------+
1 row in set (0.00 sec)
[12 Apr 2006 19:38] Dave Cosgrove
My intention was to order by 'load_time' not 'test_number'. The test_number column was added for reference purposes only.

Here's a revised reproduction scenario that might better illustrate the problem to you (you'll notice that some test records never get returned!):

--<BEGIN CODE>
USE test;

DROP TABLE `test1`;

CREATE TABLE `test1` (
 `test_case` varchar(64) NOT NULL,
 `load_time` datetime NOT NULL default '0000-00-00 00:00:00'
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

ALTER TABLE test1 ADD INDEX idx_load_time(load_time);

TRUNCATE TABLE test1;

INSERT INTO test1 ( test_case, load_time )
VALUES ( 'a', '2006-12-04 00:00:01' ),
       ( 'b', '2006-12-04 00:00:02' ),
       ( 'd', '2006-12-04 00:00:03' ),
       ( 'b', '2006-12-04 00:00:04' ),
       ( 'a', '2006-12-04 00:00:05' ),
       ( 'f', '2006-12-04 00:00:06' ),
       ( 'g', '2006-12-04 00:00:07' ),
       ( 'g', '2006-12-04 00:00:08' ),
       ( 'i', '2006-12-04 00:00:09' ),
       ( 'a', '2006-12-04 00:00:10' ),
       ( 'c', '2006-12-04 00:00:10' ),
       ( 'd', '2006-12-04 00:00:10' ),
       ( 'q', '2006-12-04 00:00:10' ),
       ( 'r', '2006-12-04 00:00:10' ),
       ( 's', '2006-12-04 00:00:10' ),
       ( 't', '2006-12-04 00:00:10' ),
       ( 'u', '2006-12-04 00:00:10' ),
       ( 'v', '2006-12-04 00:00:10' ),
       ( 'v', '2006-12-04 00:00:10' );

SELECT * FROM test1
WHERE load_time > 0
ORDER BY load_time
LIMIT 0,5;

SELECT * FROM test1
WHERE load_time > 0
ORDER BY load_time
LIMIT 5,5;

SELECT * FROM test1
WHERE load_time > 0
ORDER BY load_time
LIMIT 10,5;

SELECT * FROM test1
WHERE load_time > 0
ORDER BY load_time
LIMIT 15,5;
--<END CODE>
[12 Apr 2006 19:41] Dave Cosgrove
Output from revised reproduction scenario to better illustrate problem.

Attachment: limit_bug2.out (application/octet-stream, text), 3.44 KiB.

[12 Apr 2006 19:46] Dave Cosgrove
Reopened
[12 Apr 2006 20:23] Dean Ellis
The core problem is that you sort by a single column for which there are several rows that have identical values.  Without additional sorting criteria in your ORDER BY clause, the precise order by which those rows (with identical values for that sort column) appear in the result set can change from one execution to the next.

After loading the second test case, this should be visible with two queries:

1) SELECT * FROM test1 WHERE load_time > 0 ORDER BY load_time;

2) SELECT * FROM test1 FORCE INDEX (idx_load_time) WHERE load_time > 0 ORDER BY load_time;

The access patterns are different, so a different order of results is produced by each query.  The results are both valid, however, because there is no other additional sort criterion specified.

Just as "SELECT * FROM test1", without any specified ORDER BY sorting criteria, may produce different results from one invocation to the next, you can see a similar result when you are providing insufficient sorting criteria to guarantee the sort order from one execution to the next.