Bug #41790 Wrong optimization on large offsets for LIMIT clause
Submitted: 30 Dec 2008 3:35 Modified: 30 Jan 2009 8:29
Reporter: Javier Smaldone Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.0.51a OS:Linux
Assigned to: CPU Architecture:Any
Tags: filesort, INDEX, limit, OFFSET, Optimizer, performance

[30 Dec 2008 3:35] Javier Smaldone
Description:

- When selecting from a table with "ORDER BY" using an indexed field, the optimizer doesn't use the index (full table scan + filesort).

- When selecting from a table, with "ORDER BY" and "LIMIT n,m", the optimizer uses the index only if n+m is less than the rows in the table.

How to repeat:
Create the following table:

CREATE TABLE mytable(
  id int(10) NOT NULL auto_increment,
  mycolumn int(10),
  PRIMARY KEY(id),
  KEY idx_mycolumn(mycolumn)
) ENGINE=MyISAM;

Fill the table with about 1,000,000 rows. 

If you do a query like: 

SELECT * FROM mytable ORDER BY mycolumn;

it took a lot of time (about 1 min on my AMD64 2.2Ghz) and uses a filesort. Issuing "USE INDEX(idx_mycolumn)" don't solve the problem, but using "FORCE INDEX" do (no filesort, time under 6 seconds). 

When executing the following query:

SELECT * FROM mytable ORDER BY mycolumn LIMIT n, m;

the optimizer decides to use the index (and solves the query in about 3 seconds) only if n+m is less than the rows count (1.000.000). "USE INDEX" doesn't works, but "FORCE INDEX" does.

Suggested fix:
I'd like the optimizer to be able to guess the right index for such a simple situation.
[30 Dec 2008 3:48] Javier Smaldone
Some extra info: 
 * The problem is the same being the engine used MyISAM or InnoDB.
 * The problem doesn’t get solved after doing a “ANALYZE TABLE“.
 * I’ve trying with other datatypes for the sorting column, with the same results.
 * I’ve tested this issue with versions 5.0.35 and 5.0.51a, with identical results.
[30 Dec 2008 8:29] Valeriy Kravchuk
Thank you for a problem report. Please, try to run your queries with SQL_NO_CACHE and measure average execution time for several attempts. 

I confirm your findings about execution plans with 5.0.74, but for me query that uses filesort runs actually faster than

select SQL_NO_CACHE * from mytable FORCE INDEX(idx_mycolumn) ORDER BY mycolumn;

So, optimizer's choice is not obviously wrong, IMHO.
[30 Dec 2008 10:04] Valeriy Kravchuk
This is what I've got:

openxs@suse:/home2/openxs/dbs/5.0> time bin/mysql -uroot test -e "select * from mytable order by c2" >/dev/null

real    0m7.449s
user    0m2.035s
sys     0m0.550s
openxs@suse:/home2/openxs/dbs/5.0> time bin/mysql -uroot test -e "select * from mytable force index (c2) order by c2" >/dev/null

real    0m19.388s
user    0m1.774s
sys     0m0.688s
openxs@suse:/home2/openxs/dbs/5.0> time bin/mysql -uroot test -e "select * from mytable order by c2" >/dev/null

real    0m6.473s
user    0m1.728s
sys     0m0.743s
openxs@suse:/home2/openxs/dbs/5.0> time bin/mysql -uroot test -e "select * from mytable force index (c2) order by c2" >/dev/null

real    0m13.312s
user    0m1.743s
sys     0m0.697s

openxs@suse:/home2/openxs/dbs/5.0> time bin/mysql -uroot test -e "select * from mytable order by c2 limit 900000, 150000" >/dev/null

real    0m3.070s
user    0m0.258s
sys     0m0.109s
openxs@suse:/home2/openxs/dbs/5.0> time bin/mysql -uroot test -e "select * from mytable force index (c2) order by c2 limit 900000, 150000" >/dev/null

real    0m8.454s
user    0m0.239s
sys     0m0.118s

openxs@suse:/home2/openxs/dbs/5.0> bin/mysql -uroot test -e "select version(); select count(*) from mytable; show create table mytable\G"
+--------------+
| version()    |
+--------------+
| 5.0.76-debug |
+--------------+
+----------+
| count(*) |
+----------+
|  1048576 |
+----------+
*************************** 1. row ***************************
       Table: mytable
Create Table: CREATE TABLE `mytable` (
  `c1` int(11) NOT NULL auto_increment,
  `c2` int(11) default NULL,
  PRIMARY KEY  (`c1`),
  KEY `c2` (`c2`)
) ENGINE=MyISAM AUTO_INCREMENT=1048577 DEFAULT CHARSET=latin1

openxs@suse:/home2/openxs/dbs/5.0> bin/mysql -uroot test -e "explain select * from mytable order by c2 limit 900000, 150000\G"
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: mytable
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1048576
        Extra: Using filesort

So, I'd say optimizer is right in my case when ALL + filesort is used instead of index access path.
[31 Jan 2009 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".