Bug #41871 Select + Index + Limit + offset on large MyIsam tables give very bad performance
Submitted: 5 Jan 2009 15:47 Modified: 2 Oct 2009 7:27
Reporter: Bouvet Arnaud Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: MyISAM storage engine Severity:S5 (Performance)
Version:5.1.30 OS:Any (MS Windows, Mac OS X)
Assigned to: CPU Architecture:Any
Tags: select MyIsam index offset performance

[5 Jan 2009 15:47] Bouvet Arnaud
Description:
I'm working on an application which retrieves big amount of datas from very large MyISAM tables (several millions of rows).

To do so, I browse datas in a table by using SQL queries like:
select * from my_table where a_condition limit(Offset, MaxNbOfRows);

In all cases, the most offset is big, the most the query takes time. No problem with this.

But performances depend on the condition in the "where" clause and not like they should.
If I use a primary key or an index in the "where" clause, performances of the queries are very bad. A "select" request can take several hours!

If I use a field without index on it in the "where" clause, a "select" query doesn't take more than several minutes.

I verified the "key_buffer_size" to be sure that index file fits in memory of the MySQL server. So if I use a query with index and no offset, performances are better than a query without index. Normal.

I've performed tests in MySQL server with versions 5.1.30 and 4.1.9 with the same results.

I've performed tests on partitionned table and in this case also use of index and offset kills performances.    

How to repeat:
I'm connected on a database which contains only one table using MyISAM engine. This table isn't partitionned and contains 20 000 000 rows:

mysql> show create table cdr;
-------------------------------------+
| Table | Create Table
-------------------------------------+
| cdr   | CREATE TABLE `cdr` (
  `autonum` int(10) unsigned NOT NULL DEFAULT '0',
  `cnx` int(10) unsigned DEFAULT NULL,
  `pdp` int(10) unsigned DEFAULT NULL,
  `imsi` bigint(20) DEFAULT NULL,
  `imei` bigint(20) DEFAULT NULL,
  `cell` smallint(5) unsigned DEFAULT NULL,
  `cdr_hour` tinyint(3) unsigned NOT NULL,
  PRIMARY KEY (`autonum`),
  KEY `idx1` (`cnx`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 MAX_ROWS=40000000 AVG_ROW_LENGTH=1 |
-------------------------------------+
1 row in set (0.00 sec)

I perform a filter on the table which returns 19 999 995 rows and uses index "idx1".

mysql> select count(*) from cdr where cnx<6000000;
+----------+
| count(*) |
+----------+
| 19999995 |
+----------+
1 row in set (14.16 sec)

Index file of the database fits in memory of the server:

mysql> show variables;
+---------------------------------+----------------------------------------------------------------+
| Variable_name                   | Value                                                          |
+---------------------------------+----------------------------------------------------------------+
| storage_engine                  | MyISAM                                                         |
| key_buffer_size                 | 536870912                                                      |

size of index file: cdr.MYI  ->  349416448

Test description:
I want 10 rows from the table "cdr". I apply the filter "where cnx<6000000" and I move result cursor to the half of the results table with "limit 10000000,10"
Normally, the query should be fast because "idx1" is used.

mysql> explain select * from cdr where cnx<6000000 limit 10000000,10;
+----+-------------+-------+-------+---------------+------+---------+------+----------+-------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows     | Extra       |
+----+-------------+-------+-------+---------------+------+---------+------+----------+-------------+
|  1 | SIMPLE      | cdr   | range | idx1          | idx1 | 5       | NULL | 19999984 | Using where |
+----+-------------+-------+-------+---------------+------+---------+------+----------+-------------+
1 row in set (0.00 sec)

mysql> select * from cdr where cnx<6000000 limit 10000000,10;
+----------+---------+---------+--------------------+-------------------+------+----------+
| autonum  | cnx     | pdp     | imsi               | imei              | cell | cdr_hour |
+----------+---------+---------+--------------------+-------------------+------+----------+
| 10000001 | 3000001 | 1000001 | 415116470500564037 | 48296055218155026 | 6713 |       12 |
... etc ...
| 10000010 | 3000004 | 1000002 |               NULL | 66361894245339768 | 5187 |       12 |
+----------+---------+---------+--------------------+-------------------+------+----------+
10 rows in set (1 min 4.95 sec)

To retrieve 10 rows, MySQl servers takes 64,95 seconds with using index and offset.

I make an other test. This time, I use the same query but index "idx1" isn't use:

mysql>
mysql> explain select * from cdr ignore index (idx1) where cnx<6000000 limit 10000000,10;
+----+-------------+-------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows     | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+----------+-------------+
|  1 | SIMPLE      | cdr   | ALL  | NULL          | NULL | NULL    | NULL | 20000000 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+----------+-------------+
1 row in set (0.00 sec)

mysql> select * from cdr ignore index (idx1) where cnx<6000000 limit 10000000,10;
+----------+---------+---------+--------------------+-------------------+------+----------+
| autonum  | cnx     | pdp     | imsi               | imei              | cell | cdr_hour |
+----------+---------+---------+--------------------+-------------------+------+----------+
| 10000001 | 3000001 | 1000001 | 415116470500564037 | 48296055218155026 | 6713 |       12 |
... etc ...
| 10000010 | 3000004 | 1000002 |               NULL | 66361894245339768 | 5187 |       12 |
+----------+---------+---------+--------------------+-------------------+------+----------+
10 rows in set (1.83 sec)

To retrieve the same 10 rows, MySQl servers takes 1,83 seconds. In comparison with the 64,95 seconds when using index, performances are very better.

To be sure that my index file is correctly managed by the server, I use a query which only uses index "idx1" to retrieve the same 10 rows:

mysql>
mysql> explain select * from cdr where cnx<6000000 and cnx > 3000000 limit 10;
+----+-------------+-------+-------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+-------+-------+---------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | cdr   | range | idx1          | idx1 | 5       | NULL | 9065117 | Using where |
+----+-------------+-------+-------+---------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)

mysql> select * from cdr where cnx<6000000 and cnx > 3000000 limit 10;
+----------+---------+---------+--------------------+-------------------+------+----------+
| autonum  | cnx     | pdp     | imsi               | imei              | cell | cdr_hour |
+----------+---------+---------+--------------------+-------------------+------+----------+
|  9999999 | 3000001 | 1000001 | 134322296487531690 | 96808198605033877 | 9230 |       11 |
| 10000000 | 3000001 | 1000001 | 400323545740578296 |              NULL | 1332 |       12 |
... etc ...
| 10000008 | 3000003 | 1000001 | 482427987033283773 |              NULL | 3603 |       12 |
+----------+---------+---------+--------------------+-------------------+------+----------+
10 rows in set (0.00 sec)

Conclusion:
Index file isn't corrupted and is well managed by server.

Why does the use of an index with an offet decrease so much performances of a SQL query? 

Suggested fix:
none.
[6 Jan 2009 10:09] Valeriy Kravchuk
Thank you for a problem report. Please, execute the following:

show session status like 'Handler_read%';

before the original query and after it. Then do the same before the query with IGNORE INDEX and after it. Send the results.
[6 Jan 2009 10:39] Bouvet Arnaud
Please find below results of the manipulation:

mysql> show session status like 'Handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 0     |
| Handler_read_next     | 0     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
6 rows in set (0.02 sec)

mysql> select * from cdr where cnx<6000000 limit 10000000,10;
+----------+---------+---------+--------------------+-------------------+------+----------+
| autonum  | cnx     | pdp     | imsi               | imei              | cell | cdr_hour |
+----------+---------+---------+--------------------+-------------------+------+----------+
| 10000001 | 3000001 | 1000001 |               NULL |               NULL | 6713 |       12 |
... etc ...
| 10000010 | 3000004 | 1000002 |               NULL |               NULL | 5187 |       12 |
+----------+---------+---------+--------------------+-------------------+------+----------+
10 rows in set (1 min 19.36 sec)

mysql> show session status like 'Handler_read%';
+-----------------------+----------+
| Variable_name         | Value    |
+-----------------------+----------+
| Handler_read_first    | 0        |
| Handler_read_key      | 1        |
| Handler_read_next     | 10000009 |
| Handler_read_prev     | 0        |
| Handler_read_rnd      | 0        |
| Handler_read_rnd_next | 0        |
+-----------------------+----------+
6 rows in set (0.00 sec)

--> With a the same connection

mysql> show session status like 'Handler_read%';
+-----------------------+----------+
| Variable_name         | Value    |
+-----------------------+----------+
| Handler_read_first    | 0        |
| Handler_read_key      | 1        |
| Handler_read_next     | 10000009 |
| Handler_read_prev     | 0        |
| Handler_read_rnd      | 0        |
| Handler_read_rnd_next | 0        |
+-----------------------+----------+
6 rows in set (0.00 sec)

mysql>  select * from cdr ignore index (idx1) where cnx<6000000 limit 10000000,10;
+----------+---------+---------+--------------------+-------------------+------+----------+
| autonum  | cnx     | pdp     | imsi               | imei              | cell | cdr_hour |
+----------+---------+---------+--------------------+-------------------+------+----------+
| 10000001 | 3000001 | 1000001 |              NULL  |              NULL | 6713 |       12 |
... etc ...
| 10000010 | 3000004 | 1000002 |               NULL |              NULL | 5187 |       12 |
+----------+---------+---------+--------------------+-------------------+------+----------+
10 rows in set (1.61 sec)

mysql> show session status like 'Handler_read%';
+-----------------------+----------+
| Variable_name         | Value    |
+-----------------------+----------+
| Handler_read_first    | 0        |
| Handler_read_key      | 1        |
| Handler_read_next     | 10000009 |
| Handler_read_prev     | 0        |
| Handler_read_rnd      | 0        |
| Handler_read_rnd_next | 10000010 |
+-----------------------+----------+
6 rows in set (0.00 sec)

--> With a new connection

mysql> connect su_perf_oneindex;
Connection id:    354
Current database: su_perf_oneindex

mysql>  show session status like 'Handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 0     |
| Handler_read_next     | 0     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
6 rows in set (0.00 sec)

mysql> select * from cdr ignore index (idx1) where cnx<6000000 limit 10000000,10;
+----------+---------+---------+--------------------+-------------------+------+----------+
| autonum  | cnx     | pdp     | imsi               | imei              | cell | cdr_hour |
+----------+---------+---------+--------------------+-------------------+------+----------+
| 10000001 | 3000001 | 1000001 |               NULL |              NULL | 6713 |       12 |
... etc ...
| 10000010 | 3000004 | 1000002 |               NULL |              NULL | 5187 |       12 |
+----------+---------+---------+--------------------+-------------------+------+----------+
10 rows in set (1.59 sec)

mysql>
mysql>  show session status like 'Handler_read%';
+-----------------------+----------+
| Variable_name         | Value    |
+-----------------------+----------+
| Handler_read_first    | 0        |
| Handler_read_key      | 0        |
| Handler_read_next     | 0        |
| Handler_read_prev     | 0        |
| Handler_read_rnd      | 0        |
| Handler_read_rnd_next | 10000010 |
+-----------------------+----------+
6 rows in set (0.00 sec)

mysql>
[6 Jan 2009 11:18] Valeriy Kravchuk
Please, send also the results of:

select count(*) from cdr;

But even without them we can see that both statements had to read 1000010 rows. In the first case it was read of index to find the first record that satisfied condition and then scanning next 1000009 index records, with the need to read the entire row from data file (you use SELECT * !) each time, and each time from a new position. So, you had, besides reading indexes from key buffer, to read individual data row in some specific location, 1000010 times.

In the second care you just read data rows one by one, and it happened so that all first 1000010 rows satisfied condition. Sequential read of data file is much faster. Hence the result.

Optimizer just does NOT know that every row it will read sequentially will satisfy condition.

I see no bug here.
[14 Jan 2009 9:24] Bouvet Arnaud
Thanks for your response.

The cdr table contains:
 
mysql> select count(*) from cdr;
+----------+
| count(*) |
+----------+
| 20000000 |
+----------+
1 row in set (0.00 sec)

I understand your explanation. And I can understand that the read of index file and then datas in table takes more time than the read of only datas in table.

But I don't understand a
[14 Jan 2009 9:35] Bouvet Arnaud
Sorry, I've submitted my answer too fast.

I don't understand a such difference for performances if I use or not index with offset.

For example, if I retrieve only one row after offset, index file is only read once. And then MySQL makes a sequential read, no?

See the results of this example:
mysql> select * from cdr where cnx<6000000 limit 10000000,1;
+----------+---------+---------+--------------------+-------------------+------+----------+
| autonum  | cnx     | pdp     | imsi               | imei              | cell | cdr_hour |
+----------+---------+---------+--------------------+-------------------+------+----------+
| 10000001 | 3000001 | 1000001 |                NUll|               NUll| 6713 |       12 |
+----------+---------+---------+--------------------+-------------------+------+----------+
1 row in set (2 min 24.43 sec)

mysql> show session status like 'Handler_read%';
+-----------------------+----------+
| Variable_name         | Value    |
+-----------------------+----------+
| Handler_read_first    | 0        |
| Handler_read_key      | 1        |
| Handler_read_next     | 10000000 |
| Handler_read_prev     | 0        |
| Handler_read_rnd      | 0        |
| Handler_read_rnd_next | 0        |
+-----------------------+----------+
6 rows in set (0.05 sec)

mysql>  select * from cdr ignore index (idx1) where cnx<6000000 limit 10000000,1;
+----------+---------+---------+--------------------+-------------------+------+----------+
| autonum  | cnx     | pdp     | imsi               | imei              | cell | cdr_hour |
+----------+---------+---------+--------------------+-------------------+------+----------+
| 10000001 | 3000001 | 1000001 |                NUll|               NUll| 6713 |       12 |
+----------+---------+---------+--------------------+-------------------+------+----------+
1 row in set (1.70 sec)

mysql> show session status like 'Handler_read%';
+-----------------------+----------+
| Variable_name         | Value    |
+-----------------------+----------+
| Handler_read_first    | 0        |
| Handler_read_key      | 0        |
| Handler_read_next     | 0        |
| Handler_read_prev     | 0        |
| Handler_read_rnd      | 0        |
| Handler_read_rnd_next | 10000001 |
+-----------------------+----------+
6 rows in set (0.00 sec)

How can I configure the MySQL server to decrease this difference?
[28 Sep 2009 19:55] Sveta Smirnova
Thank you for the feedback.

Verified as described.
[2 Oct 2009 7:27] Sveta Smirnova
Steps to repeat.

1. Load dump from hidden comment.
2. Run query select * from cdr where cnx<6000000 limit 10000000,1;
3. Run query select * from cdr ignore index (idx1) where cnx<6000000 limit 10000000,1;
[14 Aug 2018 16:25] Benjamin Tepolt
Good to see all the lively discussion here!  Thank you for posting, bouvet Arnaud.

I just wanted to add that this small problem has been addressed in dozens and dozens of blogs since 2009, and that it is almost a verifiable, cottage industry.  Just look at this stackoverflow question with 100+ upvotes, https://stackoverflow.com/questions/4481388/why-does-mysql-higher-limit-offset-slow-the-qu... , "Why does MYSQL higher LIMIT offset slow the query down?"

And yet, everyone has a solution, which are always identical.  Change: SELECT * FROM .... LIMIT $X, $Y;.  And instead use a subquery: SELECT * FROM ... WHERE id IN (SELECT id FROM ... LIMIT $X, $Y);.

The simplicity of this fix gives me the impression that it ought to be simply built into the engine itself.