Bug #30686 inappropriate LIMIT behavior on partitioned single-table query sorted by PK
Submitted: 29 Aug 2007 9:37 Modified: 3 Jun 2009 13:14
Reporter: Theodore Dubro Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.1.20-beta-community-nt-debug, 5.1.30 OS:Windows ((via TCP/IP))
Assigned to: CPU Architecture:Any
Tags: limit, syntax, temp tables

[29 Aug 2007 9:37] Theodore Dubro
Description:
Syntactically these two SQL statements should be identical:

1> SELECT secondary_index_column, primary_key_column FROM table ORDER BY primary_key_column
2> SELECT secondary_index_column, primary_key_column FROM table ORDER BY primary_key_column LIMIT x [where x = the number of rows in table]

Table has 12 million rows, ~1.2 GB. The problem is, statement #1 creates 6 GB in temporary tables (in Docs and Settings\Temp) and takes 30+ minutes for the first row to fetch, while statement #2 returns rows immediately and doesn't seem to use any temp tables (as does "SELECT primary_key_column, secondary_index_column FROM table", but the external app in question needs the columns in the opposite order). If the LIMIT token accepted expressions or subqueries a workaround would be easy (e.g., "LIMIT (select count(*) from table)"), but for now the only way to select on my table in primary key order (asc or desc) with the primary key column SELECTed second is to plug in a constant representing the # of rows in the table (in my case, LIMIT 12130921 (the # of rows) returns immediately, no LIMIT or LIMIT 12130922 (the # of rows +1) takes 30 mins/6GB). 

How to repeat:
See above hopefully.
[29 Aug 2007 10:36] Theodore Dubro
That's a MyISAM table. (d'oh)
[29 Aug 2007 11:13] Hartmut Holzgraefe
Could you provide EXPLAIN output for both queries?

You may also consider FORCE INDEX instead of LIMIT here ...
[29 Aug 2007 12:52] Theodore Dubro
#1) without LIMIT (or LIMIT > x): Simple, type=ALL, key=NULL, keylen=NULL, rows=12130921, Extra=Using filesort
#2) LIMIT = x: Simple, type=index, key=PRIMARY, rows=12130921, Extra=[blank]

FORCE INDEX (PRIMARY) solved my immediate problem, so I guess this goes under "Optimizer". Thanks for the quick response.
[28 Oct 2007 14:21] Valeriy Kravchuk
Please, try to repeat with a newer version, 5.1.22, and inform about the results.
[4 Nov 2007 6:21] Theodore Dubro
Sorry, had to fix references to the deprecated mysqld-nt bin. Same table-scan behavior in .22 without FORCE INDEX (all rows, "Using filesort"), but LIMIT n does the same thing now instead of grabbing the primary key, so it's consistent at least.
[14 Feb 2008 4:31] Valeriy Kravchuk
Please, check with a newer version, 5.1.23-rc. See also bug #28404 that can be related.
[15 Mar 2008 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[19 Mar 2008 17:08] Susanne Ebrecht
We still need to know if you get this issue by using our newest version MySQL 5.1.23.
[19 Apr 2008 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[20 Apr 2008 3:08] Theodore Dubro
Sorry for the delay. I don't think the optimizer's making the right call in  .23-rc, but the resolution to #28404 might explain why ("ORDER BY is used with LIMIT N and the cost of N random accesses is less than the cost of filesort"); now instead of table scanning LIMIT anything-but-N the optimizer makes an (arbitrary?) judgment call, but using the index should always cost "less than the cost of filesort" in this scenario (see the last two queries, 1 hour vs. 1 millisecond):

explain select id, url from p_pix FORCE INDEX(primary) order by url desc limit 16765752
->
key PRIMARY, Extra=<blank>, rows 16765752

explain select id, url from p_pix FORCE INDEX(primary) order by url desc 
->
key PRIMARY, Extra=<blank>, rows 16765752

explain select id, url from p_pix order by url desc limit 16765752
->
key NULL, Extra=Using Filesort, rows 16765752

explain select id, url from p_pix order by url desc
->
key NULL, Extra=Using Filesort, rows 16765752

explain select id, url from p_pix order by url desc limit 388000 (or more)
->
key NULL, Extra=Using Filesort, rows 16765752

explain select id, url from p_pix order by url desc limit 387000
->
key PRIMARY, Extra=<blank>, rows 387000
[4 May 2008 14:56] Valeriy Kravchuk
Please, try to repeat with a newer version, 5.1.24, and inform about the results.
[27 May 2008 16:13] Theodore Dubro
same behavior in 5.1.24, "Using Filesort" until FORCE INDEX(PRIMARY) is specified.
[4 Jul 2008 20:00] Valeriy Kravchuk
Please, check with a newer version, 5.1.25-rc. In case of the same problem, please, send the results of SHOW CREATE TABLE for the table used.
[4 Aug 2008 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[9 Dec 2008 22:17] Theodore Dubro
Same thing in 5.1.30 (and myisam in 6.0.8 alpha); what's interesting is swapping the primary key and unique index ("secondary_index_column" in the example) has no effect on the optimizer's choice, and I still need to FORCE INDEX(Index_n) instead of FORCE INDEX(PRIMARY) to see timely results. What I should have mentioned is the former primary key (now unique index) is either a CHAR or VARCHAR in the 100 byte range, which might have something to do with the ORDER BY wanting to create a temp table instead of grabbing the right key as the FORCE INDEX accomplishes. Output:

the good: explain SELECT id, url, @num := @num + 1 gid from p_pix FORCE INDEX(Index_66), (SELECT @num :=0) d order by url desc
id#1: PRIMARY <derived2> system NULL NULL NULL NULL 1 Extra=[blank]
id#1: PRIMARY p_pix index NULL Index_66 125 NULL 18940234 Extra=Using where
id#2: DERIVED NULL NULL NULL NULL NULL NULL NULL Extra=No tables used

the bad: explain SELECT id, url, @num := @num + 1 gid from p_pix, (SELECT @num :=0) d order by url desc
id#1: PRIMARY <derived2> system NULL NULL NULL NULL 1 Extra=Using filesort <--
id#1: PRIMARY p_pix ALL<-- NULL<-- NULL NULL <-- NULL 18940234 Extra=Using where
id#2: DERIVED NULL NULL NULL NULL NULL NULL NULL Extra=No tables used

(note: with a WHERE clause, the second explain row is still key=NULL, but possible_keys becomes one secondary index covered by the WHERE, but it still goes with "Using filesort" anyway)
[11 Dec 2008 15:33] Sveta Smirnova
Thank you for the feedback.

Please send output of SHOW CREATE TABLE for table used.
[11 Dec 2008 21:33] Theodore Dubro
DROP TABLE IF EXISTS `test`.`p_pix`;
CREATE TABLE  `test`.`p_pix` (
  `url` char(125) NOT NULL DEFAULT '0',
  `timestamp` int(10) unsigned NOT NULL DEFAULT '0',
  `status` char(1) NOT NULL DEFAULT '?',
  `type` bit(1) NOT NULL DEFAULT '\0',
  `originid` mediumint(8) unsigned NOT NULL DEFAULT '0',
  `checksum` char(22) DEFAULT NULL,
  `size` mediumint(8) unsigned NOT NULL DEFAULT '0',
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`) USING BTREE,
  UNIQUE KEY `Index_66` (`url`) USING BTREE,
  KEY `Index_72` (`originid`),
  KEY `Index_75` (`checksum`,`size`),
  KEY `Index_8` (`timestamp`,`status`) USING BTREE,
  KEY `Index_74` (`status`,`timestamp`) USING BTREE
) ENGINE=MyISAM AUTO_INCREMENT=18952854 DEFAULT CHARSET=latin1 PACK_KEYS=1 ROW_FORMAT=FIXED;

(note: same behavior with dynamic or fixed rows (and CHAR or VARCHAR); partitioning irrelevant (besides the inappropriately titled bug report), and it doesn't seem to matter whether `id` (or `url`) is a primary, unique or non-unique index)
[3 May 2009 13:14] Valeriy Kravchuk
Please, specify what exactly is considered a bug here. 

I can confirm your findings with the recent (non-partitioned) p_pix table: for ORDER BY without LIMIT index on url column is NOT used:

mysql> select version();
+--------------+
| version()    |
+--------------+
| 5.1.34-debug | 
+--------------+
1 row in set (0.00 sec)

mysql> CREATE TABLE  `test`.`p_pix` (
    ->   `url` char(125) NOT NULL DEFAULT '0',
    ->   `timestamp` int(10) unsigned NOT NULL DEFAULT '0',
    ->   `status` char(1) NOT NULL DEFAULT '?',
    ->   `type` bit(1) NOT NULL DEFAULT '\0',
    ->   `originid` mediumint(8) unsigned NOT NULL DEFAULT '0',
    ->   `checksum` char(22) DEFAULT NULL,
    ->   `size` mediumint(8) unsigned NOT NULL DEFAULT '0',
    ->   `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    ->   PRIMARY KEY (`id`) USING BTREE,
    ->   UNIQUE KEY `Index_66` (`url`) USING BTREE,
    ->   KEY `Index_72` (`originid`),
    ->   KEY `Index_75` (`checksum`,`size`),
    ->   KEY `Index_8` (`timestamp`,`status`) USING BTREE,
    ->   KEY `Index_74` (`status`,`timestamp`) USING BTREE
    -> ) ENGINE=MyISAM AUTO_INCREMENT=18952854 DEFAULT CHARSET=latin1 PACK_KEYS=1
    -> ROW_FORMAT=FIXED;
Query OK, 0 rows affected (0.06 sec)

mysql> 
mysql> insert into p_pix(url) values ('http://mysql.com';);
Query OK, 1 row affected (0.19 sec)

mysql> insert into p_pix(url) values ('http://oracle.com';);
Query OK, 1 row affected (0.00 sec)

mysql> insert into p_pix(url) select (concat('http://';, concat(rand()*1000000,'oracle.com'))) from p_pix;
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> insert into p_pix(url) select (concat('http://';, concat(rand()*1000000,'oracle.com'))) from p_pix;
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

...

mysql> insert into p_pix(url) select (concat('http://';, concat(rand()*1000000,'oracle.com'))) from p_pix;
ERROR 1062 (23000): Duplicate entry 'http://855265.308968039oracle.com'; for key 'Index_66'
mysql> select count(*) from p_pix;
+----------+
| count(*) |
+----------+
|     3819 | 
+----------+
1 row in set (0.00 sec)

mysql> explain SELECT id, url, @num := @num + 1 gid from p_pix, (SELECT @num :=0) d order by url desc\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: system
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
        Extra: Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: p_pix
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3819
        Extra: 
*************************** 3. row ***************************
           id: 2
  select_type: DERIVED
        table: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: No tables used
3 rows in set (0.00 sec)

Indeed, index can be used:

mysql> explain SELECT id, url, @num := @num + 1 gid from p_pix force index(Index_66), (SELECT @num :=0) d order by url desc\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: system
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
        Extra: 
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: p_pix
         type: index
possible_keys: NULL
          key: Index_66
      key_len: 125
          ref: NULL
         rows: 3819
        Extra: 
*************************** 3. row ***************************
           id: 2
  select_type: DERIVED
        table: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: No tables used
3 rows in set (0.00 sec)

Now, we can see the same with simplified query as well:

mysql> explain SELECT id, url from p_pix order by url desc\G*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: p_pix
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3819
        Extra: Using filesort
1 row in set (0.00 sec)

But with LIMIT N where N is much less than total number of rows in the table index is used:

mysql> explain SELECT id, url from p_pix order by url desc LIMIT 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: p_pix
         type: index
possible_keys: NULL
          key: Index_66
      key_len: 125
          ref: NULL
         rows: 10
        Extra: 
1 row in set (0.00 sec)

and, coming back to the original claim, when LIMIT <total_number_of_rows> is used we have the same plan as without LIMIT:

mysql> explain SELECT id, url from p_pix order by url desc LIMIT 3819\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: p_pix
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 3819
        Extra: Using filesort
1 row in set (0.00 sec)

So, what exactly is a bug now?
[3 Jun 2009 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".