Description:
Consider the following table:
create table abc(a int, b int, c int, d int, key(a,b,c));
The following queries are executed efficiently on it:
mysql> explain SELECT DISTINCT a FROM abc;
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | abc | index | NULL | a | 15 | NULL | 5 | Using index |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)
mysql> explain SELECT DISTINCT a FROM abc LIMIT 1;
+----+-------------+-------+-------+---------------+------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+------+------------------------------+
| 1 | SIMPLE | abc | index | NULL | a | 15 | NULL | 5 | Using index; Using temporary |
+----+-------------+-------+-------+---------------+------+---------+------+------+------------------------------+
1 row in set (0.01 sec)
While this one is equally inefficient:
mysql> explain SELECT * FROM ( SELECT DISTINCT a FROM abc ) x LIMIT 1;
+----+-------------+------------+-------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+------+---------+------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 2 | |
| 2 | DERIVED | abc | index | NULL | a | 15 | NULL | 5 | Using index |
+----+-------------+------------+-------+---------------+------+---------+------+------+-------------+
2 rows in set (0.00 sec)
with LIMIT 1 or without. Optimizer fully materializes subquery first, and applies LIMIT after that.
How to repeat:
macbook-pro:5.5 openxs$ bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.5.20-debug-log Source distribution
Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> create table abc(a int, b int, c int, d int, key(a,b,c));
Query OK, 0 rows affected (0.56 sec)
mysql> insert into abc values (1,1,1,1), (1,2,2,2), (1,3,3,3), (2,1,1,1), (2,2,2,2);
Query OK, 5 rows affected (0.39 sec)
Records: 5 Duplicates: 0 Warnings: 0
mysql> show session status like 'Handler_read%';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| Handler_read_first | 0 |
| Handler_read_key | 1 |
| Handler_read_last | 0 |
| Handler_read_next | 0 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 36 |
+-----------------------+-------+
7 rows in set (0.36 sec)
mysql> SELECT DISTINCT a FROM abc LIMIT 1;
+------+
| a |
+------+
| 1 |
+------+
1 row in set (0.05 sec)
mysql> show session status like 'Handler_read%';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| Handler_read_first | 1 |
| Handler_read_key | 3 |
| Handler_read_last | 0 |
| Handler_read_next | 0 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 37 |
+-----------------------+-------+
7 rows in set (0.00 sec)
Now compare to the following:
mysql> SELECT * FROM ( SELECT DISTINCT a FROM abc ) x LIMIT 1;
+------+
| a |
+------+
| 1 |
+------+
1 row in set (0.02 sec)
mysql> show session status like 'Handler_read%';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| Handler_read_first | 2 |
| Handler_read_key | 5 |
| Handler_read_last | 0 |
| Handler_read_next | 5 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 38 |
+-----------------------+-------+
7 rows in set (0.00 sec)
and note 5 rows read from index, so entire index was scanned.
Suggested fix:
Either consider non-materializing approach (query rewrite?) based on loose scan for this and similar type of subqueries with LIMIT, or at least try to shortcut materialization at early stage, trying to check conditions/limits in outer query after getting every row from subquery in FROM clause.
See WL#3485 and WL#3341 - maybe one of them can be extended to cover this case...
See bug #24156 that had to solve this problem for some kinds of subqueries in FROM (at least when LIMIT is not used). among other things.
See also Bug #61517, for another case when loose index scan is not used when LIMIT is added.