MySQL Bugs: #28151: Server scans for MIN() when it could stop at first row found

Bug #28151	Server scans for MIN() when it could stop at first row found
Submitted:	29 Apr 2007 0:01	Modified:	27 Dec 2009 15:41
Reporter:	Baron Schwartz (Basic Quality Contributor)	Email Updates:
Status:	No Feedback	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S4 (Feature request)
Version:	5.1.15-beta	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	INDEX, limit, min, Optimizer, qc, range, scan

Description:
MySQL will scan an entire table or range when it could satisfy certain queries with the first row found for a MIN() or MAX() query.

How to repeat:
The following shows a table with 64 rows.  The first row with k=1 is guaranteed to be the minimum j value because of the primary key, but it scans the whole table.

mysql> CREATE TABLE `test` (
  `j` int(11) NOT NULL,
  `k` int(11) NOT NULL,
  PRIMARY KEY (`j`)
) ENGINE=InnoDB;

mysql> insert into test(j, k) select j + (select max(j) from test), 1 from test;

-- repeat the above query a bunch of times.

mysql> select version();
+-------------+
| version()   |
+-------------+
| 5.1.15-beta | 
+-------------+
1 row in set (0.00 sec)

mysql> select min(j), max(j), min(k), max(k), count(*) from test;
+--------+--------+--------+--------+----------+
| min(j) | max(j) | min(k) | max(k) | count(*) |
+--------+--------+--------+--------+----------+
|      1 |     64 |      1 |      1 |       64 | 
+--------+--------+--------+--------+----------+
1 row in set (0.00 sec)

mysql> explain select min(j) from test where k = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 64
        Extra: Using where
1 row in set (0.00 sec)

mysql> show status like 'handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 40    | 
| Handler_read_key      | 74    | 
| Handler_read_next     | 4286  | 
| Handler_read_prev     | 0     | 
| Handler_read_rnd      | 100   | 
| Handler_read_rnd_next | 16865 | 
+-----------------------+-------+
6 rows in set (0.00 sec)

mysql> select min(j) from test where k = 1;
+--------+
| min(j) |
+--------+
|      1 | 
+--------+
1 row in set (0.00 sec)

mysql> show status like 'handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 41    | 
| Handler_read_key      | 76    | 
| Handler_read_next     | 4286  | 
| Handler_read_prev     | 0     | 
| Handler_read_rnd      | 100   | 
| Handler_read_rnd_next | 16930 | 
+-----------------------+-------+
6 rows in set (0.00 sec)

Handler_read_rnd_next shows that all the rows were read:

mysql> select 16930-16865;
+-------------+
| 16930-16865 |
+-------------+
|          65 | 
+-------------+
1 row in set (0.00 sec)

Suggested fix:
In the example given, the query plan should be to begin the scan and quit as soon as a row that satisfies the WHERE is found.

The workaround is to rewrite the query as

select j from test where k = 1 limit 1;

The MIN() must be omitted from the rewritten query for this to work.

Thank you for a problem report and suggested solution. How optimizer should "know for sure" that all rows have k=1 or that the firts one with k=1 will give you minimal value of j, different column? Without index on k or on (k,j)? 

Only you know how data really looks like, but I do not understand how optimizer can suggest that.

If the query scans the primary key, as soon as it finds a row where k=1, that row has the minimum value of j, because the primary key is ordered ascending.

I should explain the larger context for this query, and show another query that explains why it is important.  Suppose I have a table I want to operate on selectively, a row at a time, where k = 1.  I want to find a row, do something with it, and then find the next row.

The sequence of queries will look like this:

select min(j) from test where k = 1;
-- returns j = 1
-- do something with this row
select min(j) from test where k = 1 and j > 1;
-- returns j = 2
-- repeat...
select min(j) from test where k = 1 and j > 2;

Ignore the first query for a moment, and look at the second and third.  The best way to satisfy these queries for the table I describe is to seek into the primary key using the j value, then scan the index until the next k = 1 row is found.  This must be the next largest value of j.

MySQL will indeed use the index to find the starting point, but then it will scan the rest of the table to find the next row, instead of stopping as soon as it finds one row that satisfies:

mysql> explain select min(j) from test where k = 1 and j > 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 32
        Extra: Using where
1 row in set (0.00 sec)

mysql> show status like 'handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     | 
| Handler_read_key      | 2     | 
| Handler_read_next     | 0     | 
| Handler_read_prev     | 0     | 
| Handler_read_rnd      | 0     | 
| Handler_read_rnd_next | 9     | 
+-----------------------+-------+
6 rows in set (0.00 sec)

mysql> select min(j) from test where k = 1 and j > 1;
+--------+
| min(j) |
+--------+
|      2 | 
+--------+
1 row in set (0.01 sec)

mysql> show status like 'handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     | 
| Handler_read_key      | 4     | 
| Handler_read_next     | 63    | 
| Handler_read_prev     | 0     | 
| Handler_read_rnd      | 0     | 
| Handler_read_rnd_next | 9     | 
+-----------------------+-------+
6 rows in set (0.00 sec)

I'm sorry I left out this important additional information.  I got too focused on the simpler case :-(  I hope this helps.  I know this is a very particular case and maybe this strategy would be really stupid in many cases.  Maybe the DBA should be using the workaround I suggested instead, but it seems to me that a general heuristic of "when scanning an ascending index looking for MIN(), the first row found has to be the MIN" could be used in other query plans as well.

If there is an index on k, I don't know how the plan should be designed, but if that heuristic is implemented, maybe it will make a difference.

What you ask for really looks like a reasonable feature request for the optimizer.

I've tested with the latest 5.1-bugteam. Looks like this feature request have been implemented.

From the docs : http://dev.mysql.com/doc/refman/5.1/en/using-explain.html:

"Select tables optimized away

The query contained only aggregate functions (MIN(), MAX()) that were all resolved using an index, or COUNT(*) for MyISAM, and no GROUP BY clause. The optimizer determined that only one row should be returned."

Here's my test output:

create table t1 (a int, b int, primary key (a)) engine=innodb;
insert into t1 values (1,1),(2,2),(3,3),(4,4);
insert into t1 select a + 4, b + 4 from t1;
insert into t1 select a + 8, b + 8 from t1;
insert into t1 select a + 16, b + 16 from t1;
insert into t1 select a + 32, b + 32 from t1;
FLUSH STATUS;
SELECT MIN(a) FROM t1;
MIN(a)
1
show status like 'handler_read%';
Variable_name   Value
Handler_read_first      1
Handler_read_key        2
Handler_read_next       0
Handler_read_prev       0
Handler_read_rnd        0
Handler_read_rnd_next   0
FLUSH STATUS;
SELECT MIN(a) FROM t1 WHERE a = 1;
MIN(a)
1
show status like 'handler_read%';
Variable_name   Value
Handler_read_first      0
Handler_read_key        2
Handler_read_next       0
Handler_read_prev       0
Handler_read_rnd        0
Handler_read_rnd_next   0
EXPLAIN SELECT MIN(a) FROM t1;
id      select_type     table   type    possible_keys   key     key_len ref    rows     Extra
1       SIMPLE  NULL    NULL    NULL    NULL    NULL    NULL    NULL    Select tables optimized away
EXPLAIN SELECT MIN(a) FROM t1 WHERE a = 1;
id      select_type     table   type    possible_keys   key     key_len ref    rows     Extra
1       SIMPLE  NULL    NULL    NULL    NULL    NULL    NULL    NULL    Select tables optimized away

Indeed, looks already implemented:

77-52-12-228:5.1 openxs$ bin/mysql -uroot test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.1.42-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create table t1 (a int, b int, primary key (a)) engine=innodb;
Query OK, 0 rows affected (0.07 sec)

mysql> insert into t1 values (1,1),(2,2),(3,3),(4,4);
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> insert into t1 select a + 4, b + 4 from t1;
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> insert into t1 select a + 8, b + 8 from t1;
Query OK, 8 rows affected (0.00 sec)
Records: 8  Duplicates: 0  Warnings: 0

mysql> insert into t1 select a + 16, b + 16 from t1;
Query OK, 16 rows affected (0.00 sec)
Records: 16  Duplicates: 0  Warnings: 0

mysql> insert into t1 select a + 32, b + 32 from t1;
Query OK, 32 rows affected (0.01 sec)
Records: 32  Duplicates: 0  Warnings: 0

mysql> FLUSH STATUS;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT MIN(a) FROM t1;
+--------+
| MIN(a) |
+--------+
|      1 |
+--------+
1 row in set (0.00 sec)

mysql> show status like 'handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 1     |
| Handler_read_key      | 2     |
| Handler_read_next     | 0     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
6 rows in set (0.00 sec)

mysql> EXPLAIN SELECT MIN(a) FROM t1;
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                        |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Select tables optimized away |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT MIN(a) FROM t1 WHERE a = 1;
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                        |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Select tables optimized away |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
1 row in set (0.02 sec)

The last couple of comments have talked about a different, simpler optimization where the combination of the WHERE clause and the index enables the index alone to be used.  I'm not speaking about this case, I'm speaking about the case where an index scan is required, the server finds the answer, but then continues scanning more rows.

I think I need to provide a better test case to illustrate how bad this behavior can be, and how easy it can be to avoid it.  I'll try to do that soon.

Any more complicated test cases that still show the problem in 5.1.41+ are welcomed.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".