| Bug #28151 | Server scans for MIN() when it could stop at first row found | ||
|---|---|---|---|
| Submitted: | 29 Apr 2007 0:01 | Modified: | 27 Dec 2009 15:41 |
| Reporter: | Baron Schwartz (Basic Quality Contributor) | Email Updates: | |
| Status: | No Feedback | Impact on me: | |
| Category: | MySQL Server: Optimizer | Severity: | S4 (Feature request) |
| Version: | 5.1.15-beta | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | INDEX, limit, min, Optimizer, qc, range, scan | ||
[29 Apr 2007 5:15]
Valeriy Kravchuk
Thank you for a problem report and suggested solution. How optimizer should "know for sure" that all rows have k=1 or that the firts one with k=1 will give you minimal value of j, different column? Without index on k or on (k,j)? Only you know how data really looks like, but I do not understand how optimizer can suggest that.
[29 Apr 2007 14:13]
Baron Schwartz
If the query scans the primary key, as soon as it finds a row where k=1, that row has the minimum value of j, because the primary key is ordered ascending.
I should explain the larger context for this query, and show another query that explains why it is important. Suppose I have a table I want to operate on selectively, a row at a time, where k = 1. I want to find a row, do something with it, and then find the next row.
The sequence of queries will look like this:
select min(j) from test where k = 1;
-- returns j = 1
-- do something with this row
select min(j) from test where k = 1 and j > 1;
-- returns j = 2
-- repeat...
select min(j) from test where k = 1 and j > 2;
Ignore the first query for a moment, and look at the second and third. The best way to satisfy these queries for the table I describe is to seek into the primary key using the j value, then scan the index until the next k = 1 row is found. This must be the next largest value of j.
MySQL will indeed use the index to find the starting point, but then it will scan the rest of the table to find the next row, instead of stopping as soon as it finds one row that satisfies:
mysql> explain select min(j) from test where k = 1 and j > 1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: test
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
rows: 32
Extra: Using where
1 row in set (0.00 sec)
mysql> show status like 'handler_read%';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| Handler_read_first | 0 |
| Handler_read_key | 2 |
| Handler_read_next | 0 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 9 |
+-----------------------+-------+
6 rows in set (0.00 sec)
mysql> select min(j) from test where k = 1 and j > 1;
+--------+
| min(j) |
+--------+
| 2 |
+--------+
1 row in set (0.01 sec)
mysql> show status like 'handler_read%';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| Handler_read_first | 0 |
| Handler_read_key | 4 |
| Handler_read_next | 63 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 9 |
+-----------------------+-------+
6 rows in set (0.00 sec)
I'm sorry I left out this important additional information. I got too focused on the simpler case :-( I hope this helps. I know this is a very particular case and maybe this strategy would be really stupid in many cases. Maybe the DBA should be using the workaround I suggested instead, but it seems to me that a general heuristic of "when scanning an ascending index looking for MIN(), the first row found has to be the MIN" could be used in other query plans as well.
If there is an index on k, I don't know how the plan should be designed, but if that heuristic is implemented, maybe it will make a difference.
[29 Apr 2007 18:02]
Valeriy Kravchuk
What you ask for really looks like a reasonable feature request for the optimizer.
[26 Nov 2009 15:08]
Georgi Kodinov
I've tested with the latest 5.1-bugteam. Looks like this feature request have been implemented. From the docs : http://dev.mysql.com/doc/refman/5.1/en/using-explain.html: "Select tables optimized away The query contained only aggregate functions (MIN(), MAX()) that were all resolved using an index, or COUNT(*) for MyISAM, and no GROUP BY clause. The optimizer determined that only one row should be returned." Here's my test output: create table t1 (a int, b int, primary key (a)) engine=innodb; insert into t1 values (1,1),(2,2),(3,3),(4,4); insert into t1 select a + 4, b + 4 from t1; insert into t1 select a + 8, b + 8 from t1; insert into t1 select a + 16, b + 16 from t1; insert into t1 select a + 32, b + 32 from t1; FLUSH STATUS; SELECT MIN(a) FROM t1; MIN(a) 1 show status like 'handler_read%'; Variable_name Value Handler_read_first 1 Handler_read_key 2 Handler_read_next 0 Handler_read_prev 0 Handler_read_rnd 0 Handler_read_rnd_next 0 FLUSH STATUS; SELECT MIN(a) FROM t1 WHERE a = 1; MIN(a) 1 show status like 'handler_read%'; Variable_name Value Handler_read_first 0 Handler_read_key 2 Handler_read_next 0 Handler_read_prev 0 Handler_read_rnd 0 Handler_read_rnd_next 0 EXPLAIN SELECT MIN(a) FROM t1; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE NULL NULL NULL NULL NULL NULL NULL Select tables optimized away EXPLAIN SELECT MIN(a) FROM t1 WHERE a = 1; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE NULL NULL NULL NULL NULL NULL NULL Select tables optimized away
[26 Nov 2009 15:22]
Valeriy Kravchuk
Indeed, looks already implemented: 77-52-12-228:5.1 openxs$ bin/mysql -uroot test Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 1 Server version: 5.1.42-debug Source distribution Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> create table t1 (a int, b int, primary key (a)) engine=innodb; Query OK, 0 rows affected (0.07 sec) mysql> insert into t1 values (1,1),(2,2),(3,3),(4,4); Query OK, 4 rows affected (0.00 sec) Records: 4 Duplicates: 0 Warnings: 0 mysql> insert into t1 select a + 4, b + 4 from t1; Query OK, 4 rows affected (0.00 sec) Records: 4 Duplicates: 0 Warnings: 0 mysql> insert into t1 select a + 8, b + 8 from t1; Query OK, 8 rows affected (0.00 sec) Records: 8 Duplicates: 0 Warnings: 0 mysql> insert into t1 select a + 16, b + 16 from t1; Query OK, 16 rows affected (0.00 sec) Records: 16 Duplicates: 0 Warnings: 0 mysql> insert into t1 select a + 32, b + 32 from t1; Query OK, 32 rows affected (0.01 sec) Records: 32 Duplicates: 0 Warnings: 0 mysql> FLUSH STATUS; Query OK, 0 rows affected (0.00 sec) mysql> SELECT MIN(a) FROM t1; +--------+ | MIN(a) | +--------+ | 1 | +--------+ 1 row in set (0.00 sec) mysql> show status like 'handler_read%'; +-----------------------+-------+ | Variable_name | Value | +-----------------------+-------+ | Handler_read_first | 1 | | Handler_read_key | 2 | | Handler_read_next | 0 | | Handler_read_prev | 0 | | Handler_read_rnd | 0 | | Handler_read_rnd_next | 0 | +-----------------------+-------+ 6 rows in set (0.00 sec) mysql> EXPLAIN SELECT MIN(a) FROM t1; +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+ | 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away | +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+ 1 row in set (0.00 sec) mysql> EXPLAIN SELECT MIN(a) FROM t1 WHERE a = 1; +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+ | 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away | +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+ 1 row in set (0.02 sec)
[27 Nov 2009 14:23]
Baron Schwartz
The last couple of comments have talked about a different, simpler optimization where the combination of the WHERE clause and the index enables the index alone to be used. I'm not speaking about this case, I'm speaking about the case where an index scan is required, the server finds the answer, but then continues scanning more rows. I think I need to provide a better test case to illustrate how bad this behavior can be, and how easy it can be to avoid it. I'll try to do that soon.
[27 Nov 2009 15:41]
Valeriy Kravchuk
Any more complicated test cases that still show the problem in 5.1.41+ are welcomed.
[28 Dec 2009 0:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".

Description: MySQL will scan an entire table or range when it could satisfy certain queries with the first row found for a MIN() or MAX() query. How to repeat: The following shows a table with 64 rows. The first row with k=1 is guaranteed to be the minimum j value because of the primary key, but it scans the whole table. mysql> CREATE TABLE `test` ( `j` int(11) NOT NULL, `k` int(11) NOT NULL, PRIMARY KEY (`j`) ) ENGINE=InnoDB; mysql> insert into test(j, k) select j + (select max(j) from test), 1 from test; -- repeat the above query a bunch of times. mysql> select version(); +-------------+ | version() | +-------------+ | 5.1.15-beta | +-------------+ 1 row in set (0.00 sec) mysql> select min(j), max(j), min(k), max(k), count(*) from test; +--------+--------+--------+--------+----------+ | min(j) | max(j) | min(k) | max(k) | count(*) | +--------+--------+--------+--------+----------+ | 1 | 64 | 1 | 1 | 64 | +--------+--------+--------+--------+----------+ 1 row in set (0.00 sec) mysql> explain select min(j) from test where k = 1\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: test type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 64 Extra: Using where 1 row in set (0.00 sec) mysql> show status like 'handler_read%'; +-----------------------+-------+ | Variable_name | Value | +-----------------------+-------+ | Handler_read_first | 40 | | Handler_read_key | 74 | | Handler_read_next | 4286 | | Handler_read_prev | 0 | | Handler_read_rnd | 100 | | Handler_read_rnd_next | 16865 | +-----------------------+-------+ 6 rows in set (0.00 sec) mysql> select min(j) from test where k = 1; +--------+ | min(j) | +--------+ | 1 | +--------+ 1 row in set (0.00 sec) mysql> show status like 'handler_read%'; +-----------------------+-------+ | Variable_name | Value | +-----------------------+-------+ | Handler_read_first | 41 | | Handler_read_key | 76 | | Handler_read_next | 4286 | | Handler_read_prev | 0 | | Handler_read_rnd | 100 | | Handler_read_rnd_next | 16930 | +-----------------------+-------+ 6 rows in set (0.00 sec) Handler_read_rnd_next shows that all the rows were read: mysql> select 16930-16865; +-------------+ | 16930-16865 | +-------------+ | 65 | +-------------+ 1 row in set (0.00 sec) Suggested fix: In the example given, the query plan should be to begin the scan and quit as soon as a row that satisfies the WHERE is found. The workaround is to rewrite the query as select j from test where k = 1 limit 1; The MIN() must be omitted from the rewritten query for this to work.