Bug #28151 | Server scans for MIN() when it could stop at first row found | ||
---|---|---|---|
Submitted: | 29 Apr 2007 0:01 | Modified: | 27 Dec 2009 15:41 |
Reporter: | Baron Schwartz (Basic Quality Contributor) | Email Updates: | |
Status: | No Feedback | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S4 (Feature request) |
Version: | 5.1.15-beta | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | INDEX, limit, min, Optimizer, qc, range, scan |
[29 Apr 2007 0:01]
Baron Schwartz
[29 Apr 2007 5:15]
Valeriy Kravchuk
Thank you for a problem report and suggested solution. How optimizer should "know for sure" that all rows have k=1 or that the firts one with k=1 will give you minimal value of j, different column? Without index on k or on (k,j)? Only you know how data really looks like, but I do not understand how optimizer can suggest that.
[29 Apr 2007 14:13]
Baron Schwartz
If the query scans the primary key, as soon as it finds a row where k=1, that row has the minimum value of j, because the primary key is ordered ascending. I should explain the larger context for this query, and show another query that explains why it is important. Suppose I have a table I want to operate on selectively, a row at a time, where k = 1. I want to find a row, do something with it, and then find the next row. The sequence of queries will look like this: select min(j) from test where k = 1; -- returns j = 1 -- do something with this row select min(j) from test where k = 1 and j > 1; -- returns j = 2 -- repeat... select min(j) from test where k = 1 and j > 2; Ignore the first query for a moment, and look at the second and third. The best way to satisfy these queries for the table I describe is to seek into the primary key using the j value, then scan the index until the next k = 1 row is found. This must be the next largest value of j. MySQL will indeed use the index to find the starting point, but then it will scan the rest of the table to find the next row, instead of stopping as soon as it finds one row that satisfies: mysql> explain select min(j) from test where k = 1 and j > 1\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: test type: range possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: NULL rows: 32 Extra: Using where 1 row in set (0.00 sec) mysql> show status like 'handler_read%'; +-----------------------+-------+ | Variable_name | Value | +-----------------------+-------+ | Handler_read_first | 0 | | Handler_read_key | 2 | | Handler_read_next | 0 | | Handler_read_prev | 0 | | Handler_read_rnd | 0 | | Handler_read_rnd_next | 9 | +-----------------------+-------+ 6 rows in set (0.00 sec) mysql> select min(j) from test where k = 1 and j > 1; +--------+ | min(j) | +--------+ | 2 | +--------+ 1 row in set (0.01 sec) mysql> show status like 'handler_read%'; +-----------------------+-------+ | Variable_name | Value | +-----------------------+-------+ | Handler_read_first | 0 | | Handler_read_key | 4 | | Handler_read_next | 63 | | Handler_read_prev | 0 | | Handler_read_rnd | 0 | | Handler_read_rnd_next | 9 | +-----------------------+-------+ 6 rows in set (0.00 sec) I'm sorry I left out this important additional information. I got too focused on the simpler case :-( I hope this helps. I know this is a very particular case and maybe this strategy would be really stupid in many cases. Maybe the DBA should be using the workaround I suggested instead, but it seems to me that a general heuristic of "when scanning an ascending index looking for MIN(), the first row found has to be the MIN" could be used in other query plans as well. If there is an index on k, I don't know how the plan should be designed, but if that heuristic is implemented, maybe it will make a difference.
[29 Apr 2007 18:02]
Valeriy Kravchuk
What you ask for really looks like a reasonable feature request for the optimizer.
[26 Nov 2009 15:08]
Georgi Kodinov
I've tested with the latest 5.1-bugteam. Looks like this feature request have been implemented. From the docs : http://dev.mysql.com/doc/refman/5.1/en/using-explain.html: "Select tables optimized away The query contained only aggregate functions (MIN(), MAX()) that were all resolved using an index, or COUNT(*) for MyISAM, and no GROUP BY clause. The optimizer determined that only one row should be returned." Here's my test output: create table t1 (a int, b int, primary key (a)) engine=innodb; insert into t1 values (1,1),(2,2),(3,3),(4,4); insert into t1 select a + 4, b + 4 from t1; insert into t1 select a + 8, b + 8 from t1; insert into t1 select a + 16, b + 16 from t1; insert into t1 select a + 32, b + 32 from t1; FLUSH STATUS; SELECT MIN(a) FROM t1; MIN(a) 1 show status like 'handler_read%'; Variable_name Value Handler_read_first 1 Handler_read_key 2 Handler_read_next 0 Handler_read_prev 0 Handler_read_rnd 0 Handler_read_rnd_next 0 FLUSH STATUS; SELECT MIN(a) FROM t1 WHERE a = 1; MIN(a) 1 show status like 'handler_read%'; Variable_name Value Handler_read_first 0 Handler_read_key 2 Handler_read_next 0 Handler_read_prev 0 Handler_read_rnd 0 Handler_read_rnd_next 0 EXPLAIN SELECT MIN(a) FROM t1; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE NULL NULL NULL NULL NULL NULL NULL Select tables optimized away EXPLAIN SELECT MIN(a) FROM t1 WHERE a = 1; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE NULL NULL NULL NULL NULL NULL NULL Select tables optimized away
[26 Nov 2009 15:22]
Valeriy Kravchuk
Indeed, looks already implemented: 77-52-12-228:5.1 openxs$ bin/mysql -uroot test Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 1 Server version: 5.1.42-debug Source distribution Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> create table t1 (a int, b int, primary key (a)) engine=innodb; Query OK, 0 rows affected (0.07 sec) mysql> insert into t1 values (1,1),(2,2),(3,3),(4,4); Query OK, 4 rows affected (0.00 sec) Records: 4 Duplicates: 0 Warnings: 0 mysql> insert into t1 select a + 4, b + 4 from t1; Query OK, 4 rows affected (0.00 sec) Records: 4 Duplicates: 0 Warnings: 0 mysql> insert into t1 select a + 8, b + 8 from t1; Query OK, 8 rows affected (0.00 sec) Records: 8 Duplicates: 0 Warnings: 0 mysql> insert into t1 select a + 16, b + 16 from t1; Query OK, 16 rows affected (0.00 sec) Records: 16 Duplicates: 0 Warnings: 0 mysql> insert into t1 select a + 32, b + 32 from t1; Query OK, 32 rows affected (0.01 sec) Records: 32 Duplicates: 0 Warnings: 0 mysql> FLUSH STATUS; Query OK, 0 rows affected (0.00 sec) mysql> SELECT MIN(a) FROM t1; +--------+ | MIN(a) | +--------+ | 1 | +--------+ 1 row in set (0.00 sec) mysql> show status like 'handler_read%'; +-----------------------+-------+ | Variable_name | Value | +-----------------------+-------+ | Handler_read_first | 1 | | Handler_read_key | 2 | | Handler_read_next | 0 | | Handler_read_prev | 0 | | Handler_read_rnd | 0 | | Handler_read_rnd_next | 0 | +-----------------------+-------+ 6 rows in set (0.00 sec) mysql> EXPLAIN SELECT MIN(a) FROM t1; +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+ | 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away | +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+ 1 row in set (0.00 sec) mysql> EXPLAIN SELECT MIN(a) FROM t1 WHERE a = 1; +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+ | 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away | +----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+ 1 row in set (0.02 sec)
[27 Nov 2009 14:23]
Baron Schwartz
The last couple of comments have talked about a different, simpler optimization where the combination of the WHERE clause and the index enables the index alone to be used. I'm not speaking about this case, I'm speaking about the case where an index scan is required, the server finds the answer, but then continues scanning more rows. I think I need to provide a better test case to illustrate how bad this behavior can be, and how easy it can be to avoid it. I'll try to do that soon.
[27 Nov 2009 15:41]
Valeriy Kravchuk
Any more complicated test cases that still show the problem in 5.1.41+ are welcomed.
[28 Dec 2009 0:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".