Bug #96112 "SELECT x WHERE y ORDER BY x LIMIT 1" faster than "SELECT MIN(x) WHERE y"
Submitted: 6 Jul 2019 14:02 Modified: 8 Jul 2019 12:43
Reporter: Ruud H.G. van Tol Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S4 (Feature request)
Version:5.7 OS:CentOS
Assigned to: CPU Architecture:x86

[6 Jul 2019 14:02] Ruud H.G. van Tol
Description:
"SELECT x WHERE y ORDER BY x LIMIT 1" 
can easily be faster than 
"SELECT MIN(x) WHERE y".

For example when 'x' is the PK 'id' column, and 'y' is an 'IS NOT NULL' condition on a non-indexed column.

How to repeat:
Try on any significantly sized table, that has a simple numeric 'id' column as PK.

It seems as if MIN(id) then isn't aware of the ever increasing nature of 'id'.

Suggested fix:

A query optimizer/rewriter could probably manage this.
[6 Jul 2019 16:00] MySQL Verification Team
Here's a testcase that I think represents the report
------------

drop table if exists t;
create table t(x int unsigned not null auto_increment primary key,y int)engine=innodb;
insert into t(y) values(if(rand()<0.1,null,floor(rand()*10000)));
insert into t(y) values(if(rand()<0.1,null,floor(rand()*10000)));
insert into t(y) values(if(rand()<0.1,null,floor(rand()*10000)));
insert into t(y) values(if(rand()<0.1,null,floor(rand()*10000)));
insert into t(y) values(if(rand()<0.1,null,floor(rand()*10000)));
insert into t(y) select if(rand()<0.1,null,floor(rand()*10000)) from t a,t b,t c;
insert into t(y) select if(rand()<0.1,null,floor(rand()*10000)) from t a,t b,t c;
select count(*) from t;
analyze table t;
explain select x from t where y is not null order by x limit 1;
explain select min(x) from t where y is not null;

flush status;
select x from t where y is not null order by x limit 1;
show status like '%handler%';

flush status;
select min(x) from t where y is not null;
show status like '%handler%';

------------------------
[6 Jul 2019 16:01] MySQL Verification Team
mysql> select count(*) from t;
+----------+
| count(*) |
+----------+
|  2197130 |
+----------+
1 row in set (0.94 sec)

mysql> analyze table t;
+--------+---------+----------+----------+
| Table  | Op      | Msg_type | Msg_text |
+--------+---------+----------+----------+
| test.t | analyze | status   | OK       |
+--------+---------+----------+----------+
1 row in set (0.09 sec)

mysql> explain select x from t where y is not null order by x limit 1;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t     | NULL       | index | NULL          | PRIMARY | 4       | NULL |    1 |    90.00 | Using where |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select min(x) from t where y is not null;
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
|  1 | SIMPLE      | t     | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 2191475 |    90.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql>
mysql> flush status;
Query OK, 0 rows affected (0.00 sec)

mysql> select x from t where y is not null order by x limit 1;
+---+
| x |
+---+
| 1 |
+---+
1 row in set (0.00 sec)

mysql> show status like '%handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 2     |
| Handler_mrr_init           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 1     |
| Handler_read_key           | 1     |
| Handler_read_last          | 0     |
| Handler_read_next          | 0     |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
18 rows in set (0.00 sec)

mysql>
mysql> flush status;
Query OK, 0 rows affected (0.00 sec)

mysql> select min(x) from t where y is not null;
+--------+
| min(x) |
+--------+
|      1 |
+--------+
1 row in set (1.96 sec)

mysql> show status like '%handler%';
+----------------------------+---------+
| Variable_name              | Value   |
+----------------------------+---------+
| Handler_commit             | 1       |
| Handler_delete             | 0       |
| Handler_discover           | 0       |
| Handler_external_lock      | 2       |
| Handler_mrr_init           | 0       |
| Handler_prepare            | 0       |
| Handler_read_first         | 1       |
| Handler_read_key           | 1       |
| Handler_read_last          | 0       |
| Handler_read_next          | 0       |
| Handler_read_prev          | 0       |
| Handler_read_rnd           | 0       |
| Handler_read_rnd_next      | 2197131 |
| Handler_rollback           | 0       |
| Handler_savepoint          | 0       |
| Handler_savepoint_rollback | 0       |
| Handler_update             | 0       |
| Handler_write              | 0       |
+----------------------------+---------+
18 rows in set (0.00 sec)
[7 Jul 2019 9:50] Ruud H.G. van Tol
For MAX(id) you could build a similar case:

SELECT id FROM t WHERE y IS NOT NULL ORDER BY id /***/ DESC /***/;

but if y is a column that went out of use a few million rows (AKA several months) ago, that of course still takes a long time.

BTW, this makes me think of a new feature request: a way to keep the MIN(<PK>) and MAX(<PK>) of any specified condition (like "y IS NOT NULL"), as that takes up minimal space, can already be built with triggers, could be used by the query optimizer to limit PK-ranges, and it covers MIN/MAX queries.
It would be kind of a mini-index, as it never holds more than 2 PK-values.
Similar for temporal cases, storing (<MIN_timestamp>,<PK>) and (<MAX_timestam p>,<PK>), to persist when a condition first and last occurred.
[8 Jul 2019 12:43] MySQL Verification Team
Hi Mr. Ruud,

Thank you for your feature request.

I agree with you and my colleague Shane that this is a very nice feature request.

Verified as reported.