Bug #92994 please evaluate unions lazily
Submitted: 29 Oct 2018 18:17 Modified: 11 Oct 2019 12:13
Reporter: Domas Mituzas Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S4 (Feature request)
Version:* OS:Any
Assigned to: CPU Architecture:Any

[29 Oct 2018 18:17] Domas Mituzas
Description:
( SELECT ... LIMIT 10000)
UNION ALL
( SELECT ... LIMIT 10000)
LIMIT 1000

will materialize both unions fully, even if first one fully satisfies outer LIMIT

please use lazier execution for it. 

How to repeat:
mysql> explain ( select *, 'a' from t1 limit 1000 ) UNION ALL ( select *, 'b' from t1 limit 10000 ) limit 10
    -> ;
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------+
|  1 | PRIMARY     | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 522614 |   100.00 | NULL  |
|  2 | UNION       | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 522614 |   100.00 | NULL  |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------+
2 rows in set, 1 warning (0.00 sec)

mysql> ( select *, 'a' from t1 limit 1000 ) UNION ALL ( select *, 'b' from t1 limit 10000 ) limit 10
    -> ;
+----+------+------+------+---+
| a  | b    | c    | d    | a |
+----+------+------+------+---+
|  1 |    0 |    1 | NULL | a |
|  2 |    0 |    2 | NULL | a |
|  3 |    0 |    3 | NULL | a |
|  4 |    0 |    4 | NULL | a |
|  6 |    0 |    6 | NULL | a |
|  7 |    0 |    7 | NULL | a |
|  8 |    0 |    8 | NULL | a |
|  9 |    0 |    9 | NULL | a |
| 13 |    0 |   13 | NULL | a |
| 14 |    0 |   14 | NULL | a |
+----+------+------+------+---+
10 rows in set (0.01 sec)

mysql> show status like '%handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 2     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 8     |
| Handler_mrr_init           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 2     |
| Handler_read_key           | 2     |
| Handler_read_last          | 0     |
| Handler_read_next          | 0     |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 11000 |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
18 rows in set (0.01 sec)

Suggested fix:
evaluate unions lazily.
[29 Oct 2018 18:18] Domas Mituzas
this is a followup to https://bugs.mysql.com/bug.php?id=79040 too
[30 Oct 2018 9:03] Guilhem Bichot
Duplicate of https://bugs.mysql.com/bug.php?id=79340
"UNION ALL ... LIMIT 1 evaluates too many rows in 5.5/6 and twice more in 5.7"
?
[30 Oct 2018 13:55] MySQL Verification Team
Hi Domas,

Actually, this was the optimisation that we discussed back in Prague. I think, back in 2006, but it was never implemented.

This is a fully valid feature request and I am verifying it as such.
[11 Oct 2019 12:13] Guilhem Bichot
BUG#79340 is another instance of this.
It was fixed in 8.0.18 with changelog:
    Queries using UNION ALL ... LIMIT 1 evaluated an excessive
    number of rows as shown by Handler_read, Handler_read_key, and
    Handler_read_next.

Verification that it handles Domas' testcase:

explain analyze ( select *, 'a' from t1 limit 1000 ) UNION ALL ( select *, 'b' from t1 limit 10000 ) limit 10;
--------------
| -> Limit: 10 row(s)  (actual time=0.130..0.513 rows=10 loops=1)
    -> Append  (actual time=0.129..0.511 rows=10 loops=1)
        -> Stream results  (actual time=0.128..0.509 rows=10 loops=1)
            -> Limit: 1000 row(s)  (actual time=0.114..0.389 rows=10 loops=1)
                -> Table scan on t1  (cost=206.30 rows=2048) (actual time=0.113..0.387 rows=10 loops=1)
        -> Stream results  (never executed)
            -> Limit: 10000 row(s)  (never executed)
                -> Table scan on t1  (cost=206.30 rows=2048) (never executed)

I have 2048 rows in t1; the actual execution stats above show "rows=10" for tje scan of t1: we read only 10 rows of t1.
Confirmed by:
mysql> show status like "handler%";                                                                                                                                                                                | Handler_read_first         | 1     |
| Handler_read_key           | 1     |
| Handler_read_last          | 0     |
| Handler_read_next          | 0     |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 10    |
...
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
[11 Oct 2019 12:30] MySQL Verification Team
Merci beaucoup, Guilhem !!!!!