Bug #109169 Index merge should favour union over sort_union
Submitted: 22 Nov 2022 19:21 Modified: 18 Apr 2023 9:40
Reporter: Manuel Ung Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:8.0.31 OS:Any
Assigned to: CPU Architecture:Any

[22 Nov 2022 19:21] Manuel Ung
Description:
Index merge should favour union over sort_union plans, if they have similar costs, since one requires sorting the rowids whereas the other does not.

How to repeat:
create table t (i int, j int, k int, key(i), key(j, k), key(j));
insert into t (WITH RECURSIVE a (i) AS ( SELECT 0  union all SELECT i+1  from a where i < 9  ), b (i) AS  (SELECT x.i + y.i * 10 + z.i * 100 + w.i * 1000 +  v.i * 10000 FROM a x, a y, a z, a w, a v) SELECT b.i, b.i, b.i from b order by i);
explain select * from t where i = 0 or j = 2;
drop table t;

create table t (i int, j int, k int, key(i), key(j), key(j, k));
insert into t (WITH RECURSIVE a (i) AS ( SELECT 0  union all SELECT i+1  from a where i < 9  ), b (i) AS  (SELECT x.i + y.i * 10 + z.i * 100 + w.i * 1000 +  v.i * 10000 FROM a x, a y, a z, a w, a v) SELECT b.i, b.i, b.i from b order by i);
explain select * from t where i = 0 or j = 2;
drop table t;

You can see that just by changing the order of the definition of indexes, you will get the sort_union vs union. Theoretically, sort_union is most expensive as it requires sorting, so it should be disfavoured. 

Suggested fix:
Consider rowid_ordered=true when comparing costs in get_key_scans_params
[23 Nov 2022 4:43] MySQL Verification Team
Hello Manuel Ung,

Thank you for the report and feedback.

regards,
Umesh
[18 Apr 2023 9:40] Jon Stephens
Documented fix as follows in the MySQL 8.0.34 and 8.1.0 changelogs:

    Index Merge (see "Index Merge Optimization" - 
    http://dev.mysql.com/doc/refman/8.0/en/index-merge-optimization.html)
    should favor ROR-union plans (that is, using RowID Ordered
    Retrieval) over sort-union plans if they have similar costs,
    since sort-union requires additionally sorting of the rows by
    row ID whereas ROR-union does not.

    For each part of a WHERE clause containing an OR condition, the
    range optimizer gets the best range scan possible and uses all
    these range scans to build an index merge scan (that is, a
    sort-union scan). If it finds that all the best range scans are
    also ROR-scans, the range optimizer always proposes a ROR-union
    scan because it is always cheaper than a sort-union scan.
    Problems arose when the best range scan for any one part of an
    OR condition is not a ROR-scan, in which case, the range
    optimizer always chose sort-union. This was true even in cases,
    where it might be advantageous to choose a ROR-scan (even though
    it might not be the best range scan to handle one part of the OR
    condition), since this would eliminate any need to sort the rows
    by row ID.

    Now, in such cases, when determining the best range scan, the
    range optimizer also detects whether there is any possible
    ROR-scan, and uses this information to see whether each part of
    the OR condition has at least one possible ROR-scan. If so, we
    rerun the range optimizer to obtain the best ROR-scan for
    handling each part of the OR condition, and to make a ROR-union
    path. We then compare this cost with the cost of a sort-union
    when proposing the final plan.

Closed.