Bug #17842 Statement pruning on UNION SELECT
Submitted: 1 Mar 2006 21:42 Modified: 18 Aug 2007 2:07
Reporter: Beat Vontobel (Silver Quality Contributor) (SCA) Email Updates:
Status: Verified
Category:Server: Optimizer Severity:S4 (Feature request)
Version:5.0.18 OS:Any (any)
Assigned to: Target Version:TBD
Tags: qc
Triage: D5 (Feature request)

[1 Mar 2006 21:42] Beat Vontobel
Description:
The query optimiser isn't currently aware of "impossible" parts of a UNION SELECT: An
inner query could be optimised away completely if it contains a WHERE clause
contradicting another WHERE clause in an outer statement, similar to partition pruning.

This feature request describes the behaviour in subqueries, but the feature would be
especially useful in VIEWs with a UNION once they are accepted as ALGORITHM=MERGE. See
http://www.futhark.ch/mysql/137.html for a detailed explanation of this.

How to repeat:
Create two tables and insert some data:

 CREATE TABLE t_current (
   d date NOT NULL PRIMARY KEY,
   c char(1) NOT NULL
 ) ENGINE=MyISAM;

 CREATE TABLE t_archive (
   d date NOT NULL,
   c char(1) NOT NULL
 ) ENGINE=ARCHIVE;

 INSERT INTO t_archive (d, c) VALUES ('2005-05-13', 'a');
 INSERT INTO t_current (d, c) VALUES ('2006-03-01', 'c');

Create a table and a function to log the behaviour of the final query:

 CREATE TABLE log (accessed ENUM('t_archive', 't_current'));
 
 delimiter // 
 CREATE FUNCTION log_access(accessed ENUM('t_archive', 't_current'))
   RETURNS ENUM('t_archive', 't_current')
   DETERMINISTIC
 BEGIN
   INSERT INTO log VALUES (accessed);
   RETURN accessed;
 END //
 delimiter ;

Execute a UNION SELECT that can only return rows from one of the two tables:

 SELECT * FROM (
    SELECT    d, c, log_access('t_current') AS accessed
    FROM      t_current
    WHERE     d >= '2006-01-01'
  UNION
    SELECT    d, c, log_access('t_archive') AS accessed
    FROM      t_archive
    WHERE     d < '2006-01-01'
 ) AS s WHERE d > '2006-01-01';

 +------------+---+-----------+
 | d          | c | accessed  |
 +------------+---+-----------+
 | 2006-03-01 | c | t_current |
 +------------+---+-----------+
 1 row in set (0.03 sec)

The log shows that still both of the tables have been accessed:

 SELECT * FROM log;

 +-----------+
 | accessed  |
 +-----------+
 | t_current |
 | t_archive |
 +-----------+
 2 rows in set (0.00 sec)

Suggested fix:
The query optimiser should eliminate parts of a UNION statement that can't contribute any
rows to the final result set due to an impossible combination of inner and outer WHERE
restrictions.
[5 Mar 2006 9:14] Valeriy Kravchuk
Thank you for a reasonable feature request. It would be nice for optimizer to check such a
situations, at least, in obvious cases. Or just to perform outer conditions merge into
UNIONed SELECTs before further optimization steps.
[18 Aug 2007 2:07] Igor Babaev
- This is a request for an optimization implementation of which will take at least 2
man-month.
- Yet the optimization does not look widely applied.

By the above reasons I move the bug to 'To be fixed later'.
Product management will decide in what version a fix for this problem appears.