Bug #17842 Statement pruning on UNION SELECT
Submitted: 1 Mar 2006 21:42 Modified: 18 Aug 2007 2:07
Reporter: Beat Vontobel (Silver Quality Contributor) (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S4 (Feature request)
Version:5.0.18 OS:Any (any)
Assigned to: CPU Architecture:Any
Tags: qc
Triage: D5 (Feature request)

[1 Mar 2006 21:42] Beat Vontobel
Description:
The query optimiser isn't currently aware of "impossible" parts of a UNION SELECT: An inner query could be optimised away completely if it contains a WHERE clause contradicting another WHERE clause in an outer statement, similar to partition pruning.

This feature request describes the behaviour in subqueries, but the feature would be especially useful in VIEWs with a UNION once they are accepted as ALGORITHM=MERGE. See http://www.futhark.ch/mysql/137.html for a detailed explanation of this.

How to repeat:
Create two tables and insert some data:

 CREATE TABLE t_current (
   d date NOT NULL PRIMARY KEY,
   c char(1) NOT NULL
 ) ENGINE=MyISAM;

 CREATE TABLE t_archive (
   d date NOT NULL,
   c char(1) NOT NULL
 ) ENGINE=ARCHIVE;

 INSERT INTO t_archive (d, c) VALUES ('2005-05-13', 'a');
 INSERT INTO t_current (d, c) VALUES ('2006-03-01', 'c');

Create a table and a function to log the behaviour of the final query:

 CREATE TABLE log (accessed ENUM('t_archive', 't_current'));
 
 delimiter // 
 CREATE FUNCTION log_access(accessed ENUM('t_archive', 't_current'))
   RETURNS ENUM('t_archive', 't_current')
   DETERMINISTIC
 BEGIN
   INSERT INTO log VALUES (accessed);
   RETURN accessed;
 END //
 delimiter ;

Execute a UNION SELECT that can only return rows from one of the two tables:

 SELECT * FROM (
    SELECT    d, c, log_access('t_current') AS accessed
    FROM      t_current
    WHERE     d >= '2006-01-01'
  UNION
    SELECT    d, c, log_access('t_archive') AS accessed
    FROM      t_archive
    WHERE     d < '2006-01-01'
 ) AS s WHERE d > '2006-01-01';

 +------------+---+-----------+
 | d          | c | accessed  |
 +------------+---+-----------+
 | 2006-03-01 | c | t_current |
 +------------+---+-----------+
 1 row in set (0.03 sec)

The log shows that still both of the tables have been accessed:

 SELECT * FROM log;

 +-----------+
 | accessed  |
 +-----------+
 | t_current |
 | t_archive |
 +-----------+
 2 rows in set (0.00 sec)

Suggested fix:
The query optimiser should eliminate parts of a UNION statement that can't contribute any rows to the final result set due to an impossible combination of inner and outer WHERE restrictions.
[5 Mar 2006 9:14] Valeriy Kravchuk
Thank you for a reasonable feature request. It would be nice for optimizer to check such a situations, at least, in obvious cases. Or just to perform outer conditions merge into UNIONed SELECTs before further optimization steps.
[18 Aug 2007 2:07] Igor Babaev
- This is a request for an optimization implementation of which will take at least 2 man-month.
- Yet the optimization does not look widely applied.

By the above reasons I move the bug to 'To be fixed later'.
Product management will decide in what version a fix for this problem appears.