Bug #31188 | Optimizer does not do index range scans using predicates in IN lists | ||
---|---|---|---|
Submitted: | 25 Sep 2007 16:52 | Modified: | 19 Nov 2013 13:53 |
Reporter: | Mark Callaghan | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S5 (Performance) |
Version: | 5.0.37, 5.5 | OS: | Any |
Assigned to: | Martin Hansson | CPU Architecture: | Any |
Tags: | bfsm_2007_10_18, bfsm_2007_11_01, bfsm_2007_12_06, INDEX, inlist, Optimizer, range |
[25 Sep 2007 16:52]
Mark Callaghan
[25 Sep 2007 17:15]
MySQL Verification Team
Thank you for the bug report. Verified as described.
[25 Sep 2007 17:56]
Harrison Fisk
This appears to be a known issue. In the documentation at: http://dev.mysql.com/doc/refman/5.0/en/subquery-restrictions.html The following appears: == # Row constructors are not well optimized. The following two expressions are equivalent, but only the second can be optimized: (col1, col2, ...) = (val1, val2, ...) col1 = val1 AND col2 = val2 AND ... == This looks to be one of those cases where it can't be optimized well.
[15 Nov 2007 12:16]
Sergey Petrunya
Analysis ======== The range optimizer is not able to handle row comparisons in form (a1,a2, ...) = (b1, b2, ...) Single comparison handling is done by converting (a1, a2, ...) = (b1, b2, ...) to a1=b1 AND a2=b2 AND ... The conversion has been introduced as fix for BUG#16081. It is performed by check_row_equality() function which invoked at the equality propagation phase. The obtained equalities are also taken into account by equality propagation. In general case, "a IN (b,c,d, ...)" cannot used by construct equality sets. Consequently, it is ignored by equality propagation and is not converted to AND/OR. Range optimizer ignores tuple comparisons, therefore predicates like (a1, a2, ...) IN ((b11,b12,...), (c11,c12, ...), ...) are non-sargable in MySQL.
[15 Nov 2007 12:19]
Sergey Petrunya
The quote from the manual posted by Harrison Fisk is not true after BUG#16081. I'll notify the docs team.
[15 Nov 2007 12:49]
Sergey Petrunya
More analysis ============= At the moment range optimizer processes only IN predicates that have form t.keypart IN (const1, ... constN) and that case is a special case for IN expression - it computes a sorted list of {const_i} which is used by range analyzer to quickly construct the corresponding SEL_TREE. Predicates in other forms, e.g. "2 IN (3,a)" are not processed. Re the fix ========== We consider this bug to be a feature request. With tuple-based comparisons it is not possible to pre-sort the IN list as we have no idea how it should be sorted. It seems the fix is to make the range optimizer process (a1, a2, ...) IN ((b11,b12,...), (c11,c12, ...), ...) in the same way as it would have processed an equvalent AND/OR formula. Possible hazards ~~~~~~~~~~~~~~~~ 1. Type inference. There were some counter-intuitive rules re what should MySQL do when the IN list contains constants of different types. We should take care not to make any accidental modifications to semantics of tuple-based IN comparison. 2. Combinatorial blowup. the IN-list might be big. We need to check if we might use combinatorial amounts of time/space when processing it. Need to ask Mark Callaghan if he actually needs to process very big IN-lists. 3. If we implement conversion of tuple-based IN comparsion to AND/OR formula, won't then somebody come and demand that we handle scalar-value based predicates with non-constants in the right part (e.g. "2 IN (3,a)") too? We did have such requests for BETWEEN. (maybe disable processing of the tuple-based INs if they have non-constants on the right side?) Where/if we should fix it ~~~~~~~~~~~~~~~~~~~~~~~~~ I consider it a feature request. Do we have the feature trees already? If yes we might fix this bug there.
[15 Nov 2007 17:47]
Mark Callaghan
I agree that this is a feature request rather than a bug report. I have many queries with large in-lists, but most of them are single column inlists: col in (<thousands of values>) Predicates with multi-column inlists tend to have few values.
[14 Oct 2008 18:24]
Valeriy Kravchuk
Bug #40029 was marked as a duplicate of this one.
[21 Oct 2009 5:09]
Timothy Smith
Bug#35819 was marked as a duplicate of this one.
[7 Jan 2010 18:40]
Mark Callaghan
The docs should be updated. They don't explain the feature request for which this bug is open: >>> Prior to MySQL 5.0.26, row constructors were not well optimized; of the following two equivalent expressions, only the second could be optimized: (col1, col2, ...) = (val1, val2, ...) col1 = val1 AND col2 = val2 AND ... In MySQL 5.0.26 and later, all row equalities are converted into conjunctions of equalities between row elements, and handled by the optimizer in the same way. (Bug#16081) >>>
[7 Jan 2010 21:35]
Mark Callaghan
This is the same as http://bugs.mysql.com/bug.php?id=16247, but the author of this feature request is much more convincing.
[7 Jan 2010 21:36]
Domas Mituzas
hehe.
[24 Jan 2010 11:49]
Valeriy Kravchuk
Bug #50571 was marked as a duplicate of this one.
[25 Jan 2010 14:19]
MySQL Verification Team
In my humble opinion, this is still not a bug, but feature request. This feature request, however, should be very high on TODO list, as it would make a feature that is sorely missing, particularly since tuple equality expressions have been already optimized.
[16 Jun 2011 16:31]
Roberto Spadim
i put some test files at bug 61541 if anyone want to test with it...
[16 Jun 2011 16:34]
Roberto Spadim
just an idea... maybe some work done at bug 16081, could help here too (same part of changed code) http://lists.mysql.com/commits/11247
[18 Jun 2011 13:36]
Valeriy Kravchuk
Bug #61541 was marked as a duplicate of this one.
[11 Nov 2011 15:19]
Valeriy Kravchuk
Bug #63167 was marked as a duplicate of this one.
[20 Mar 2012 16:22]
MySQL Verification Team
http://bugs.mysql.com/bug.php?id=64706 marked as duplicate of this one.
[19 Nov 2013 13:53]
Erlend Dahl
Implemented in 5.7.3.
[18 Mar 2016 16:44]
Paul DuBois
Noted in 5.7.3 changelog. The optimizer now is able to apply the range scan access method to queries of this form: SELECT ... FROM t1 WHERE ( col_1, col_2 ) IN (( 'a', 'b' ), ( 'c', 'd' )); Previously, for range scans to be used it was necessary for the query to be written as: SELECT ... FROM t1 WHERE ( col_1 = 'a' AND col_2 = 'b' ) OR ( col_1 = 'c' AND col_2 = 'd' ); For the optimizer to use a range scan, queries must satisfy these conditions: * Only IN predicates are used, not NOT IN. * On the left side of the IN predicate, the row constructor contains only column references. * On the right side of the IN predicate, row constructors contain only runtime constants, which are either literals or local column references that are bound to constants during execution. * On the right side of the IN predicate, there is more than one row constructor. EXPLAIN output for applicable queries changes from full table scan or index scan to range scan. Changes are also visible by checking the values of the Handler_read_first, Handler_read_key, and Handler_read_next status variables.
[27 Jul 2016 9:35]
Olav Sandstå
This feature request was implemented as WL#7019 "Add support for row value constructors in in predicates to range optimizer".