Bug #46658 | Differing query results in azalea depending on join_cache_level settings, regres | ||
---|---|---|---|
Submitted: | 11 Aug 2009 16:19 | Modified: | 5 Oct 2009 13:22 |
Reporter: | Patrick Crews | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 5.4/6.0 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | extra rows, join_cache_level, optimizer_switch, regression |
[11 Aug 2009 16:19]
Patrick Crews
[11 Aug 2009 16:20]
Patrick Crews
MTR test case for this bug.
Attachment: bug46658_test.txt (text/plain), 17.29 KiB.
[5 Oct 2009 9:28]
Roy Lyseng
-- Beautified query 0: SELECT MAX(table2.`int_key` ) AS field1, table1.`varchar_key` AS field2, table2.`int_nokey` + table1 .`pk` AS field3, COUNT(table1.`int_nokey`) AS field4, table1.`int_key` + table1.`int_key` AS field5, table2.`int_nokey` * table2.`int_key` AS field6, table1.`date_key` AS field7 FROM D AS table1 STRAIGHT_JOIN BB AS table2 ON table2.`varchar_key` = table1 .`varchar_nokey` WHERE ( 'v' , 'c' ) NOT IN (SELECT DISTINCT SUBQUERY1_t2.`varchar_nokey`, SUBQUERY1_t1.`varchar_key` FROM B AS SUBQUERY1_t1 LEFT JOIN BB AS SUBQUERY1_t2 INNER JOIN BB AS SUBQUERY1_t3 ON SUBQUERY1_t3.`int_nokey` = SUBQUERY1_t2.`int_key` ON SUBQUERY1_t3.`int_key` = SUBQUERY1_t2 .`int_key`) AND EXISTS (SELECT SUBQUERY2_t1 .`pk` AS sq_field1 FROM C AS SUBQUERY2_t1 INNER JOIN CC AS SUBQUERY2_t2 ON SUBQUERY2_t2.`int_key` = SUBQUERY2_t1.`int_key` WHERE SUBQUERY2_t2.`int_key` <= table2 .`pk`) HAVING field7 > 9 ORDER BY table1.`int_key`; -- Step by step analysis of the query execution: -- Step 1: The outer SELECT block containing the straight join: SELECT table2.`int_key` AS field1, table1.`varchar_key` AS field2, table2.`int_nokey` + table1 .`pk` AS field3, table1 .`int_nokey` AS field4, table1.`int_key` + table1.`int_key` AS field5, table2.`int_nokey` * table2.`int_key` AS field6, table1.`date_key` AS field7, table2 .`pk` FROM D AS table1 STRAIGHT_JOIN BB AS table2 ON table2.`varchar_key` = table1 .`varchar_nokey`; +--------+--------+--------+--------+--------+--------+------------+----+ | field1 | field2 | field3 | field4 | field5 | field6 | field7 | pk | +--------+--------+--------+--------+--------+--------+------------+----+ | 9 | r | 24 | 4 | 6 | 9 | 2004-06-08 | 10 | | 9 | r | 60 | 3 | NULL | 9 | 2003-05-22 | 10 | | 9 | r | 95 | 4 | 16 | 9 | 2005-09-09 | 10 | | 9 | a | 21 | 1 | 4 | 45 | NULL | 11 | | 9 | a | 69 | 8 | 2 | 45 | 2005-12-16 | 11 | | 9 | a | 98 | 6 | 16 | 45 | 2003-10-02 | 11 | +--------+--------+--------+--------+--------+--------+------------+----+ -- Step 2: The NOT IN subquery: SELECT DISTINCT SUBQUERY1_t2.`varchar_nokey`, SUBQUERY1_t1.`varchar_key` FROM B AS SUBQUERY1_t1 LEFT JOIN BB AS SUBQUERY1_t2 INNER JOIN BB AS SUBQUERY1_t3 ON SUBQUERY1_t3.`int_nokey` = SUBQUERY1_t2.`int_key` ON SUBQUERY1_t3.`int_key` = SUBQUERY1_t2 .`int_key`; +---------------+-------------+ | varchar_nokey | varchar_key | +---------------+-------------+ | NULL | f | | NULL | w | +---------------+-------------+ ==> TRUE for all rows in outer query -- Step 3: The EXISTS subquery SELECT SUBQUERY2_t2.`int_key` FROM C AS SUBQUERY2_t1 INNER JOIN CC AS SUBQUERY2_t2 ON SUBQUERY2_t2.`int_key` = SUBQUERY2_t1.`int_key`; +---------+ | 0 | | 0 | | 1 | | 2 | | 2 | | 3 | | 3 | | 3 | | 3 | | 3 | | 3 | | 5 | | 5 | | 5 | | 5 | | 6 | | 6 | | 8 | | 8 | | 8 | | 8 | +---------+ -- Step 4: Correlate outer query against EXISTS subquery: WHERE SUBQUERY2_t2.`int_key` <= table2.`pk`; For each table2.pk, there is always an `int_key` from the EXISTS subquery that is less or equal, so the result of this predicate is always TRUE. -- Step 5: Perform the aggregation: SELECT MAX(table2.`int_key` ) AS field1, table1.`varchar_key` AS field2, table2.`int_nokey` + table1 .`pk` AS field3, COUNT(table1.`int_nokey`) AS field4, table1.`int_key` + table1.`int_key` AS field5, table2.`int_nokey` * table2.`int_key` AS field6, table1.`date_key` AS field7 FROM D AS table1 STRAIGHT_JOIN BB AS table2 ON table2.`varchar_key` = table1 .`varchar_nokey`; +--------+--------+--------+--------+--------+--------+------------+ | field1 | field2 | field3 | field4 | field5 | field6 | field7 | +--------+--------+--------+--------+--------+--------+------------+ | 9 | r | 24 | 6 | 6 | 9 | 2004-06-08 | +--------+--------+--------+--------+--------+--------+------------+ -- Step 6: Filter the group from step 5 with HAVING SELECT MAX(table2.`int_key` ) AS field1, table1.`varchar_key` AS field2, table2.`int_nokey` + table1 .`pk` AS field3, COUNT(table1.`int_nokey`) AS field4, table1.`int_key` + table1.`int_key` AS field5, table2.`int_nokey` * table2.`int_key` AS field6, table1.`date_key` AS field7 FROM D AS table1 STRAIGHT_JOIN BB AS table2 ON table2.`varchar_key` = table1 .`varchar_nokey` HAVING field7 > 9; Result with join_cache_level = 0: Empty set, 1 warning (0.01 sec) Result with join_cache_level = 1: +--------+--------+--------+--------+--------+--------+------------+ | field1 | field2 | field3 | field4 | field5 | field6 | field7 | +--------+--------+--------+--------+--------+--------+------------+ | 9 | r | 24 | 6 | 6 | 9 | 2004-06-08 | +--------+--------+--------+--------+--------+--------+------------+ Now, field7 is a non-deterministic value, so we cannot determine which of these results is the "correct" one. From Step 1, we see that field7 contains one NULL value and 5 non-NULL values. It may easily be that MySQL selects one value for join_cache_level=0 and another for join_cache_level=1, meaning that it is impossible to determine a deterministic HAVING result. Even with STRAIGHT_JOIN that fixes the join ordering, the optimizer is still free to use different indexes, meaning that the selection of field7 in the HAVING clause is non-deterministic.
[5 Oct 2009 13:22]
Roy Lyseng
Closing as Not a bug after agreement with Patrick.