Bug #52636 | 6.0 allowing JOINs on NULL values w/ optimizer_join_cache_level = 5-8 | ||
---|---|---|---|
Submitted: | 6 Apr 2010 21:22 | Modified: | 23 Nov 2010 3:23 |
Reporter: | Patrick Crews | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 6.0-codebase-bugfixing | OS: | Any |
Assigned to: | Guilhem Bichot | CPU Architecture: | Any |
Tags: | optimizer_join_cache_level, optimizer_switch |
[6 Apr 2010 21:22]
Patrick Crews
[11 May 2010 9:24]
Guilhem Bichot
minimal testcase: CREATE TABLE t1 (b int); INSERT INTO t1 VALUES (NULL),(3); CREATE TABLE t2 (a int, b int, KEY (b)); INSERT INTO t2 VALUES (5,NULL),(7,3); SELECT t2.a as bar FROM t1 LEFT JOIN t2 ON t2.b = t1.b; With "SET SESSION optimizer_join_cache_level = X;" where X < 5, the SELECT returns two rows: 7 and NULL, which is correct (NULL=NULL is UNKNOWN so no t2 row matches t1's NULL and thus a NULL column for t2 is returned (as this is a left join). With X>=5: 7 and 5 are returned, which is wrong.
[12 May 2010 21:09]
Guilhem Bichot
optimizer_join_cache_level=5 and 8 have quite different code paths (JOIN_CACHE_BKA vs JOIN_CACHE_BKA_UNIQUE), make sure to test both in the pushed testcase.
[19 May 2010 9:53]
Guilhem Bichot
The original test case (which has a WHERE clause) would be solved when BUG#53793 is fixed; but the minimal one which has no WHERE clause would remain.
[21 May 2010 11:56]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/108869 3182 Guilhem Bichot 2010-05-21 Fix for BUG#52636 6.0 allowing JOINs on NULL values w/ optimizer_join_cache_level = 5-8 @ mysql-test/r/join_cache.result Before the fix, the SELECTs would show: both first and 2nd SELECT: (100),(NULL); 3rd SELECT: (100),(100); both 4th and 5th: (100),(200),(NULL). For example 1st SELECT would find no match in t2 for t1's 3 (thus emit t2.a=NULL, as it's a left join) and a match for t1's NULL (the bug) thus emit t2.a=100. @ mysql-test/t/join_cache.test test for bug @ sql/handler.cc After fixing sql_join_cache.cc, if the referenced table (t1 in the testcase) has only NULL rows, it can now happen, in handler::multi_range_read_next(), that the first call to mrr_funcs.next() (== bka_(unique)?range_seq_next) finds no keys (because it now internally eliminates NULLs); so "result" needs to be initialized for this case, otherwise behaviour is random (segmentation fault). @ sql/sql_join_cache.cc What happened in the test's scenario SELECT t2.a FROM t1 LEFT JOIN t2 ON t2.b = t1.b; is: - plan is to access t1, and then t2 with "ref" access on key t2.b to the reference t1.b. - make_cond_for_table_from_pred() recognizes that the ON equality can be discarded as it's a consequence of using "ref" access - this "ref" access is marked as null_rejecting. So far so good, and at execution with optimizer_join_cache_level<=4, there is no problem, "ref" access is done with join_read_always_key() which tests ref->null_rejecting and does not pass t1.b NULL values to t2's index lookup. At higher levels, JOIN_CACHE_BKA(_UNIQUE)? are used instead of join_read_always_key(), and they forgot to test ref->null_rejecting, so NULL values of t1.b where accidentally matched with NULL values of t2.b, leading to wrong results (see comment of join_cache.result). The fix is to make JOIN_CACHE_BKA(_UNIQUE)? observe null_rejecting: when they scan the cached rows of the outer table t1, collecting key values from those cached rows in order to send them to t2's multi-range-read, we now don't collect any NULL key value. For this, JOIN_CACHE_BKA gets the same test as join_read_always_key(). But JOIN_CACHE_BKA_UNIQUE is different, it doesn't have up-to-date fields pointed to by "ref", testable with ref->items[i].is_null(); it only has a raw key value (left by a previous duplicate-key-eliminating phase), so we inspect this key. @ sql/sql_select.cc comments. The last one is verbose, as failing to observe it has resulted in the present bug as well as BUG 46743. @ sql/sql_select.h a helper which can be used to temporarily reduce a testcase when debugging (allows to have one-row tables and still follow the same execution as two-row tables).
[25 May 2010 9:20]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/109137 3182 Guilhem Bichot 2010-05-25 Fix for BUG#52636 6.0 allowing JOINs on NULL values w/ optimizer_join_cache_level = 5-8 @ mysql-test/r/join_cache.result Before the fix, the SELECTs would show: both first and 2nd SELECT: (100),(NULL); 3rd SELECT: (100),(100); both 4th and 5th: (100),(200),(NULL). For example 1st SELECT would find no match in t2 for t1's 3 (thus emit t2.a=NULL, as it's a left join) and a match for t1's NULL (the bug) thus emit t2.a=100. @ mysql-test/t/join_cache.test test for bug @ sql/handler.cc After fixing sql_join_cache.cc, if the referenced table (t1 in the testcase) has only NULL rows, it can now happen, in handler::multi_range_read_next(), that the first call to mrr_funcs.next() (== bka_(unique)?range_seq_next) finds no keys (because it now internally eliminates NULLs); so "result" needs to be initialized for this case, otherwise behaviour is random (segmentation fault). @ sql/sql_join_cache.cc What happened in the test's scenario SELECT t2.a FROM t1 LEFT JOIN t2 ON t2.b = t1.b; is: - plan is to access t1, and then t2 with "ref" access on key t2.b to the reference t1.b. - make_cond_for_table_from_pred() recognizes that the ON equality can be discarded as it's a consequence of using "ref" access - this "ref" access is marked as null_rejecting. So far so good, and at execution with optimizer_join_cache_level<=4, there is no problem, "ref" access is done with join_read_always_key() which tests ref->null_rejecting and does not pass t1.b NULL values to t2's index lookup. At higher levels, JOIN_CACHE_BKA(_UNIQUE)? are used instead of join_read_always_key(), and they forgot to test ref->null_rejecting, so NULL values of t1.b where accidentally matched with NULL values of t2.b, leading to wrong results (see comment of join_cache.result). The fix is to make JOIN_CACHE_BKA(_UNIQUE)? observe null_rejecting: when they scan the cached rows of the outer table t1, collecting key values from those cached rows in order to send them to t2's multi-range-read, we now don't collect any NULL key value. For this, JOIN_CACHE_BKA gets the same test as join_read_always_key(). But JOIN_CACHE_BKA_UNIQUE is different, it doesn't have up-to-date fields pointed to by "ref", testable with ref->items[i].is_null(); it only has a raw key value (left by a previous duplicate-key-eliminating phase), so we inspect this key. @ sql/sql_select.cc comments. The last one is verbose, as failing to observe it has resulted in the present bug as well as BUG 46743. @ sql/sql_select.h a helper which can be used to temporarily reduce a testcase when debugging (allows to have one-row tables and still follow the same execution as two-row tables).
[31 May 2010 12:56]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/109617 3182 Guilhem Bichot 2010-05-31 Fix for BUG#52636 6.0 allowing JOINs on NULL values w/ optimizer_join_cache_level = 5-8 Optimizer chose "ref" access for the LEFT JOIN ON condition, and considered that this access it implemented the condition's semantics; which would have been true if JOIN_CACHE_BKA and JOIN_CACHE_BKA_UNIQUE hadn't forgotten to eliminate NULL values. @ mysql-test/r/join_cache.result Before the fix, the SELECTs would show: both first and 2nd SELECT: (100),(NULL); 3rd SELECT: (100),(100); both 4th and 5th: (100),(200),(NULL). For example 1st SELECT would find no match in t2 for t1's 3 (thus emit t2.a=NULL, as it's a left join) and a match for t1's NULL (the bug) thus emit t2.a=100. @ mysql-test/t/join_cache.test test for bug @ sql/handler.cc After fixing sql_join_cache.cc, if the referenced table (t1 in the testcase) has only NULL rows, it can now happen, in handler::multi_range_read_next(), that the first call to mrr_funcs.next() (== bka_range_seq_next or bka_unique_range_seq_next) finds no keys (because it now internally eliminates NULLs); so "result" needs to be initialized for this case, otherwise behaviour is random (segmentation fault). @ sql/sql_join_cache.cc What happened in the test's scenario SELECT t2.a FROM t1 LEFT JOIN t2 ON t2.b = t1.b; is: - plan is to access t1, and then t2 with "ref" access on key t2.b to the reference t1.b. - make_cond_for_table_from_pred() recognizes that the ON equality can be discarded as it's a consequence of using "ref" access - this "ref" access is marked as null_rejecting. So far so good, and at execution with optimizer_join_cache_level<=4, there is no problem, "ref" access is done with join_read_always_key() which tests ref->null_rejecting and does not pass t1.b NULL values to t2's index lookup. At higher levels, JOIN_CACHE_BKA or JOIN_CACHE_BKA_UNIQUE are used instead of join_read_always_key(), and they forgot to test ref->null_rejecting, so NULL values of t1.b where accidentally matched with NULL values of t2.b, leading to wrong results (see comment of join_cache.result). The fix is to make JOIN_CACHE_BKA and JOIN_CAHE_BKA_UNIQUE observe null_rejecting: when they scan the cached rows of the outer table t1, collecting key values from those cached rows in order to send them to t2's multi-range-read, we now don't collect any NULL key value. For this, JOIN_CACHE_BKA gets the same test as join_read_always_key(). But JOIN_CACHE_BKA_UNIQUE is different, it doesn't have up-to-date fields pointed to by "ref", testable with ref->items[i].is_null(); it only has a raw key value (left by a previous duplicate-key-eliminating phase), so we inspect this key. @ sql/sql_select.cc comments. The last one is verbose, as failing to observe it has resulted in the present bug as well as BUG 46743. @ sql/sql_select.h A member function to reduce duplicated code. A helper which can be used to temporarily reduce a testcase when debugging (allows to have one-row tables and still follow the same execution as two-row tables).
[1 Jun 2010 9:15]
Øystein Grøvlen
Approved, but please consider my comments.
[1 Jun 2010 9:31]
John Embretsen
I may have hit this when testing the mysql-next-mr-opt-backporting branch. See Bug#54110.
[1 Jun 2010 11:11]
Guilhem Bichot
BUG#54110 has been marked as duplicate of this one.
[1 Jun 2010 12:23]
Guilhem Bichot
Hello John. 6.0-codebase-bugfixing suffers from BUG#52636 too; it's not surprising that applying the fix makes next-mr-opt-backporting give different results from 6.0-codebase-bugfixing. You may want to use 5.1 as "the good reference" for checking new results.
[1 Jun 2010 14:09]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/109775 3182 Guilhem Bichot 2010-06-01 Fix for BUG#52636 6.0 allowing JOINs on NULL values w/ optimizer_join_cache_level = 5-8 Optimizer chose "ref" access for the LEFT JOIN ON condition, and considered that this access it implemented the condition's semantics; which would have been true if JOIN_CACHE_BKA and JOIN_CACHE_BKA_UNIQUE hadn't forgotten to eliminate NULL values. @ mysql-test/r/join_cache.result Before the fix, the SELECTs would show: both first and 2nd SELECT: (100),(NULL); 3rd SELECT: (100),(100); both 4th and 5th: (100),(200),(NULL). For example 1st SELECT would find no match in t2 for t1's 3 (thus emit t2.a=NULL, as it's a left join) and a match for t1's NULL (the bug) thus emit t2.a=100. @ mysql-test/t/join_cache.test test for bug @ sql/handler.cc After fixing sql_join_cache.cc, if the referenced table (t1 in the testcase) has only NULL rows, it can now happen, in handler::multi_range_read_next(), that the first call to mrr_funcs.next() (== bka_range_seq_next or bka_unique_range_seq_next) finds no keys (because it now internally eliminates NULLs); so "result" needs to be initialized for this case, otherwise behaviour is random (segmentation fault). @ sql/sql_join_cache.cc What happened in the test's scenario SELECT t2.a FROM t1 LEFT JOIN t2 ON t2.b = t1.b; is: - plan is to access t1, and then t2 with "ref" access on key t2.b to the reference t1.b. - make_cond_for_table_from_pred() recognizes that the ON equality can be discarded as it's a consequence of using "ref" access - this "ref" access is marked as null_rejecting. So far so good, and at execution with optimizer_join_cache_level<=4, there is no problem, "ref" access is done with join_read_always_key() which tests ref->null_rejecting and does not pass t1.b NULL values to t2's index lookup. At higher levels, JOIN_CACHE_BKA or JOIN_CACHE_BKA_UNIQUE are used instead of join_read_always_key(), and they forgot to test ref->null_rejecting, so NULL values of t1.b where accidentally matched with NULL values of t2.b, leading to wrong results (see comment of join_cache.result). The fix is to make JOIN_CACHE_BKA and JOIN_CAHE_BKA_UNIQUE observe null_rejecting: when they scan the cached rows of the outer table t1, collecting key values from those cached rows in order to send them to t2's multi-range-read, we now don't collect any NULL key value. For this, JOIN_CACHE_BKA gets the same test as join_read_always_key(). But JOIN_CACHE_BKA_UNIQUE is different, it doesn't have up-to-date fields pointed to by "ref", testable with ref->items[i].is_null(); it only has a raw key value (left by a previous duplicate-key-eliminating phase), so we inspect this key. @ sql/sql_select.cc comments. The last one is verbose, as failing to observe it has resulted in the present bug as well as BUG 46743. @ sql/sql_select.h A member function to reduce duplicated code. A helper which can be used to temporarily reduce a testcase when debugging (allows to have one-row tables and still follow the same execution as two-row tables).
[2 Jun 2010 13:10]
Guilhem Bichot
To John: the test for BUG#54110 uses InnoDB tables. It is normal that opt-backporting would suffer from this bug with InnoDB tables, more than 6.0-codebase-bugfixing: in the first tree, multi-range-read is enabled by default in InnoDB, which makes join buffering possible in the MySQL join execution code; and this bug is in the join buffering code. So in opt-backporting + InnoDB case, we probably go through the faulty code more often. Still, the only way to quickly know what tree is correct is comparing with 5.1 results.
[3 Jun 2010 12:47]
John Embretsen
Guilhem: I have a situation which is somewhat similar to this. I am not sure about the nitty gritty details of this bug yet, so can you tell if it is the same cause as this bug or Bug#53793, or something else? ------------------------ /*!50600 SET SESSION optimizer_join_cache_level = 8 */; CREATE TABLE `O` ( `pk` int NOT NULL AUTO_INCREMENT, PRIMARY KEY (`pk`), ); CREATE TABLE `K` ( `pk` int(11) NOT NULL AUTO_INCREMENT, `col_int_key` int(11) DEFAULT NULL, PRIMARY KEY (`pk`), KEY `col_int_key` (`col_int_key`) ); CREATE TABLE `L` ( `pk` int(11) NOT NULL AUTO_INCREMENT, `col_int_key` int(11) DEFAULT NULL, `col_datetime` datetime DEFAULT NULL, PRIMARY KEY (`pk`), KEY `test_idx` (`col_int_key`,`pk`) USING HASH ); INSERT INTO `L` VALUES (49,-49414144,'2006-06-04 00:00:00'); INSERT INTO `L` VALUES (50,NULL,NULL); SELECT table1 .`col_datetime` FROM L table1 LEFT JOIN K table2 LEFT JOIN O table3 ON table2 .`pk` ON table1 .`col_int_key` WHERE 6 AND table1 .`col_int_key` OR 7 AND table3 .`pk`; SELECT FOUND_ROWS(); DROP TABLE O; DROP TABLE K; DROP TABLE L; ------------------------ Diff between 5.1-bugteam and next-mr-opt-backporting (with patch for this bug report applied) is: WHERE 6 AND table1 .`col_int_key` OR 7 AND table3 .`pk`; col_datetime 2006-06-04 00:00:00 +NULL SELECT FOUND_ROWS(); FOUND_ROWS() -1 +2 DROP TABLE O; DROP TABLE K; DROP TABLE L; Since it needs the WHERE clause to cause a diff, I interpreted above comments as this could be fixed when Bug#53793 is fixed. However, when I read Bug#53793 I get the impression that it merely has a performance/efficiency impact, and not wrong results. So I would appreciate if you would help me become less confused :)
[3 Jun 2010 13:10]
John Embretsen
Typo in SQL in previous comment: Please remove comma in PRIMARY KEY (`pk`), for CREATE TABLE `O` to avoid errors.
[3 Jun 2010 20:27]
Guilhem Bichot
Hello John. Good catch! I could repeat the problem in your last post (but only if I use InnoDB tables). It is independent of the patch for the present bug#52636 (not introduced by it, not fixed by it). I get the problem only with join_cache_level=4,6,8 (that's another indication that it's not bug#52636, which happens with 5,6,7,8). I suggest opening a new bug report.
[4 Jun 2010 12:54]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/110238 3185 Guilhem Bichot 2010-06-01 Fix for BUG#52636 6.0 allowing JOINs on NULL values w/ optimizer_join_cache_level = 5-8 Optimizer chose "ref" access for the LEFT JOIN ON condition, and considered that this access it implemented the condition's semantics; which would have been true if JOIN_CACHE_BKA and JOIN_CACHE_BKA_UNIQUE hadn't forgotten to eliminate NULL values. @ mysql-test/r/join_cache.result Before the fix, the SELECTs would show: both first and 2nd SELECT: (100),(NULL); 3rd SELECT: (100),(100); both 4th and 5th: (100),(200),(NULL). For example 1st SELECT would find no match in t2 for t1's 3 (thus emit t2.a=NULL, as it's a left join) and a match for t1's NULL (the bug) thus emit t2.a=100. @ mysql-test/t/join_cache.test test for bug @ sql/handler.cc After fixing sql_join_cache.cc, if the referenced table (t1 in the testcase) has only NULL rows, it can now happen, in handler::multi_range_read_next(), that the first call to mrr_funcs.next() (== bka_range_seq_next or bka_unique_range_seq_next) finds no keys (because it now internally eliminates NULLs); so "result" needs to be initialized for this case, otherwise behaviour is random (segmentation fault). @ sql/sql_join_cache.cc What happened in the test's scenario SELECT t2.a FROM t1 LEFT JOIN t2 ON t2.b = t1.b; is: - plan is to access t1, and then t2 with "ref" access on key t2.b to the reference t1.b. - make_cond_for_table_from_pred() recognizes that the ON equality can be discarded as it's a consequence of using "ref" access - this "ref" access is marked as null_rejecting. So far so good, and at execution with optimizer_join_cache_level<=4, there is no problem, "ref" access is done with join_read_always_key() which tests ref->null_rejecting and does not pass t1.b NULL values to t2's index lookup. At higher levels, JOIN_CACHE_BKA or JOIN_CACHE_BKA_UNIQUE are used instead of join_read_always_key(), and they forgot to test ref->null_rejecting, so NULL values of t1.b where accidentally matched with NULL values of t2.b, leading to wrong results (see comment of join_cache.result). The fix is to make JOIN_CACHE_BKA and JOIN_CAHE_BKA_UNIQUE observe null_rejecting: when they scan the cached rows of the outer table t1, collecting key values from those cached rows in order to send them to t2's multi-range-read, we now don't collect any NULL key value. For this, JOIN_CACHE_BKA gets the same test as join_read_always_key(). But JOIN_CACHE_BKA_UNIQUE is different, it doesn't have up-to-date fields pointed to by "ref", testable with ref->items[i].is_null(); it only has a raw key value (left by a previous duplicate-key-eliminating phase), so we inspect this key. @ sql/sql_select.cc comments. The last one is verbose, as failing to observe it has resulted in the present bug as well as BUG 46743. @ sql/sql_select.h A member function to reduce duplicated code. A helper which can be used to temporarily reduce a testcase when debugging (allows to have one-row tables and still follow the same execution as two-row tables).
[4 Jun 2010 13:33]
Guilhem Bichot
queued to next-mr-opt-backporting
[4 Jun 2010 14:43]
Guilhem Bichot
The problem reported by John a few posts above is now filed as BUG#54235
[16 Aug 2010 6:37]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100816062819-bluwgdq8q4xysmlg) (version source revid:alik@sun.com-20100816062612-enatdwnv809iw3s9) (pib:20)
[13 Nov 2010 16:07]
Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[23 Nov 2010 3:23]
Paul DuBois
Bug does not appear in any released 5.6.x version. No 5.6.1 changelog entry needed.