MySQL Bugs: #54608: Query using IN + OR + IS TRUE and IS NULL returns NULL when should be empty

Bug #54608	Query using IN + OR + IS TRUE and IS NULL returns NULL when should be empty
Submitted:	18 Jun 2010 9:28	Modified:	23 Nov 2010 4:02
Reporter:	John Embretsen	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	mysql-next-mr-opt-backporting, 6.0.14	OS:	Any
Assigned to:	Guilhem Bichot	CPU Architecture:	Any
Tags:	regression

Description:
Consider the following table, where "a" is an integer column with an index:

+------+
| a    |
+------+
| NULL |
|    1 |
|    2 |
+------+

and the query:

SELECT a FROM t1 WHERE a IN (42) OR (a IS TRUE AND a IS NULL);

Here, "IN (42)" is FALSE, and "(IS TRUE AND IS NULL)" is also FALSE (assuming that a NULL value is never evaluated as TRUE). The query should not match any rows.

With mysql-next-mr-opt-backporting we get:

+------+
| a    |
+------+
| NULL |
+------+
1 row in set (0.00 sec)

whereas with mysql-5.1 (as well as trunk and next-mr) we get:

Empty set (0.00 sec)

There is a difference in EXPLAIN output:

mysql-next-mr-opt-backporting:
mysql> EXPLAIN SELECT a FROM t1 WHERE a IN (42) OR (a IS TRUE AND a IS NULL);
+----+-------------+-------+-------------+---------------+-------+---------+-------+------+-------------+
| id | select_type | table | type        | possible_keys | key   | key_len | ref   | rows | Extra       |
+----+-------------+-------+-------------+---------------+-------+---------+-------+------+-------------+
|  1 | SIMPLE      | t1    | ref_or_null | key_a         | key_a | 5       | const |    2 | Using index |
+----+-------------+-------+-------------+---------------+-------+---------+-------+------+-------------+
1 row in set (0.00 sec)

mysql-5.1:
mysql> EXPLAIN SELECT a FROM t1 WHERE a IN (42) OR (a IS TRUE AND a IS NULL);
+----+-------------+-------+-------------+---------------+-------+---------+-------+------+--------------------------+
| id | select_type | table | type        | possible_keys | key   | key_len | ref   | rows | Extra                    |
+----+-------------+-------+-------------+---------------+-------+---------+-------+------+--------------------------+
|  1 | SIMPLE      | t1    | ref_or_null | key_a         | key_a | 5       | const |    2 | Using where; Using index |
+----+-------------+-------+-------------+---------------+-------+---------+-------+------+--------------------------+
1 row in set (0.00 sec)

Issue is not repeatable if there is no key (index) on the column.
Issue is not dependent on optimizer_join_cache_level or optimizer_switch values.

Observed against revid tor.didriksen@sun.com-20100617093107-k7f2nbj8xcdgm6qj
of internal bzr tree mysql-next-mr-opt-backporting.

How to repeat:
DROP TABLE IF EXISTS t1;

CREATE TABLE t1 (
  a INT,
  KEY key_a (a)
);

INSERT INTO t1 VALUES (1), (2), (NULL);

-- this is OK (Empty):
SELECT a FROM t1 WHERE a IS TRUE AND a IS NULL;

-- this is also OK (Empty):
SELECT a FROM t1 WHERE a IN (42);

-- this is strange (NULL):
SELECT a FROM t1 WHERE a IN (42) OR (a IS TRUE AND a IS NULL);

Verified just as described on Mac OS X:

valeriy-kravchuks-macbook-pro:6.0-codebase openxs$ bin/mysql -uroot testReading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 6.0.14-alpha-debug Source distribution

Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
This software comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to modify and redistribute it under the GPL v2 license

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> DROP TABLE IF EXISTS t1;
Query OK, 0 rows affected (0.05 sec)

mysql> 
mysql> CREATE TABLE t1 (
    ->   a INT,
    ->   KEY key_a (a)
    -> );
Query OK, 0 rows affected (0.11 sec)

mysql> 
mysql> INSERT INTO t1 VALUES (1), (2), (NULL);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> 
mysql> -- this is OK (Empty):
mysql> SELECT a FROM t1 WHERE a IS TRUE AND a IS NULL;
Empty set (0.02 sec)

mysql> 
mysql> -- this is also OK (Empty):
mysql> SELECT a FROM t1 WHERE a IN (42);
Empty set (0.01 sec)

mysql> 
mysql> -- this is strange (NULL):
mysql> SELECT a FROM t1 WHERE a IN (42) OR (a IS TRUE AND a IS NULL);
+------+
| a    |
+------+
| NULL |
+------+
1 row in set (0.00 sec)

mysql> explain SELECT a FROM t1 WHERE a IN (42) OR (a IS TRUE AND a IS NULL);
+----+-------------+-------+-------------+---------------+-------+---------+-------+------+-------------+
| id | select_type | table | type        | possible_keys | key   | key_len | ref   | rows | Extra       |
+----+-------------+-------+-------------+---------------+-------+---------+-------+------+-------------+
|  1 | SIMPLE      | t1    | ref_or_null | key_a         | key_a | 5       | const |    2 | Using index |
+----+-------------+-------+-------------+---------------+-------+---------+-------+------+-------------+
1 row in set (0.00 sec)

Verified in 6.0-cb as far back as 2008-03

Initial analysis: 

It seems that the conditions are wrongly removed in make_cond_for_table_from_pred() in the if() following comment "Remove equalities that are guaranteed to be true by use of 'ref' access method."

In next-mr-opt-backporting, test_if_ref(root_cond, (Item_field*) left_item,right_item) returns true, while it returns false in next-mr. The reason for this difference is this part of the patch with revision 

sp1r-sergefp@mysql.com-20070309210824-13110
WL#2474 "Multi Range Read: Change the default MRR implementation to implement new MRR interface"
WL#2475 "Batched range read functions for MyISAM/InnoDb"
        "Index condition pushdown for MyISAM/InnoDB"
combined into one cset

------- Reverting offending part of patch --------
=== modified file 'sql/sql_select.cc'
--- sql/sql_select.cc	2010-06-25 12:30:30 +0000
+++ sql/sql_select.cc	2010-06-29 11:37:11 +0000
@@ -18690,7 +18690,7 @@ part_of_refkey(TABLE *table,Field *field
 
     for (part=0 ; part < ref_parts ; part++,key_part++)
       if (field->eq(key_part->field) &&
-	  !(key_part->key_part_flag & HA_PART_KEY_SEG))
+	  !(key_part->key_part_flag & (HA_PART_KEY_SEG | HA_NULL_PART)))
 	return table->reginfo.join_tab->ref.items[part];
   }
   return (Item*) 0;

=== modified file 'sql/table.cc'
--- sql/table.cc	2010-06-25 10:53:36 +0000
+++ sql/table.cc	2010-06-29 11:15:16 +0000
@@ -1539,6 +1539,8 @@ static int open_binary_frm(THD *thd, TAB
 #endif
           key_part->key_part_flag|= HA_PART_KEY_SEG;
         }
+        if (field->real_maybe_null())
+          key_part->key_part_flag|= HA_NULL_PART;
       }
       keyinfo->usable_key_parts= usable_parts; // Filesort
---------------------------------------

Applying the diff above fixes the wrong result reported in this bug, but it also changes the query plan for a number of cases in our test suites. The code above is not a suggested fix, just initial findings.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/112831

3207 Guilhem Bichot	2010-07-04
      Fix for BUG#54608 "Query using IN + OR + IS TRUE and IS NULL returns NULL when should be empty"
      selection condition was wrongly assumed to be satisfied by ref_or_null
     @ mysql-test/include/null_key.inc
        test for bug; without the code fix, the two SELECTs would return NULL instead of
        empty set.
     @ sql/sql_select.cc
        In the test's scenario, the WHERE condition is
          a IN (42) OR (a IS TRUE AND a IS NULL) .
        If this condition is true then so is
          a IN (42) OR a IS NULL ,
        thus ref_or_null access is correctly chosen to access t1.
        But then it is wrongly assumed that this ref_or_null access makes
        checking the entire condition unneeded. Here's how:
        - make_join_select() wants to attach conditions to tables (i.e. push as
        much as possible of the WHERE clause as deep as possible in the operator
        tree)
        - for that it calls
        make_cond_for_table_from_pred("a IN (42) OR (a IS TRUE AND a IS NULL)")
        ("make a condition to be attached to the table, from this
        predicate"). That function sees that the condition is "cond1 OR cond2";
        it calls make_cond_for_table_from_pred() on cond1 and on cond2, to later
        recombine the resulting conditions with an OR. Here cond1 is
        "a IN (42)", cond2 is "a IS TRUE AND a IS NULL".
        - make_cond_for_table_from_pred(cond1) calls test_if_ref()
        - test_if_ref() sees that in the ref_or_null access, "a" is the field
        used for index lookup, 42 is the lookup value, and so any found record
        automatically satisfies "a IN (42)"
        - thus make_cond_for_table_from_pred(cond1) believes that cond1 needn't
        be checked (is guaranteed to be true by the use of ref_or_null)
        - thus the top-level make_cond_for_table_from_pred() believes that
        "cond1 OR cond2" == "TRUE or cond2" thus needn't be checked
        - what test_if_ref() forgot is that 42 is not the _only_ lookup value:
        at execution, ref_or_null will look up "42" _and_ also look up
        NULL. ref_or_null does not implement "a IN (42)" but
        "a IN (42) OR a IS NULL". So a record returned by index lookup could
        possibly have a being NULL and not satisfy "a IN (42)". Thus checking
        "a IN (42)" after ref_or_null is still needed!
        The fix: test_if_ref() should not say that x=y can be replaced by
        ref_or_null access.
        
        Note: this does cause plan changes, where many "Using where" now appear
        in EXPLAIN for "ref_or_null" access.  This actually makes plans closer
        to what they are in next-mr. next-mr-opt-backporting had lost the "Using
        where" for many tests (and the bug was introduced) in that commit:
        sp1r-sergefp@mysql.com-20070309210824-13110
        The intention of that commit looks correct: there was some code which
        made part_of_refkey() pretend that the ref/ref_or_null access referred
        to no item if the index was on a nullable column; that commit removed
        this code. Apparently it accidentally exposed a bug in the logic of
        test_if_ref().
        Undoing this Sergey's commit, though it would restore all next-mr plans,
        would have stronger effects than the present patch: for example it would
        add "Using where" for a ref access using an index on a nullable column:
        
        innodb.innodb_mysql                      w8 [ fail ]
        # query sourced from mix1.inc:
        EXPLAIN
        SELECT COUNT(*) FROM t2 LEFT JOIN t1 ON t2.fkey = t1.id
          WHERE t1.name LIKE 'A%';
         id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
         1	SIMPLE	t1	index	PRIMARY,name	name	23	NULL	3	Using where; Using index
        -1	SIMPLE	t2	ref	fkey	fkey	5	test.t1.id	1	Using index
        +1	SIMPLE	t2	ref	fkey	fkey	5	test.t1.id	1	Using where; Using index
        and that would be a superfluous "using where". Whereas the current patch
        affects only ref_or_null.
        
        Note2: the current patch restores corrects results, but there would
        be room for future improvements: when the condition is as simple as
          key_col=constant OR key_col IS NULL,
        and access method is "ref_or_null", the Optimizer, when attaching this
        condition to the table, asks itself two questions (one call to
        make_cond_for_table_from_pred() for each):
        - does the access method guarantee key_col=constant?
        - does the access method guarantee key_col IS NULL?
        The correct answer is "no" for both, thus the plan has "using
        where". But the access method does guarantee the OR-combined condition,
        and so in reality "using where" is superfluous and could be dropped. The
        Optimizer does not see it because it treats the above two questions
        independently.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/113128

3210 Guilhem Bichot	2010-07-04
      Fix for BUG#54608 "Query using IN + OR + IS TRUE and IS NULL returns NULL when should be empty"
      selection condition was wrongly assumed to be satisfied by ref_or_null
     @ mysql-test/include/null_key.inc
        test for bug; without the code fix, the two SELECTs would return NULL instead of
        empty set.
     @ sql/sql_select.cc
        In the test's scenario, the WHERE condition is
          a IN (42) OR (a IS TRUE AND a IS NULL) .
        If this condition is true then so is
          a IN (42) OR a IS NULL ,
        thus ref_or_null access is correctly chosen to access t1.
        But then it is wrongly assumed that this ref_or_null access makes
        checking the entire condition unneeded. Here's how:
        - make_join_select() wants to attach conditions to tables (i.e. push as
        much as possible of the WHERE clause as deep as possible in the operator
        tree)
        - for that it calls
        make_cond_for_table_from_pred("a IN (42) OR (a IS TRUE AND a IS NULL)")
        ("make a condition to be attached to the table, from this
        predicate"). That function sees that the condition is "cond1 OR cond2";
        it calls make_cond_for_table_from_pred() on cond1 and on cond2, to later
        recombine the resulting conditions with an OR. Here cond1 is
        "a IN (42)", cond2 is "a IS TRUE AND a IS NULL".
        - make_cond_for_table_from_pred(cond1) calls test_if_ref()
        - test_if_ref() sees that in the ref_or_null access, "a" is the field
        used for index lookup, 42 is the lookup value, and so any found record
        automatically satisfies "a IN (42)"
        - thus make_cond_for_table_from_pred(cond1) believes that cond1 needn't
        be checked (is guaranteed to be true by the use of ref_or_null)
        - thus the top-level make_cond_for_table_from_pred() believes that
        "cond1 OR cond2" == "TRUE or cond2" thus needn't be checked
        - what test_if_ref() forgot is that 42 is not the _only_ lookup value:
        at execution, ref_or_null will look up "42" _and_ also look up
        NULL. ref_or_null does not implement "a IN (42)" but
        "a IN (42) OR a IS NULL". So a record returned by index lookup could
        possibly have a being NULL and not satisfy "a IN (42)". Thus checking
        "a IN (42)" after ref_or_null is still needed!
        The fix: test_if_ref() should not say that x=y can be replaced by
        ref_or_null access.
        
        Note: this does cause plan changes, where many "Using where" now appear
        in EXPLAIN for "ref_or_null" access.  This actually makes plans closer
        to what they are in next-mr. next-mr-opt-backporting had lost the "Using
        where" for many tests (and the bug was introduced) in that commit:
        sp1r-sergefp@mysql.com-20070309210824-13110
        The intention of that commit looks correct: there was some code which
        made part_of_refkey() pretend that the ref/ref_or_null access referred
        to no item if the index was on a nullable column; that commit removed
        this code. Apparently it accidentally exposed a bug in the logic of
        test_if_ref().
        Undoing this Sergey's commit, though it would restore all next-mr plans,
        would have stronger effects than the present patch: for example it would
        add "Using where" for a ref access using an index on a nullable column:
        
        innodb.innodb_mysql                      w8 [ fail ]
        # query sourced from mix1.inc:
        EXPLAIN
        SELECT COUNT(*) FROM t2 LEFT JOIN t1 ON t2.fkey = t1.id
          WHERE t1.name LIKE 'A%';
         id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
         1	SIMPLE	t1	index	PRIMARY,name	name	23	NULL	3	Using where; Using index
        -1	SIMPLE	t2	ref	fkey	fkey	5	test.t1.id	1	Using index
        +1	SIMPLE	t2	ref	fkey	fkey	5	test.t1.id	1	Using where; Using index
        and that would be a superfluous "using where". Whereas the current patch
        affects only ref_or_null.
        
        Note2: the current patch restores corrects results, but there would
        be room for future improvements: when the condition is as simple as
          key_col=constant OR key_col IS NULL,
        and access method is "ref_or_null", the Optimizer, when attaching this
        condition to the table, asks itself two questions (one call to
        make_cond_for_table_from_pred() for each):
        - does the access method guarantee key_col=constant?
        - does the access method guarantee key_col IS NULL?
        The correct answer is "no" for both, thus the plan has "using
        where". But the access method does guarantee the OR-combined condition,
        and so in reality "using where" is superfluous and could be dropped. The
        Optimizer does not see it because it treats the above two questions
        independently.

queued to next-mr-opt-team and next-mr-opt-backporting

Pushed into mysql-next-mr (revid:alik@sun.com-20100816062819-bluwgdq8q4xysmlg) (version source revid:alik@sun.com-20100816062612-enatdwnv809iw3s9) (pib:20)

Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)

Bug does not appear in any released 5.6.x version. No 5.6.1 changelog entry needed.