Bug #44139 Table scan when NULL appears in IN clause
Submitted: 7 Apr 2009 23:00 Modified: 18 Dec 2009 13:20
Reporter: Gary Pendergast Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.1 OS:Any
Assigned to: Gleb Shchepa
Tags: Optimizer, regression
Triage: Triaged: D2 (Serious) / R2 (Low) / E2 (Low)

[7 Apr 2009 23:00] Gary Pendergast
Description:
Optimizer chooses a table scan when there is a NULL in the IN clause. 

This is a regression. Compare EXPLAIN output from 5.1.79:

mysql [localhost] {msandbox} (test) > explain select * from foo where a in (160000, 160001, 160002)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: foo
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 3
        Extra: Using where
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > explain select * from foo where a in (null, 160000, 160001, 160002)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: foo
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 3
        Extra: Using where
1 row in set (0.01 sec)

To 5.1.32:

mysql [localhost] {msandbox} (test) > explain select * from foo where a in (160000, 160001, 160002)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: foo
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 3
        Extra: Using where
1 row in set (0.06 sec)

mysql [localhost] {msandbox} (test) > explain select * from foo where a in (null, 160000, 160001, 160002)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: foo
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 327680
        Extra: Using where
1 row in set (0.00 sec)

How to repeat:
use test;
drop table if exists foo;

create table foo (
	a int not null auto_increment,
	b int,
	primary key( a )
);

insert into foo( b ) values (1),(1),(1),(1),(1);
insert into foo( b ) select b from foo;
insert into foo( b ) select b from foo;
insert into foo( b ) select b from foo;
insert into foo( b ) select b from foo;
insert into foo( b ) select b from foo;
insert into foo( b ) select b from foo;
insert into foo( b ) select b from foo;
insert into foo( b ) select b from foo;
insert into foo( b ) select b from foo;
insert into foo( b ) select b from foo;
insert into foo( b ) select b from foo;
insert into foo( b ) select b from foo;
insert into foo( b ) select b from foo;
insert into foo( b ) select b from foo;
insert into foo( b ) select b from foo;
insert into foo( b ) select b from foo;

explain select * from foo where a in (160000, 160001, 160002)\G
explain select * from foo where a in (null, 160000, 160001, 160002)\G

Suggested fix:
Use index when available.
[8 Apr 2009 8:28] Stewart Smith
also noted for Drizzle (https://bugs.launchpad.net/drizzle/+bug/357560)

Also shows bug with many less INSERT SELECT (making test run much quicker, especially if using transactional engine).

insert into foo( b ) values (1),(1),(1),(1),(1);
insert into foo( b ) select b from foo;
insert into foo( b ) select b from foo;
insert into foo( b ) select b from foo;
insert into foo( b ) select b from foo;
insert into foo( b ) select b from foo;
insert into foo( b ) select b from foo;

is enough.
[19 Sep 2009 15:57] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/83798

3117 Gleb Shchepa	2009-09-19
      Bug #44139: Table scan when NULL appears in IN clause
      
      SELECT ... WHERE ... IN (NULL, ...) does full table scan,
      even if the same query without the NULL uses efficient range scan.
      
      1. The collect_cmp_types() function has been modified to
         heading NULLs in the IN list.
      2. NULL-skipping code of the Item_func_in::fix_length_and_dec()
         function has been updated to prevent SEGVs.
     @ mysql-test/r/func_in.result
        Added test case for the bug #44139.
     @ mysql-test/t/func_in.test
        Added test case for the bug #44139.
     @ sql/item_cmpfunc.cc
        Bug #44139: Table scan when NULL appears in IN clause
        
        1. The collect_cmp_types() function has been modified to
           heading NULLs in the IN list.
        2. NULL-skipping code of the Item_func_in::fix_length_and_dec()
           function has been updated to prevent SEGVs.
[25 Sep 2009 16:22] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/84691

3134 Gleb Shchepa	2009-09-25
      Bug #44139: Table scan when NULL appears in IN clause
      
      SELECT ... WHERE ... IN (NULL, ...) does full table scan,
      even if the same query without the NULL uses efficient range scan.
      
      The bugfix for the bug 18360 introduced an optimization:
      if
        1) all right-hand arguments of the IN function are constants
        2) result types of all right argument items are compatible
           enough to use the same single comparison function to
           compare all of them to the left argument,
      
      then
      
        we can convert the right-hand list of constant items to an array
        of equally-typed constant values for the further
        QUICK index access etc. (see Item_func_in::fix_length_and_dec()).
      
      The Item_null constant item objects have STRING_RESULT
      result types, so, as far as Item_func_in::fix_length_and_dec() 
      is aware of NULLs in the right list, this improvement efficiently 
      optimizes IN function calls with a mixed right list of NULLs and 
      string constants. However, the optimization doesn't affect mixed
      lists of NULLs and integers, floats etc., because there is no
      unique common comparator.
      
      
      New optimization has been added to ignore the result type
      of NULL constants in the static analysis of mixed right-hand lists.
      This is safe, because at the execution phase we care about
      presence of NULLs anyway.
      
      1. The collect_cmp_types() function has been modified to optionally
         ignore NULL constants in the item list.
      2. NULL-skipping code of the Item_func_in::fix_length_and_dec()
         function has been modified to work not only with in_string
         vectors but with in_vectors of other types.
     @ mysql-test/r/func_in.result
        Added test case for the bug #44139.
     @ mysql-test/t/func_in.test
        Added test case for the bug #44139.
     @ sql/item_cmpfunc.cc
        Bug #44139: Table scan when NULL appears in IN clause
        
        1. The collect_cmp_types() function has been modified to optionally
           ignore NULL constants in the item list.
        2. NULL-skipping code of the Item_func_in::fix_length_and_dec()
           function has been modified to work not only with in_string
           vectors but with in_vectors of other types.
[1 Oct 2009 11:24] Roy Lyseng
One slight problem with the bug fix is that it changes internal interfaces.
The bug is about incompatibility of NULL items with other items, because the NULL item is always assigned a result type of STRING_RESULT.
A better strategy might be to decorate NULL items with result type derived from the surrounding items, e.g we have "int_col in (NULL)", see that int_col is an INTEGER type, then decorate any NULL items on the right-hand side with the same result type.
One problem with the above solution that makes it impossible to implement as a bug fix is that it changes external interfaces. The query "SELECT int_col+NULL" is valid in MySQL (it is not in standard SQL), and it produces a column with result type DOUBLE. After the described fix, it would have type INTEGER. Having this interface change in a major release is probably acceptable.
There may also be other interface changes caused by such fix.
Please consider this strategy in a later release.
[2 Oct 2009 11:21] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/85503

3134 Gleb Shchepa	2009-10-02
      Bug #44139: Table scan when NULL appears in IN clause
      
      SELECT ... WHERE ... IN (NULL, ...) does full table scan,
      even if the same query without the NULL uses efficient range scan.
      
      The bugfix for the bug 18360 introduced an optimization:
      if
        1) all right-hand arguments of the IN function are constants
        2) result types of all right argument items are compatible
           enough to use the same single comparison function to
           compare all of them to the left argument,
      
      then
      
        we can convert the right-hand list of constant items to an array
        of equally-typed constant values for the further
        QUICK index access etc. (see Item_func_in::fix_length_and_dec()).
      
      The Item_null constant item objects have STRING_RESULT
      result types, so, as far as Item_func_in::fix_length_and_dec() 
      is aware of NULLs in the right list, this improvement efficiently 
      optimizes IN function calls with a mixed right list of NULLs and 
      string constants. However, the optimization doesn't affect mixed
      lists of NULLs and integers, floats etc., because there is no
      unique common comparator.
      
      
      New optimization has been added to ignore the result type
      of NULL constants in the static analysis of mixed right-hand lists.
      This is safe, because at the execution phase we care about
      presence of NULLs anyway.
      
      1. The collect_cmp_types() function has been modified to optionally
         ignore NULL constants in the item list.
      2. NULL-skipping code of the Item_func_in::fix_length_and_dec()
         function has been modified to work not only with in_string
         vectors but with in_vectors of other types.
     @ mysql-test/r/func_in.result
        Added test case for the bug #44139.
     @ mysql-test/t/func_in.test
        Added test case for the bug #44139.
     @ sql/item_cmpfunc.cc
        Bug #44139: Table scan when NULL appears in IN clause
        
        1. The collect_cmp_types() function has been modified to optionally
           ignore NULL constants in the item list.
        2. NULL-skipping code of the Item_func_in::fix_length_and_dec()
           function has been modified to work not only with in_string
           vectors but with in_vectors of other types.
[2 Oct 2009 11:54] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/85507

3134 Gleb Shchepa	2009-10-02
      Bug #44139: Table scan when NULL appears in IN clause
      
      SELECT ... WHERE ... IN (NULL, ...) does full table scan,
      even if the same query without the NULL uses efficient range scan.
      
      The bugfix for the bug 18360 introduced an optimization:
      if
        1) all right-hand arguments of the IN function are constants
        2) result types of all right argument items are compatible
           enough to use the same single comparison function to
           compare all of them to the left argument,
      
      then
      
        we can convert the right-hand list of constant items to an array
        of equally-typed constant values for the further
        QUICK index access etc. (see Item_func_in::fix_length_and_dec()).
      
      The Item_null constant item objects have STRING_RESULT
      result types, so, as far as Item_func_in::fix_length_and_dec() 
      is aware of NULLs in the right list, this improvement efficiently 
      optimizes IN function calls with a mixed right list of NULLs and 
      string constants. However, the optimization doesn't affect mixed
      lists of NULLs and integers, floats etc., because there is no
      unique common comparator.
      
      
      New optimization has been added to ignore the result type
      of NULL constants in the static analysis of mixed right-hand lists.
      This is safe, because at the execution phase we care about
      presence of NULLs anyway.
      
      1. The collect_cmp_types() function has been modified to optionally
         ignore NULL constants in the item list.
      2. NULL-skipping code of the Item_func_in::fix_length_and_dec()
         function has been modified to work not only with in_string
         vectors but with in_vectors of other types.
     @ mysql-test/r/func_in.result
        Added test case for the bug #44139.
     @ mysql-test/t/func_in.test
        Added test case for the bug #44139.
     @ sql/item_cmpfunc.cc
        Bug #44139: Table scan when NULL appears in IN clause
        
        1. The collect_cmp_types() function has been modified to optionally
           ignore NULL constants in the item list.
        2. NULL-skipping code of the Item_func_in::fix_length_and_dec()
           function has been modified to work not only with in_string
           vectors but with in_vectors of other types.
[5 Oct 2009 5:35] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/85696

3153 Gleb Shchepa	2009-10-05
      Bug #44139: Table scan when NULL appears in IN clause
      
      SELECT ... WHERE ... IN (NULL, ...) does full table scan,
      even if the same query without the NULL uses efficient range scan.
      
      The bugfix for the bug 18360 introduced an optimization:
      if
        1) all right-hand arguments of the IN function are constants
        2) result types of all right argument items are compatible
           enough to use the same single comparison function to
           compare all of them to the left argument,
      
      then
      
        we can convert the right-hand list of constant items to an array
        of equally-typed constant values for the further
        QUICK index access etc. (see Item_func_in::fix_length_and_dec()).
      
      The Item_null constant item objects have STRING_RESULT
      result types, so, as far as Item_func_in::fix_length_and_dec()
      is aware of NULLs in the right list, this improvement efficiently
      optimizes IN function calls with a mixed right list of NULLs and
      string constants. However, the optimization doesn't affect mixed
      lists of NULLs and integers, floats etc., because there is no
      unique common comparator.
      
      
      New optimization has been added to ignore the result type
      of NULL constants in the static analysis of mixed right-hand lists.
      This is safe, because at the execution phase we care about
      presence of NULLs anyway.
      
      1. The collect_cmp_types() function has been modified to optionally
         ignore NULL constants in the item list.
      2. NULL-skipping code of the Item_func_in::fix_length_and_dec()
         function has been modified to work not only with in_string
         vectors but with in_vectors of other types.
     @ mysql-test/r/func_in.result
        Added test case for the bug #44139.
     @ mysql-test/t/func_in.test
        Added test case for the bug #44139.
     @ sql/item_cmpfunc.cc
        Bug #44139: Table scan when NULL appears in IN clause
        
        1. The collect_cmp_types() function has been modified to optionally
           ignore NULL constants in the item list.
        2. NULL-skipping code of the Item_func_in::fix_length_and_dec()
           function has been modified to work not only with in_string
           vectors but with in_vectors of other types.
[6 Oct 2009 9:00] Bugs System
Pushed into 5.1.40 (revid:joro@sun.com-20091006073316-lea2cpijh9r6on7c) (version source revid:gshchepa@mysql.com-20091005052736-lne27aw7xe7q3a74) (merge vers: 5.1.40) (pib:11)
[8 Oct 2009 0:29] Paul Dubois
Noted in 5.1.40 changelog.

SELECT ... WHERE ... IN (NULL, ...) was executed using a full table 
scan, even if the same query without the NULL used an efficient range
scan.   

Setting report to NDI pending push into 5.4.x.
[22 Oct 2009 6:36] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091022063126-l0qzirh9xyhp0bpc) (version source revid:alik@sun.com-20091019135554-s1pvptt6i750lfhv) (merge vers: 6.0.14-alpha) (pib:13)
[22 Oct 2009 7:09] Bugs System
Pushed into 5.5.0-beta (revid:alik@sun.com-20091022060553-znkmxm0g0gm6ckvw) (version source revid:alik@sun.com-20091013094238-g67x6tgdm9a7uik0) (merge vers: 5.5.0-beta) (pib:13)
[22 Oct 2009 19:39] Paul Dubois
Noted in 5.5.0, 6.0.14 changelogs.
[18 Dec 2009 10:39] Bugs System
Pushed into 5.1.41-ndb-7.1.0 (revid:jonas@mysql.com-20091218102229-64tk47xonu3dv6r6) (version source revid:jonas@mysql.com-20091218095730-26gwjidfsdw45dto) (merge vers: 5.1.41-ndb-7.1.0) (pib:15)
[18 Dec 2009 10:54] Bugs System
Pushed into 5.1.41-ndb-6.2.19 (revid:jonas@mysql.com-20091218100224-vtzr0fahhsuhjsmt) (version source revid:jonas@mysql.com-20091217101452-qwzyaig50w74xmye) (merge vers: 5.1.41-ndb-6.2.19) (pib:15)
[18 Dec 2009 11:09] Bugs System
Pushed into 5.1.41-ndb-6.3.31 (revid:jonas@mysql.com-20091218100616-75d9tek96o6ob6k0) (version source revid:jonas@mysql.com-20091217154335-290no45qdins5bwo) (merge vers: 5.1.41-ndb-6.3.31) (pib:15)
[18 Dec 2009 11:23] Bugs System
Pushed into 5.1.41-ndb-7.0.11 (revid:jonas@mysql.com-20091218101303-ga32mrnr15jsa606) (version source revid:jonas@mysql.com-20091218064304-ezreonykd9f4kelk) (merge vers: 5.1.41-ndb-7.0.11) (pib:15)
[18 Dec 2009 13:20] MC Brown
Already noted in earlier changelogs.
[15 Dec 2010 10:11] Nikolay Mihaylov
same if:

set @id = NULL;

explain select * from xxx where id = @id;

if thable is too large it do full table scan, if table is small it say "Impossible WHERE ..."