MySQL Bugs: #25407: MySQL prefers UNIQUE index for IS NULL conditions over more selective conditions

Bug #25407	MySQL prefers UNIQUE index for IS NULL conditions over more selective conditions
Submitted:	4 Jan 2007 10:31	Modified:	8 Feb 2007 18:09
Reporter:	Kai Ruhnau	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S5 (Performance)
Version:	5.0.34-BK, 5.0.26	OS:	Linux (Gentoo Linux)
Assigned to:	Igor Babaev	CPU Architecture:	Any

Description:
I have a table with roughly 70,000 entries.
One column (`ID_with_null`) has exactly 3 entries that are NOT NULL and a UNIQUE index over this column.
The other column (`ID_better`) is also indexed.

mysql> SELECT COUNT(*) FROM my_table WHERE ID_with_null IS NULL;
+----------+
| COUNT(*) |
+----------+
|    69581 |
+----------+

mysql> SELECT COUNT(*) FROM my_table WHERE ID_better=1;
+----------+
| COUNT(*) |
+----------+
|      251 |
+----------+

mysql> EXPLAIN SELECT * FROM my_table WHERE ID_with_null IS NULL AND ID_better=1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: my_table
         type: ref
possible_keys: ID_with_null,ID_better
          key: ID_with_null
      key_len: 5
          ref: const
         rows: 1
        Extra: Using where

As you can see, MySQL assumes, that the UNIQUE index will only return one row, although that's not true when NULL values are stored in the UNIQUE column.

Forcing the usage of the index over ID_better greatly improves the query execution.

How to repeat:
CREATE TABLE my_table (
ID_with_null INT(10) NULL DEFAULT NULL,
ID_better INT(10) NOT NULL,
UNIQUE INDEX (ID_with_null),
INDEX (ID_better)
);

Fill the table with some values. MySQL will always use the UNIQUE index for conditions "ID_with_null IS NULL".

Suggested fix:
The optimizer should treat UNIQUE indexes with NULL values different.

Thank you for a bug report. Verified just as dfescribed with latest 5.0.34-BK on Linux:

mysql> select version();
+--------------+
| version()    |
+--------------+
| 5.0.34-debug |
+--------------+
1 row in set (0.00 sec)

mysql> CREATE TABLE my_table (
    -> ID_with_null INT(10) NULL DEFAULT NULL,
    -> ID_better INT(10) NOT NULL,
    -> UNIQUE INDEX (ID_with_null),
    -> INDEX (ID_better)
    -> );
Query OK, 0 rows affected (0.02 sec)

mysql> insert into my_table values(1, 1);
Query OK, 1 row affected (0.00 sec)

mysql> insert into my_table values(2, 1);
Query OK, 1 row affected (0.00 sec)

mysql> insert into my_table values(NULL, 3);
Query OK, 1 row affected (0.00 sec)

mysql> insert into my_table select NULL, 3 from my_table;
Query OK, 3 rows affected (0.02 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> insert into my_table select * from my_table where ID_with_null IS NULL;
Query OK, 4 rows affected (0.01 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> insert into my_table select * from my_table where ID_with_null IS NULL;
Query OK, 8 rows affected (0.01 sec)
Records: 8  Duplicates: 0  Warnings: 0

...

mysql> insert into my_table select * from my_table where ID_with_null IS NULL;
Query OK, 16384 rows affected (0.91 sec)
Records: 16384  Duplicates: 0  Warnings: 0

mysql> analyze table my_table;
+---------------+---------+----------+----------+
| Table         | Op      | Msg_type | Msg_text |
+---------------+---------+----------+----------+
| test.my_table | analyze | status   | OK       |
+---------------+---------+----------+----------+
1 row in set (0.03 sec)

mysql> SELECT COUNT(*) FROM my_table WHERE ID_with_null IS NULL;
+----------+
| COUNT(*) |
+----------+
|    32768 |
+----------+
1 row in set (0.06 sec)

mysql> SELECT COUNT(*) FROM my_table WHERE ID_better=1;
+----------+
| COUNT(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM my_table WHERE ID_better=1 AND ID_with_null IS NUL
L;
+----+-------------+----------+------+------------------------+--------------+--
-------+-------+------+-------------+
| id | select_type | table    | type | possible_keys          | key          | k
ey_len | ref   | rows | Extra       |
+----+-------------+----------+------+------------------------+--------------+--
-------+-------+------+-------------+
|  1 | SIMPLE      | my_table | ref  | ID_with_null,ID_better | ID_with_null | 5
       | const |    1 | Using where |
+----+-------------+----------+------+------------------------+--------------+--
-------+-------+------+-------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM my_table WHERE ID_better=1 AND ID_with_null IS NULL;
Empty set (0.18 sec)

mysql> SELECT * FROM my_table FORCE INDEX (ID_better) WHERE ID_better=1 AND ID_
with_null IS NULL;
Empty set (0.01 sec)

So, optimizer's decision to use that UNIQUE index is obviously non-optimal.

We have a wrong choice of indexes in the reported test case due to a wrong
estimate for the number of rows with ID_with_null=NULL (what we can we see from
the EXPLAIN output).
This wrong estimate, in its turn, is the result of an assumption that any key may occur in an unique index only once, which is not true for keys with nulls that can have multiple occurrences in the index.

This bug is present in 4.0/4.1 code as well.
However, it won't be fixed there as the fix affects the choice of the execution plan for some queries.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/19059

ChangeSet@1.2392, 2007-01-30 19:30:08-08:00, igor@olga.mysql.com +5 -0
  Fixed bug #25407.
  The bug coud cause choosing a sub-optimal execution plan for a single-table query
  if a unuque index with many null keys were defined for the table. 
  It happened because the code of the check_quick_keys function made an assumption
  that any key may occur in an unique index only once. Yet this is not true for keys
  with nulls that may have multiple occurrences in the index.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/19148

ChangeSet@1.2396, 2007-01-31 17:12:16-08:00, igor@olga.mysql.com +4 -0
  Fixed bug #25407.
  The bug could cause choosing a sub-optimal execution plan for 
  a single-table query if a unique index with many null keys were 
  defined for the table. 
  It happened because the code of the check_quick_keys function 
  made an assumption that any key may occur in an unique index 
  only once. Yet this is not true for keys with nulls that may 
  have multiple occurrences in the index.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/19151

ChangeSet@1.2396, 2007-01-31 19:31:36-08:00, igor@olga.mysql.com +4 -0
  Fixed bug #25407.
  The bug could cause choosing a sub-optimal execution plan for 
  a single-table query if a unique index with many null keys were 
  defined for the table. 
  It happened because the code of the check_quick_keys function 
  made an assumption that any key may occur in an unique index 
  only once. Yet this is not true for keys with nulls that may 
  have multiple occurrences in the index.

The fix has been pushed into 5.0.36, 5.1.16-beta main trees.

Noted in 5.0.36, 5.1.16 changelogs.

For a UNIQUE index containing many NULL values, the optimizer would
prefer the index for col IS NULL conditions over other more selective
indexes.

How does this cooperate with the setting of myisam_stats_method 'ignore_null'?

Is feature request Bug 19327 fixed with this Bug as well, since the optimizer needs to count NULL values separately to not choose the mentioned index?