Bug #8877 simple 'ref'-type join reads through all NULL values when looking for =
Submitted: 2 Mar 2005 0:26 Modified: 20 Nov 2010 23:26
Reporter: Timothy Smith Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:any OS:Any (any)
Assigned to: CPU Architecture:Any

[2 Mar 2005 0:26] Timothy Smith
Description:
When joining two tables, both with many NULL values in the indexed field, an = or <> condition compares all of the rows.

It seems that it should skip all but the first NULL row in the second table of the join.

How to repeat:
create table a (x int, key (x)); create table b (x int, key (x));

insert into a values (NULL);

insert into a select NULL from a;

-- Hit <Up><Enter> a few times to get 1024 rows in 'a'

insert into b select NULL from a limit 100;

flush status;

select count(*) from a, b where a.x = b.x;

show status like 'Handler_read_%';
+-----------------------+--------+
| Variable_name         | Value  |
+-----------------------+--------+
| Handler_read_first    | 1      |
| Handler_read_key      | 100    |
| Handler_read_next     | 102500 |
| Handler_read_prev     | 0      |
| Handler_read_rnd      | 0      |
| Handler_read_rnd_next | 0      |
+-----------------------+--------+

If there are several thousand rows with NULL values in both tables, this query can take a very long time to return.
[2 Mar 2005 0:29] Timothy Smith
By the way, I tested this with both MyISAM and InnoDB, using MySQL 4.1.10 on Linux.  The customer tested it on several versions of 4.0 as well as on 4.1.10.
[26 Mar 2005 14:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/23386
[27 Mar 2005 6:36] Sergey Petrunya
The patch was rejected, will work on requested changes
[2 Apr 2005 0:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/23575
[4 Apr 2005 22:40] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/23637
[7 Apr 2005 13:39] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/23757
[7 Apr 2005 16:10] Sergey Petrunya
Fixed in 4.1.12 tree (and pushed to 5.0.4)
[7 Apr 2005 16:46] Sergey Petrunya
Notes for the docs team:
The section 7.2.1 of the manual says 
 "ref can be used for indexed columns that are compared using the = operator".
Maybe it's worth to mention that null-aware '<=>' can be used here too.

The fix for this bug does the following:
If the ref access method uses "tbl.key = othertbl.field"  MySQL is able to figure that this also implies that "othertbl.field IS NOT NULL" and adds this to table condition of table othertbl.
The added "othertbl.field IS NOT NULL" predicate can be only applied after a record has been retrieved from the table othertbl.
[13 Apr 2005 14:14] Patrick Allaert
I am experiencing a very similar problem with:

CREATE TABLE a (
id int(10) unsigned default NULL
);

CREATE TABLE b (
id int(10) unsigned default NULL,
UNIQUE KEY id_key (id)
)

INSERT INTO a VALUES (1), (2), (3), (4), (5), (6), (NULL), (NULL), (NULL);

INSERT INTO b VALUES (1), (2), (3), (4), (5), (6), (7), (8), (NULL), (NULL), (NULL), (NULL), (NULL), ..., (NULL);

Launching this query:

SELECT * FROM a, b WHERE a.id = b.id;

will take a long long time (directly proportional to the number of a.id NULL values until I delete them, the time is then near 0s).

Launching this query:

SELECT * FROM a, b WHERE a.id = b.id WHERE a.id IS NOT NULL;

Is very fast... But in my case, I have to retrieve ALL the rows from table a, so, I use the following left join query:

SELECT * FROM a LEFT JOIN b ON a.id = b.id;

In this case, does the patch optimize correctly the query AND returns NULL values of a.id ?

Patrick Allaert
[28 Apr 2005 15:10] Paul DuBois
Noted in 4.1.12, 5.0.4 changelogs.
[11 Dec 2009 6:02] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091211055901-yp18b3c7xuhl87rf) (version source revid:alik@sun.com-20091211055401-43rjwq7gjed6ds83) (merge vers: 6.0.14-alpha) (pib:13)
[15 Dec 2009 3:08] Paul DuBois
Noted in 6.0.14 changelog.
[16 Aug 2010 6:33] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100816062819-bluwgdq8q4xysmlg) (version source revid:alik@sun.com-20100816062612-enatdwnv809iw3s9) (pib:20)
[13 Nov 2010 16:11] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)