Bug #2093 optimizer fails if left join has is null condition (causes Not exist)
Submitted: 11 Dec 2003 4:04 Modified: 12 Dec 2003 9:07
Reporter: Martin Friebe (Gold Quality Contributor) (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:4.0.16 OS:FreeBSD (freebsd / possible others)
Assigned to: Alexey Botchkov CPU Architecture:Any

[11 Dec 2003 4:04] Martin Friebe
Description:
the mysql optimizer does fail in the following situation

 select *
 from t1
 left join t2  on t2.b = t1.a 
 left join t3  on t3.c = t1.a  and  t2.b is null

if t2.b is declared "not null", the optimizer will join in t2 as "Not exist", as if the "t2.b is null" was in the where clause.

The query should return t1 which either a row from t2 or t3 joined. so it should join all rows from t2, but it should only return rows from t3, if t2 did not return any row.

Currently the querry will only return rows for t1, that did not find rows in t2.

How to repeat:
 create table t1 (a1 int, a2 int);
 create table t2 (b1 int not null, b2 int); # not null is important for this colums
 create table t3 (c1 int, c2 int);

# you might index the columns for the join, but it doesnt affect the result

 insert into t1 values (1,2), (2,2), (3,2);
 insert into t2 values (1,3), (2,3);
 insert into t3 values (2,4),        (3,4);

 select * 
 from t1
 left join t2  on  b1 = a1
 left join t3  on  c1 = a1;

+------+------+------+------+------+------+
| a1   | a2   | b1   | b2   | c1   | c2   |
+------+------+------+------+------+------+
|    1 |    2 |    1 |    3 | NULL | NULL |
|    2 |    2 |    2 |    3 |    2 |    4 |
|    3 |    2 | NULL | NULL |    3 |    4 |
+------+------+------+------+------+------+

# adding the "b1 is null", I would expect c1, c2 to return null in the 2nd row

 select * 
 from t1
 left join t2  on  b1 = a1
 left join t3  on  c1 = a1  and  b1 is null;

+------+------+------+------+------+------+
| a1   | a2   | b1   | b2   | c1   | c2   |
+------+------+------+------+------+------+
|    3 |    2 | NULL | NULL |    3 |    4 |
+------+------+------+------+------+------+

# "b1 is null" has obviously gone to the where clause.
# the optimizer seems to look at the table declaration that says b1 cannot be null, but as it is the resul of a join, it can be null. (null rows for b1 are included in both results above)

 explain  select * 
 from t1
 left join t2  on  b1 = a1
 left join t3  on  c1 = a1  and  b1 is null;

+-------+------+---------------+------+---------+------+------+------------+
| table | type | possible_keys | key  | key_len | ref  | rows | Extra      |
+-------+------+---------------+------+---------+------+------+------------+
| t1    | ALL  | NULL          | NULL |    NULL | NULL |    3 |            |
| t2    | ALL  | NULL          | NULL |    NULL | NULL |    2 | Not exists |
| t3    | ALL  | NULL          | NULL |    NULL | NULL |    2 |            |
+-------+------+---------------+------+---------+------+------+------------+

Suggested fix:
#
[11 Dec 2003 4:18] Martin Friebe
Just to say:

if you say "and b1 = 2", instead of "and b1 is null"

the querry returns the correct 3 rows, only returning values from t3 on the correct row.
[11 Dec 2003 8:06] Dean Ellis
Verified against the current 4.0.17 development sources.

Thank you.
[12 Dec 2003 8:22] Alexey Botchkov
bk commit - 4.0 tree (hf:1.1647)