MySQL Bugs: #53793: "early NULL filtering" not done for outer join converted to inner join

Bug #53793	"early NULL filtering" not done for outer join converted to inner join
Submitted:	19 May 2010 9:52	Modified:	19 May 2010 12:23
Reporter:	Guilhem Bichot	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	6.0-codebase-bugfixing	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
Didn't test with earlier versions.

I'm using revision-id:alik@sun.com-20100514053852-r4baos2r9tvpbz9y

In MySQL there is this:
http://forge.mysql.com/wiki/MySQL_Internals_Optimizer#Early_NULLs_Filtering

Consider this query:

SELECT table1.pk, table2.col_int_key, table1.col_int_key, table2.pk 
FROM C table1 JOIN C table2  ON table2 .`col_int_key`  = table1 .`col_int_key`
WHERE table2 .`pk`;

table1 is accessed first, then table2 is accessed with "ref". The ON condition implies that if a row of table1 has NULL in table1.col_int_key, no access to table2 is needed for this row (no row from table2 will match the ON): this is early NULLs filtering: MySQL automatically pushes to table1 a "table1.col_int_key IS NOT NULL" condition.
This is seen in EXPLAIN:
id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	table1	ALL	col_int_key	NULL	NULL	NULL	20	100.00	Using where
1	SIMPLE	table2	ref	col_int_key	col_int_key	5	test.table1.col_int_key	3	100.00	Using where
the pushed "IS NOT NULL" condition is the "Using where" in the line of table1.
So far so good.
Now consider the following query, where JOIN became LEFT JOIN:
SELECT table1.pk, table2.col_int_key, table1.col_int_key, table2.pk 
FROM C table1  LEFT  JOIN C table2  ON table2 .`col_int_key`  = table1 .`col_int_key`
WHERE table2 .`pk`;
Because of the WHERE clause, MySQL understands that the LEFT JOIN can be replaced by JOIN (inner join) (indeed if there is not match for a row of table1, the LEFT JOIN will produce a NULL value for table2.pk, which won't match the WHERE clause). So MySQL rewrites it to be like the first query; but then it doesn't do early NULLs filtering:
1	SIMPLE	table1	ALL	col_int_key	NULL	NULL	NULL	20	100.00	
1	SIMPLE	table2	ref	col_int_key	col_int_key	5	test.table1.col_int_key	3	100.00	Using where
(no "Using where" on first line). That is inefficient (misses an opportunity to eliminate lookups) and illogical (as the LEFT JOIN query is converted to a JOIN query, it should benefit from the same advantages).

How to repeat:
CREATE TABLE `C` (
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  `col_int_key` int(11) DEFAULT NULL,
  `col_varchar_key` varchar(1) DEFAULT NULL,
  PRIMARY KEY (`pk`),
  KEY `col_int_key` (`col_int_key`),
  KEY `col_varchar_key` (`col_varchar_key`,`col_int_key`)
) ENGINE=MyISAM;
INSERT INTO `C` VALUES (1,2,'w');
INSERT INTO `C` VALUES (2,9,'m');
INSERT INTO `C` VALUES (3,3,'m');
INSERT INTO `C` VALUES (4,9,'k');
INSERT INTO `C` VALUES (5,NULL,'r');
INSERT INTO `C` VALUES (6,9,'t');
INSERT INTO `C` VALUES (7,3,'j');
INSERT INTO `C` VALUES (8,8,'u');
INSERT INTO `C` VALUES (9,8,'h');
INSERT INTO `C` VALUES (10,53,'o');
INSERT INTO `C` VALUES (11,0,NULL);
INSERT INTO `C` VALUES (12,5,'k');
INSERT INTO `C` VALUES (13,166,'e');
INSERT INTO `C` VALUES (14,3,'n');
INSERT INTO `C` VALUES (15,0,'t');
INSERT INTO `C` VALUES (16,1,'c');
INSERT INTO `C` VALUES (17,9,'m');
INSERT INTO `C` VALUES (18,5,'y');
INSERT INTO `C` VALUES (19,6,'f');
INSERT INTO `C` VALUES (20,2,'d');

explain extended SELECT table1.pk, table2.col_int_key, table1.col_int_key, table2.pk 
FROM C table1  LEFT  JOIN C table2  ON table2 .`col_int_key`  = table1 .`col_int_key`
WHERE table2 .`pk`;

explain extended SELECT table1.pk, table2.col_int_key, table1.col_int_key, table2.pk 
FROM C table1  JOIN C table2  ON table2 .`col_int_key`  = table1 .`col_int_key`
WHERE table2 .`pk`;

drop table C;

Suggested fix:
When we process the LEFT JOIN query:
- it's a LEFT JOIN, so we set table->outer_join to "true"
- in JOIN::prepare(), we see table->outer_join is "true" so set table->null_row to "true" (ok)
- in simplify_joins() we convert to inner join, so set table->outer_join to "false", we forget to reconsider a new value for table->null_row (bad)
- in add_not_null_conds(), we see table->null_row is "true", so don't add "IS NOT NULL" condition

Thank you for the bug report.

in next-mr-bugfixing aelkin@mysql.com-20100518084747-kvvy483pn3n3y4tv there is no "Using where" even in the JOIN case. But I see that add_not_null_conds(), for the JOIN case, does add a IS NOT NULL condition, looks like it gets lost afterwards?
This would deserve investigation as well.

sorry, replace "null_row" by "maybe_null"

If I do
prepare stmt from 'explain extended SELECT table1.pk, table2.col_int_key, table1.col_int_key, table2.pk  FROM C table1  LEFT  JOIN C table2  ON table2 .`col_int_key`  = table1 .`col_int_key` WHERE table2 .`pk`';
execute stmt;
execute stmt;
the bug is seen in the first EXECUTE but not the second.
It could well be related to http://bugs.mysql.com/bug.php?id=54484

Posted by developer:
 
True in 5.5 and newer.
Current testcase:
create table t(a int, b int, key(b));

explain format=json select * from t t1 join t t2 on t2.b=t1.a where t2.a is not null;
-> for t1 we see:         "attached_condition": "(`test`.`t1`.`a` is not null)"

explain format=json select * from t t1 left join t t2 on t2.b=t1.a where t2.a is not null;
This is converted to inner join:
Note (Code 1003): /* select#1 */ select `test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b`,`test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS `b` from `test`.`t` `t1` join `test`.`t` `t2` where ((`test`.`t2`.`b` = `test`.`t1`.`a`) and (`test`.`t2`.`a` is not null))
but no attached_condition on t1.