Bug #48046 Server incorrectly processing JOINs on NULL values
Submitted: 14 Oct 2009 16:27 Modified: 20 Dec 2010 13:21
Reporter: Patrick Crews Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.1 OS:Any
Assigned to: Assigned Account CPU Architecture:Any
Tags: null, OUTER JOIN

[14 Oct 2009 16:27] Patrick Crews
Description:
The server appears to be processing JOINs on NULL values incorrectly.
Both 5.1 and 6.0-codebase are exhibiting what seems to be bad behavior, but 6.0-codebase appears to be worse:

As can be seen below, the table in question has a single row, with NULL values in the ON-clause's columns.

5.1 at least uses the valid pk value while 6.0 seems to be 'making up' a NULL.

However, it should be noted that in 5.1.34, doing SELECT NULL=NULL, returns NULL and performing a similar query (JOIN on columns with only NULL values) results in an empty set...

This behavior does not appear to be affected by join_cache_level or engine_condition_pushdown settings.

For this table:
CREATE TABLE `BB` (
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  `time_key` time DEFAULT NULL,
  `varchar_key` varchar(1) DEFAULT NULL,
  `varchar_nokey` varchar(1) DEFAULT NULL,
  PRIMARY KEY (`pk`),
  KEY `time_key` (`time_key`),
  KEY `varchar_key` (`varchar_key`)
) ENGINE=MyISAM AUTO_INCREMENT=11 DEFAULT CHARSET=latin1;
INSERT INTO `BB` VALUES (10,'18:27:58',NULL,NULL);

This query:

SELECT table1 .`time_key` field1  , table2 .`pk` 
FROM BB table1  LEFT  JOIN BB table2  ON table2 .`varchar_nokey`  = table1 .`varchar_key`
HAVING field1   ;

Produces this diff:
#/* Diff: */

#/* --- /tmp//randgen29184-1255519418-server0.dump      2009-10-14 07:23:38.000000000 -0400
# +++ /tmp//randgen29184-1255519418-server1.dump        2009-10-14 07:23:38.000000000 -0400
# @@ -1 +1 @@
# -18:27:58     NULL
# +18:27:58     10 */

How to repeat:
MTR test case:

#/* Server0: MySQL 6.0.14-alpha-gcov-debug-log */
#/* Server1: MySQL 5.1.40-debug-log */

#/* The value of optimizer_switch is distinct between the two servers: */
SET SESSION optimizer_switch = 'firstmatch=off,index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,loosescan=off,materialization=off,semijoin=off' ;
#Server 1 : SET SESSION optimizer_switch = 'index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on' ;

#/* The value of optimizer_use_mrr is distinct between the two servers: */
SET SESSION optimizer_use_mrr = 'disable' ;

#/* The value of engine_condition_pushdown is common between the two servers: */
/*!50400 SET SESSION engine_condition_pushdown = 1 */;

#/* The value of join_cache_level is distinct between the two servers: */
SET SESSION join_cache_level = 1; 

#/* The value of debug is distinct between the two servers: */
SET SESSION debug = '+d,optimizer_no_icp' ;

#/* Begin test case for query 0 */

--disable_warnings
DROP TABLE /*! IF EXISTS */ BB;
--enable_warnings

CREATE TABLE `BB` (
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  `time_key` time DEFAULT NULL,
  `varchar_key` varchar(1) DEFAULT NULL,
  `varchar_nokey` varchar(1) DEFAULT NULL,
  PRIMARY KEY (`pk`),
  KEY `time_key` (`time_key`),
  KEY `varchar_key` (`varchar_key`)
) ENGINE=MyISAM AUTO_INCREMENT=11 DEFAULT CHARSET=latin1;
INSERT INTO `BB` VALUES (10,'18:27:58',NULL,NULL);

 
SELECT table1 .`time_key` field1  , table2 .`pk`  
FROM BB table1  LEFT  JOIN BB table2  ON table2 .`varchar_nokey`  = table1 .`varchar_key`  
HAVING field1   ;

#/* Diff: */

#/* --- /tmp//randgen29184-1255519418-server0.dump	2009-10-14 07:23:38.000000000 -0400
# +++ /tmp//randgen29184-1255519418-server1.dump	2009-10-14 07:23:38.000000000 -0400
# @@ -1 +1 @@
# -18:27:58	NULL
# +18:27:58	10 */

DROP TABLE BB;
#/* End of test case for query 0 */

#/* Begin test case for query 1 */

--disable_warnings
DROP TABLE /*! IF EXISTS */ BB;
--enable_warnings

CREATE TABLE `BB` (
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  `int_nokey` int(11) DEFAULT NULL,
  `time_key` time DEFAULT NULL,
  `datetime_key` datetime DEFAULT NULL,
  `varchar_key` varchar(1) DEFAULT NULL,
  `varchar_nokey` varchar(1) DEFAULT NULL,
  PRIMARY KEY (`pk`),
  KEY `time_key` (`time_key`),
  KEY `datetime_key` (`datetime_key`),
  KEY `varchar_key` (`varchar_key`)
) ENGINE=MyISAM AUTO_INCREMENT=11 DEFAULT CHARSET=latin1;
INSERT INTO `BB` VALUES (10,8,'18:27:58','1900-01-01 00:00:00',NULL,NULL);

 
SELECT    table1 . `time_key` AS field1 , table2 . `pk` AS field2 , table2 . `int_nokey` AS field3 , table1 . `varchar_nokey` AS field4 , table2 . `datetime_key` AS field5 
FROM ( BB AS table1 LEFT  JOIN BB AS table2 ON (table2 . `varchar_nokey` = table1 . `varchar_key` ) )   
GROUP BY field1, field2, field3, field4, field5 
HAVING field1 <= 'y' 
ORDER BY table1 . `time_key`  , field2 DESC;

#/* Diff: */

#/* --- /tmp//randgen29184-1255519418-server0.dump	2009-10-14 07:23:38.000000000 -0400
# +++ /tmp//randgen29184-1255519418-server1.dump	2009-10-14 07:23:38.000000000 -0400
# @@ -1 +1 @@
# -18:27:58	NULL	NULL	NULL	NULL
# +18:27:58	10	8	NULL	1900-01-01 00:00:00 */

DROP TABLE BB;
#/* End of test case for query 1 */

Suggested fix:
Ensure correct query processing.
[15 Oct 2009 13:46] Øystein Grøvlen
Assigning this to me since it looks very similar to Bug#47217.
[19 Oct 2009 15:39] Patrick Crews
EXPLAIN output:

Query 0:
6.0:
EXPLAIN SELECT table1 .`time_key` field1  , table2 .`pk`  
FROM BB table1  LEFT  JOIN BB table2  ON table2 .`varchar_nokey`  = table1 .`varchar_key` 
HAVING field1   ;
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	table1	system	NULL	NULL	NULL	NULL	1	
1	SIMPLE	table2	system	NULL	NULL	NULL	NULL	1

5.1:
EXPLAIN SELECT table1 .`time_key` field1  , table2 .`pk`  
FROM BB table1  LEFT  JOIN BB table2  ON table2 .`varchar_nokey`  = table1 .`varchar_key` 
HAVING field1   ;
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	table1	system	NULL	NULL	NULL	NULL	1	
1	SIMPLE	table2	system	NULL	NULL	NULL	NULL	1

Query 1:
6.0:
EXPLAIN SELECT    table1 . `time_key` AS field1 , table2 . `pk` AS field2 , table2 . `int_nokey`
AS field3 , table1 . `varchar_nokey` AS field4 , table2 . `datetime_key` AS field5 
FROM ( BB AS table1 LEFT  JOIN BB AS table2 ON (table2 . `varchar_nokey` = table1 .
`varchar_key` ) )   
GROUP BY field1, field2, field3, field4, field5 
HAVING field1 <= 'y' 
ORDER BY table1 . `time_key`  , field2 DESC;
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	table1	system	NULL	NULL	NULL	NULL	1	
1	SIMPLE	table2	system	NULL	NULL	NULL	NULL	1	
Warnings:
Warning	1292	Incorrect time value: 'y' for column 'time_key' at row 1

5.1:
EXPLAIN SELECT    table1 . `time_key` AS field1 , table2 . `pk` AS field2 , table2 . `int_nokey`
AS field3 , table1 . `varchar_nokey` AS field4 , table2 . `datetime_key` AS field5 
FROM ( BB AS table1 LEFT  JOIN BB AS table2 ON (table2 . `varchar_nokey` = table1 .
`varchar_key` ) )   
GROUP BY field1, field2, field3, field4, field5 
HAVING field1 <= 'y' 
ORDER BY table1 . `time_key`  , field2 DESC;
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	table1	system	NULL	NULL	NULL	NULL	1	
1	SIMPLE	table2	system	NULL	NULL	NULL	NULL	1	
Warnings:
Warning	1292	Incorrect time value: 'y' for column 'time_key' at row 1
[19 Nov 2009 11:54] Øystein Grøvlen
6.0 shows the correct behavior.

The query is a left outer join between tow tables.  Wikipedia says the following
about left outer joins:

"The result of a left outer join (or simply left join) for table A and B always contains all records of the "left" table (A), even if the join-condition does not find any matching record in the "right" table (B). This means that if the ON clause matches 0 (zero) records in B, the join will still return a row in the result—but with NULL in each column from B. This means that a left outer join returns all the values from the left table, plus matched values from the right table (or NULL in case of no matching join predicate)."

In other words, when the join column of the left is NULL, it will not find any mathcing rows, and the output columns of the right hand table should all be NULL.

This means that the 5.1 behavior is wrong.  This issue is probably related to use of an index.  I notice that if I switch to join a non-indexed column of the left table with an indexed column of the right table, one get the right result
(i.e., NULL for table2.pk).
[6 Dec 2010 22:55] Ole John Aske
This seems to be a duplicate of bug#57034 which I have recently commited a fix for. Has tested the testcases in this report with my fix and the reported problems seems to have been solved.
[20 Dec 2010 13:21] Ole John Aske
Closed as duplicate of bug#57034