MySQL Bugs: #67411: NULL breaks "WHERE (columns) IN ((tuple), (tuple))"

Bug #67411	NULL breaks "WHERE (columns) IN ((tuple), (tuple))"
Submitted:	29 Oct 2012 13:54	Modified:	29 Jan 2013 9:18
Reporter:	Rolf Neuberger	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S4 (Feature request)
Version:	5.5.24	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
This is probably related to the impact of NULL values on equality/uniqueness tests.

To test equality against NULL, MySQL offers the <=> and IS NULL constructs. This is only applicable to singular column value constraints though.

For efficiently selecting rows where columns match any of multiple, specified value combinations, MySQL offers the IN construct. However, there is no equivalent of the <=> operator, as a NULL-inclusive equality check across a list of tuples. IN behaves as the regular equals operator, that is to say, it never returns TRUE on anything that contains a NULL value.

This behaviour requires significant added code and significant additional queries on batch lookups involving even the potential of NULL entries.

We are not sure at this point wether this violates anything in the SQL specification. Probably not.

How to repeat:
mysql> CREATE TABLE `tuple_test` (
    -> `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
    -> `a` INT UNSIGNED NULL,
    -> `b` INT UNSIGNED NULL,
    -> PRIMARY KEY (`id`),
    -> UNIQUE INDEX `tuple` (`a`, `b`)
    -> )
    -> ENGINE=InnoDB;
Query OK, 0 rows affected (0.43 sec)

mysql> INSERT INTO `tuple_test` (`a`,`b`) VALUES (1,1), (1,NULL), (NULL,1), (NULL,NULL),(2,1), (1,2);
Query OK, 6 rows affected (0.07 sec)
Records: 6  Duplicates: 0  Warnings: 0

/* Confirm that all data has been inserted as specified */
mysql> SELECT * FROM `tuple_test`;
+----+------+------+
| id | a    | b    |
+----+------+------+
|  4 | NULL | NULL |
|  3 | NULL |    1 |
|  2 |    1 | NULL |
|  1 |    1 |    1 |
|  6 |    1 |    2 |
|  5 |    2 |    1 |
+----+------+------+
6 rows in set (0.00 sec)

/* Now we use the same set of values that were inserted into the table */
/* for a WHERE (<columns>) IN (<tuple-list>) query. */
/* We would expect to receive the full insert set from this select. */
/* Howerver, we observe that only rows consisting of purely non-NULL values */
/* can be selected this way. */

mysql> SELECT * FROM `tuple_test` WHERE (`a`,`b`) IN ((1,1), (1,NULL), (NULL,1), (NULL,NULL),(2,1), (1,2));
+----+------+------+
| id | a    | b    |
+----+------+------+
|  1 |    1 |    1 |
|  6 |    1 |    2 |
|  5 |    2 |    1 |
+----+------+------+
3 rows in set (0.00 sec)

Suggested fix:
Tame:
Offer an alternative spin on the IN operator that behaves like the <=> operator; either as a configuration option or a new syntactic extension.

Bolder:
Make the IN operator test with the <=> operator by default; it stands to reason that, when no combination involving NULL can ever lead to a TRUE result, even when explicitly embedded in the statement, or sourced from a table, the current behaviour destroys semantics willingly specified by the client, which it should not.

Thank you for the feature request.

Let me say first that your "bold" suggestion has little chance of being accepted.

 - The current behavior is according to the SQL standard.
 - We will break compatibility with existing applications if we implement this.
 - We will also break compatibility with other DBMS's if we implement this.

The reason that standard SQL does not allow this is that NULL is not a "value", but rather an indication of something that is unknown.

Your "tame" suggestion is a syntax extension, so we are free to implement what we want here, as long as it is reasonably consistent with SQL. IN is a synonym to =ANY, so we could use an operator named <=>ANY
to implement this extension, like this:

  ... WHERE (a, b) <=>ANY ((1,1), (1,NULL), (NULL,1), (NULL,NULL),(2,1), (1,2))

 However, this might turn out to be a very large effort, as we have a lot of code that deals with proper results of subqueries.