Description:
This is probably related to the impact of NULL values on equality/uniqueness tests.
To test equality against NULL, MySQL offers the <=> and IS NULL constructs. This is only applicable to singular column value constraints though.
For efficiently selecting rows where columns match any of multiple, specified value combinations, MySQL offers the IN construct. However, there is no equivalent of the <=> operator, as a NULL-inclusive equality check across a list of tuples. IN behaves as the regular equals operator, that is to say, it never returns TRUE on anything that contains a NULL value.
This behaviour requires significant added code and significant additional queries on batch lookups involving even the potential of NULL entries.
We are not sure at this point wether this violates anything in the SQL specification. Probably not.
How to repeat:
mysql> CREATE TABLE `tuple_test` (
-> `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
-> `a` INT UNSIGNED NULL,
-> `b` INT UNSIGNED NULL,
-> PRIMARY KEY (`id`),
-> UNIQUE INDEX `tuple` (`a`, `b`)
-> )
-> ENGINE=InnoDB;
Query OK, 0 rows affected (0.43 sec)
mysql> INSERT INTO `tuple_test` (`a`,`b`) VALUES (1,1), (1,NULL), (NULL,1), (NULL,NULL),(2,1), (1,2);
Query OK, 6 rows affected (0.07 sec)
Records: 6 Duplicates: 0 Warnings: 0
/* Confirm that all data has been inserted as specified */
mysql> SELECT * FROM `tuple_test`;
+----+------+------+
| id | a | b |
+----+------+------+
| 4 | NULL | NULL |
| 3 | NULL | 1 |
| 2 | 1 | NULL |
| 1 | 1 | 1 |
| 6 | 1 | 2 |
| 5 | 2 | 1 |
+----+------+------+
6 rows in set (0.00 sec)
/* Now we use the same set of values that were inserted into the table */
/* for a WHERE (<columns>) IN (<tuple-list>) query. */
/* We would expect to receive the full insert set from this select. */
/* Howerver, we observe that only rows consisting of purely non-NULL values */
/* can be selected this way. */
mysql> SELECT * FROM `tuple_test` WHERE (`a`,`b`) IN ((1,1), (1,NULL), (NULL,1), (NULL,NULL),(2,1), (1,2));
+----+------+------+
| id | a | b |
+----+------+------+
| 1 | 1 | 1 |
| 6 | 1 | 2 |
| 5 | 2 | 1 |
+----+------+------+
3 rows in set (0.00 sec)
Suggested fix:
Tame:
Offer an alternative spin on the IN operator that behaves like the <=> operator; either as a configuration option or a new syntactic extension.
Bolder:
Make the IN operator test with the <=> operator by default; it stands to reason that, when no combination involving NULL can ever lead to a TRUE result, even when explicitly embedded in the statement, or sourced from a table, the current behaviour destroys semantics willingly specified by the client, which it should not.
Description: This is probably related to the impact of NULL values on equality/uniqueness tests. To test equality against NULL, MySQL offers the <=> and IS NULL constructs. This is only applicable to singular column value constraints though. For efficiently selecting rows where columns match any of multiple, specified value combinations, MySQL offers the IN construct. However, there is no equivalent of the <=> operator, as a NULL-inclusive equality check across a list of tuples. IN behaves as the regular equals operator, that is to say, it never returns TRUE on anything that contains a NULL value. This behaviour requires significant added code and significant additional queries on batch lookups involving even the potential of NULL entries. We are not sure at this point wether this violates anything in the SQL specification. Probably not. How to repeat: mysql> CREATE TABLE `tuple_test` ( -> `id` INT UNSIGNED NOT NULL AUTO_INCREMENT, -> `a` INT UNSIGNED NULL, -> `b` INT UNSIGNED NULL, -> PRIMARY KEY (`id`), -> UNIQUE INDEX `tuple` (`a`, `b`) -> ) -> ENGINE=InnoDB; Query OK, 0 rows affected (0.43 sec) mysql> INSERT INTO `tuple_test` (`a`,`b`) VALUES (1,1), (1,NULL), (NULL,1), (NULL,NULL),(2,1), (1,2); Query OK, 6 rows affected (0.07 sec) Records: 6 Duplicates: 0 Warnings: 0 /* Confirm that all data has been inserted as specified */ mysql> SELECT * FROM `tuple_test`; +----+------+------+ | id | a | b | +----+------+------+ | 4 | NULL | NULL | | 3 | NULL | 1 | | 2 | 1 | NULL | | 1 | 1 | 1 | | 6 | 1 | 2 | | 5 | 2 | 1 | +----+------+------+ 6 rows in set (0.00 sec) /* Now we use the same set of values that were inserted into the table */ /* for a WHERE (<columns>) IN (<tuple-list>) query. */ /* We would expect to receive the full insert set from this select. */ /* Howerver, we observe that only rows consisting of purely non-NULL values */ /* can be selected this way. */ mysql> SELECT * FROM `tuple_test` WHERE (`a`,`b`) IN ((1,1), (1,NULL), (NULL,1), (NULL,NULL),(2,1), (1,2)); +----+------+------+ | id | a | b | +----+------+------+ | 1 | 1 | 1 | | 6 | 1 | 2 | | 5 | 2 | 1 | +----+------+------+ 3 rows in set (0.00 sec) Suggested fix: Tame: Offer an alternative spin on the IN operator that behaves like the <=> operator; either as a configuration option or a new syntactic extension. Bolder: Make the IN operator test with the <=> operator by default; it stands to reason that, when no combination involving NULL can ever lead to a TRUE result, even when explicitly embedded in the statement, or sourced from a table, the current behaviour destroys semantics willingly specified by the client, which it should not.