Bug #88670 Subquery incorrectly shows duplicate values on subqueries.
Submitted: 27 Nov 2017 22:56 Modified: 17 Aug 23:38
Reporter: Mark El-Wakil Email Updates:
Status: Closed Impact on me:
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.7.20, 5.6.38 OS:Any
Assigned to: CPU Architecture:Any

[27 Nov 2017 22:56] Mark El-Wakil
It appears there is an issue with how MySQL deals with subqueries when the subquery is a unique index.

If you look at the below example:

SELECT p.Id FROM (p) WHERE p.Id IN (select s.Id from s WHERE Id=1 AND u IS NULL) ORDER BY Id desc;

The subquery (table s) is looking at a query that utilizes the index `o`. This query is a unique key.

However, since the unique key allows for NULL values, MySQL allows for duplicate instances of NULL to exist in this table.

In this case, the select query will return multiple values, even though table `p` only has one row.

This does not happen on older instances of MySQL, such a MySQL 5.5.

How to repeat:

CREATE TABLE `p` (`Id` int(11),PRIMARY KEY (`Id`));
CREATE TABLE `s` (`Id` int(11), `u` int(11), UNIQUE KEY o(Id, u) );

insert into s VALUES (1, NULL),(1, NULL);
insert into p VALUES (1);

SELECT p.Id FROM (p) WHERE p.Id IN (select s.Id from s WHERE Id=1 AND u IS NULL) ORDER BY Id desc;

Note: If the subquery does not utilize this index, only one value is returned:

SELECT p.Id FROM (p) WHERE p.Id IN (select s.Id from s) ORDER BY Id desc;

Suggested fix:
Since this does not happen in MySQL 5.5, I recommend adding an additional check to see whether or not to rewrite the subquery as a join based on whether or not the unique index can contain null values (and therefore, duplicates).

Please let me know if this is unclear, or if there is any other way I can demonstrate this issue.
[28 Nov 2017 6:01] Umesh Shastry
Hello Mark El-Wakil,

Thank you for the report and test case.

[17 Aug 23:38] Jon Stephens
Documented fix as follows in the MySQL 5.6.42, 5.7.24, and 8.0.13 changelogs:

    It was possible for a subquery that used a unique key on a
    column allowing NULL to return multiple rows.