Bug #106718 Incorrect inner hash join when using materialization strategy
Submitted: 13 Mar 2022 4:27 Modified: 24 Mar 2022 22:23
Reporter: Xiu Tang Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:8.0.28 OS:Any
Assigned to: CPU Architecture:Any
Tags: inner hash join, materialization, semijoin

[13 Mar 2022 4:27] Xiu Tang
Description:
Incorrect inner hash join when using materialization strategy 

How to repeat:
CREATE TABLE `t0` (
  `c0` longtext
);

INSERT INTO `t0` VALUES ('321108437'),('-64596961'),('329053785'),('1983'),(NULL),(NULL),(''),('1936'),('-543970881'),('+nYDW'),(NULL),(NULL),('-1945919442'),('*Tv'),(NULL),('1230052719');

MySQL > SET optimizer_switch='default';
Query OK, 0 rows affected (0.00 sec)

MySQL > SELECT t0.c0 FROM t0 WHERE t0.c0 NOT IN (SELECT t0.c0 AS ref1 FROM t0 WHERE t0.c0 IN (SELECT t0.c0 FROM t0 WHERE (t0.c0 NOT IN (SELECT t0.c0 AS ref3 FROM t0 )) = (t0.c0) ) );
+-------------+
| c0          |
+-------------+
| 321108437   |
| -64596961   |
| 329053785   |
| 1983        |
| 1936        |
| -543970881  |
| -1945919442 |
| 1230052719  |
+-------------+
8 rows in set (0.01 sec)

MySQL > SET optimizer_switch='duplicateweedout=off';
Query OK, 0 rows affected (0.00 sec)

MySQL > SELECT t0.c0 FROM t0 WHERE t0.c0 NOT IN (SELECT t0.c0 AS ref1 FROM t0 WHERE t0.c0 IN (SELECT t0.c0 FROM t0 WHERE (t0.c0 NOT IN (SELECT t0.c0 AS ref3 FROM t0 )) = (t0.c0) ) );
Empty set (0.00 sec)

MySQL [testdb]> explain SELECT t0.c0 FROM t0 WHERE t0.c0 NOT IN (SELECT t0.c0 AS ref1 FROM t0 WHERE t0.c0 IN (SELECT t0.c0 FROM t0 WHERE (t0.c0 NOT IN (SELECT t0.c0 AS ref3 FROM t0 )) = (t0.c0) ) );
+----+--------------------+-------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
| id | select_type        | table       | p| id | select_type        | table       | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                      |
+----+--------------------+-------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
|  1 | PRIMARY            | t0          | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   16 |   100.00 | Using where                                |
|  2 | DEPENDENT SUBQUERY | <subquery3> | NULL       | ALL  | NULL          | NULL | NULL    | NULL | NULL |   100.00 | NULL                                       |
|  2 | DEPENDENT SUBQUERY | t0          | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   16 |    10.00 | Using where; Using join buffer (hash join) |
|  3 | MATERIALIZED       | t0          | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   16 |    10.00 | NULL                                       |
|  4 | DEPENDENT SUBQUERY | t0          | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   16 |   100.00 | Using where                                |
+----+--------------------+-------------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
5 rows in set, 1 warning (0.00 sec)

MySQL [testdb]> SET optimizer_switch='duplicateweedout=on';
Query OK, 0 rows affected (0.00 sec)

MySQL [testdb]> explain SELECT t0.c0 FROM t0 WHERE t0.c0 NOT IN (SELECT t0.c0 AS ref1 FROM t0 WHERE t0.c0 IN (SELECT t0.c0 FROM t0 WHERE (t0.c0 NOT IN (SELECT t0.c0 AS ref3 FROM t0 )) = (t0.c0) ) );
+----+--------------------+-------+------------+------+---------------+------+---------+------+------+----------+-----------------------------------------------------------+
| id | select_type        | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                                     |
+----+--------------------+-------+------------+------+---------------+------+---------+------+------+----------+-----------------------------------------------------------+
|  1 | PRIMARY            | t0    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   16 |   100.00 | Using where                                               |
|  2 | DEPENDENT SUBQUERY | t0    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   16 |    10.00 | Using where; Start temporary                              |
|  2 | DEPENDENT SUBQUERY | t0    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   16 |    10.00 | Using where; End temporary; Using join buffer (hash join) |
|  4 | DEPENDENT SUBQUERY | t0    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   16 |   100.00 | Using where                                               |
+----+--------------------+-------+------------+------+---------------+------+---------+------+------+----------+-----------------------------------------------------------+
4 rows in set, 1 warning (0.00 sec)
[14 Mar 2022 10:36] MySQL Verification Team
Hello Xiu Tang,

Thank you for the report and test case.

regards,
Umesh
[24 Mar 2022 22:23] Jon Stephens
Documented fix as follows in the MySQL 8.0.30 changelog:

    In certain cases, incorrect results could result from execution
    of a semijoin with materialization, when the WHERE clause of the
    subquery contained an equality. In some cases, such as when one
    side of the equality was an IN or NOT IN subquery, the equality
    was neither pushed down to the materialized subquery, nor
    evaluated as part of the semijoin. This could also cause issues 
    with inner hash joins.

See also BUG#106710.

Closed.