Bug #119537 The same SQL returns inconsistent results when toggling materialization on and off
Submitted: 11 Dec 1:47 Modified: 12 Dec 13:28
Reporter: Ceit Zhang Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:8.0.43+ OS:Any
Assigned to: CPU Architecture:Any

[11 Dec 1:47] Ceit Zhang
Description:
For a SQL with IN (subquery), mysql returns inconsistent results when toggling materialization on and off.

How to repeat:
mysql> show create table t1;
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                         |
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (
  `col_varchar(20)_key_signed` varchar(20) CHARACTER SET utf8mb3 DEFAULT NULL
) ENGINE=Dstore DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> show create table t2;
+-------+----------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                           |
+-------+----------------------------------------------------------------------------------------------------------------------------------------+
| t2    | CREATE TABLE `t2` (
  `col_double_undef_signed` double DEFAULT NULL
) ENGINE=Dstore DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+----------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select * from t1;
+----------------------------+
| col_varchar(20)_key_signed |
+----------------------------+
| -0                         |
| 1                          |
+----------------------------+
2 rows in set (0.00 sec)

mysql> select * from t2;
+-------------------------+
| col_double_undef_signed |
+-------------------------+
|                       0 |
|                      -1 |
+-------------------------+
2 rows in set (0.00 sec)

mysql> SELECT `col_varchar(20)_key_signed` AS f17  FROM t1   HAVING f17 IN (   SELECT   `col_double_undef_signed`    FROM t2 );
+------+
| f17  |
+------+
| -0   |
+------+
1 row in set (0.01 sec)

mysql> SET optimizer_switch='materialization=on';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT `col_varchar(20)_key_signed` AS f17  FROM t1   HAVING f17 IN (   SELECT   `col_double_undef_signed`    FROM t2 );
Empty set (0.00 sec)

Suggested fix:
The root cause is that when using materialization with temptable's hash index, the double values -0.0 and 0.0 are treated as two distinct values. However, in regular join operations, -0.0 and 0.0 are considered identical values, which leads to discrepancies in the results.
It can be fixed like HASH JOIN implement(in function extract_value_for_hash_join):
    case REAL_RESULT: {
      double value = comparand->val_real();
      if (value == 0.0) value = 0.0;  // Ensure that -0.0 hashes as +0.0.
      return append_double_value(value, comparand->null_value, join_key_buffer);
    }
[12 Dec 13:28] Knut Anders Hatlen
Thank you for the bug report.

I could not reproduce the bug exactly as reported on 8.0.43, possibly because the reporter apparently used a storage engine called "Dstore". I used the InnoDB storage engine that comes with MySQL. However, I could reproduce the difference between materialization=on and materialization=off if I had more rows int the tables.

The problem seems to have been fixed as part of Bug#34361437 in MySQL 8.1.0. Closing this bug as a duplicate.

I used these steps to reproduce the difference:

CREATE TABLE `t1` (
  `col_varchar(20)_key_signed` varchar(20) CHARACTER SET utf8mb3 DEFAULT NULL
) CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

CREATE TABLE `t2` (
  `col_double_undef_signed` double DEFAULT NULL
) CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

INSERT INTO t1 VALUES ('-0'), ('1');
INSERT INTO t2 VALUES ('0'), ('-1');

INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;

INSERT INTO t2 SELECT * FROM t2;
INSERT INTO t2 SELECT * FROM t2;
INSERT INTO t2 SELECT * FROM t2;
INSERT INTO t2 SELECT * FROM t2;

ANALYZE TABLE t1, t2;

SET optimizer_switch = 'materialization=on';
SELECT `col_varchar(20)_key_signed` AS f17  FROM t1   HAVING f17 IN (   SELECT   `col_double_undef_signed`    FROM t2 );

SET optimizer_switch = 'materialization=off';
SELECT `col_varchar(20)_key_signed` AS f17  FROM t1   HAVING f17 IN (   SELECT   `col_double_undef_signed`    FROM t2 );