Bug #119776 Semantically identical SQLs yield different result sets.
Submitted: 26 Jan 9:31 Modified: 26 Jan 13:51
Reporter: yunhua 王 Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:8.0.41 OS:Any
Assigned to: CPU Architecture:Any

[26 Jan 9:31] yunhua 王
Description:
The following two SQL queries are semantically equivalent, but they yield different result sets:

1)
mysql> SELECT *  FROM `test`  WHERE (`enum_col` IN ('aaa', 'bbb')) AND (`bool_col` >= 0x3A95C9C8D6968E627BD30949E4206E);
Empty set, 2 warnings (0.00 sec)

2)
mysql> select * from test2;
+-------+
| value |
+-------+
| aaa   |
| bbb   |
+-------+
2 rows in set (0.00 sec)

mysql> SELECT * FROM `test`  WHERE enum_col IN ( SELECT value FROM test2 ) AND (`bool_col` >= 0x3A95C9C8D6968E627BD30949E4206E);
+----------+----------+------------+
| enum_col | bool_col | double_col |
+----------+----------+------------+
| aaa      |        0 |      200.3 |
| aaa      |        0 |      300.9 |
| aaa      |        1 |      100.5 |
| aaa      |        1 |      150.7 |
| bbb      |        0 |      350.4 |
| bbb      |        0 |      500.6 |
| bbb      |        1 |      400.2 |
| bbb      |        1 |      450.8 |
+----------+----------+------------+
8 rows in set, 3 warnings (0.00 sec)

When the hex-literal in the filter condition is within a valid length (no truncation occurs), the results of both queries are consistent.

1)
mysql> SELECT * FROM `test`  WHERE enum_col IN ( SELECT value FROM test2 ) AND (`bool_col` >= 0x3A95C9C8D6968);
Empty set (0.00 sec)
2)
mysql> SELECT *  FROM `test`  WHERE (`enum_col` IN ('aaa', 'bbb')) AND (`bool_col` >= 0x3A95C9C8D6968);
Empty set (0.00 sec)

"When no truncation occurs, the optimizer correctly identifies both queries as 'Impossible WHERE' during the optimization phase, returning an empty set as expected."

"However, after truncation occurs due to excessive length, a discrepancy arises: the literal 'IN' version is still pruned as 'Impossible WHERE', but the subquery version fails to trigger this optimization. Instead, the subquery version proceeds to execution, leading to inconsistent results."

How to repeat:
CREATE TABLE `test` (
  `enum_col` enum('aaa','bbb','ccc') CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL,
  `bool_col` tinyint(1) DEFAULT NULL,
  `double_col` double unsigned DEFAULT NULL,
  KEY `idx` (`enum_col`,`bool_col`,`double_col`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

INSERT INTO test (enum_col, bool_col, double_col) VALUES
('aaa', 1, 100.5),
('aaa', 0, 200.3),
('aaa', 1, 150.7),
('aaa', 0, 300.9),
('bbb', 1, 400.2),
('bbb', 0, 500.6),
('bbb', 1, 450.8),
('bbb', 0, 350.4),
('ccc', 1, 600.1),
('ccc', 0, 700.3),
('ccc', 1, 650.5),
('ccc', 0, 550.7);

CREATE TABLE test2 (
    value VARCHAR(10) NOT NULL PRIMARY KEY
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO test2 (value) VALUES
('aaa'),
('bbb');

SELECT * FROM `test`  WHERE enum_col IN ( SELECT value FROM test2 ) AND (`bool_col` >= 0x3A95C9C8D6968E627BD30949E4206E);
SELECT *  FROM `test`  WHERE (`enum_col` IN ('aaa', 'bbb')) AND (`bool_col` >= 0x3A95C9C8D6968E627BD30949E4206E);

SELECT * FROM `test`  WHERE enum_col IN ( SELECT value FROM test2 ) AND (`bool_col` >= 0x3A95C9C8D6968);
SELECT *  FROM `test`  WHERE (`enum_col` IN ('aaa', 'bbb')) AND (`bool_col` >= 0x3A95C9C8D6968);
[26 Jan 13:51] Roy Lyseng
Thank you for the bug report.
Verified as described.