Bug #111101 Join by index returns incorrect results
Submitted: 22 May 2023 9:29 Modified: 26 May 2023 1:44
Reporter: John Jove Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:8.0.32 OS:Ubuntu (20.04LTS)
Assigned to: CPU Architecture:Any

[22 May 2023 9:29] John Jove
Description:
In case 1, the query outputs an incorrect result.
If I remove the index (case 2) or remove data that do not satisfy the condition (case 3), the query result becomes correct.

How to repeat:
-- case 1 --
CREATE TABLE t0(c0 TEXT);
CREATE TABLE t1(c0 DECIMAL) ;
CREATE INDEX i0 ON t1(c0);
INSERT INTO t0(c0) VALUES('0v'), ('a');
INSERT INTO t1(c0) VALUES (0), (-1);
SELECT * FROM t0, t1 WHERE (t0.c0) = (t1.c0); -- {(a|0)}

-- case 2 --
CREATE TABLE t0(c0 TEXT);
CREATE TABLE t1(c0 DECIMAL) ;
INSERT INTO t0(c0) VALUES('0v'), ('a');
INSERT INTO t1(c0) VALUES (0), (-1);
SELECT * FROM t0, t1 WHERE (t0.c0) = (t1.c0); -- {(a|0), (0v|0)}

-- case 3 --
CREATE TABLE t0(c0 TEXT);
CREATE TABLE t1(c0 DECIMAL) ;
CREATE INDEX i0 ON t1(c0);
INSERT INTO t0(c0) VALUES('0v'), ('a');
INSERT INTO t1(c0) VALUES (0);
SELECT * FROM t0, t1 WHERE (t0.c0) = (t1.c0); -- {(a|0), (0v|0)}
[22 May 2023 12:36] MySQL Verification Team
Hi Mr. Jove,

Thank you for your bug report.

However, this is not a bug.

This is expected behaviour.

Column types of DECIMAL and TEXT are totally incompatible. So, when you do not have indices, MySQL server has to compare values by finding a common denominator.  So, without indices, each value is converted to the nearest possible type and those are compared.

However, when you use indices, then only index is used by searching for the exact value. In that case, the tuple can not be found. In that case, conversion can not be done, since B+ trees are searched by nodes.

Not a bug.
[22 May 2023 13:29] John Jove
Thanks for your detailed instructions.
Does this mean that the SELECT in case 1 uses the index to search the exact value, and only 0 and `a` satisfy the equal condition, but 0 and `0v` does not?
[22 May 2023 15:46] MySQL Verification Team
No, in the first case, none of the values satisfy the conditions, because DECIMAL type can not have values of '0v' or 'a' or similar .......

When there is no index, then server is searching for common denominator, which sometimes requires changing the values of the strings, in your case.
[23 May 2023 1:37] John Jove
But, there is a conflict. In the first case, the SELECT did return the values (a|0). I paste the execution results in the followings.

mysql> SELECT * FROM t0, t1 WHERE (t0.c0) = (t1.c0);
+------+------+
| c0   | c0   |
+------+------+
| a    |    0 |
+------+------+
1 row in set, 2 warnings (0.00 sec)

mysql> show warnings;
+---------+------+--------------------------------------------------------+
| Level   | Code | Message                                                |
+---------+------+--------------------------------------------------------+
| Warning | 1366 | Incorrect decimal value: '0v' for column 'c0' at row 1 |
| Warning | 1366 | Incorrect decimal value: 'a' for column 'c0' at row 1  |
+---------+------+--------------------------------------------------------+
2 rows in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM t0, t1 WHERE (t0.c0) = (t1.c0);
+----+-------------+-------+------------+------+---------------+------+---------+------------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref        | rows | filtered | Extra                        |
+----+-------------+-------+------------+------+---------------+------+---------+------------+------+----------+------------------------------+
|  1 | SIMPLE      | t0    | NULL       | ALL  | NULL          | NULL | NULL    | NULL       |    2 |   100.00 | Using where; Using temporary |
|  1 | SIMPLE      | t1    | NULL       | ref  | i0            | i0   | 6       | test.t0.c0 |    1 |   100.00 | Using where; Using index     |
+----+-------------+-------+------------+------+---------------+------+---------+------------+------+----------+------------------------------+
2 rows in set, 1 warning (0.01 sec)

mysql> show warnings;
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                 |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select sql_buffer_result `test`.`t0`.`c0` AS `c0`,`test`.`t1`.`c0` AS `c0` from `test`.`t0` join `test`.`t1` where (cast(`test`.`t0`.`c0` as double) = cast(`test`.`t1`.`c0` as double)) |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select @@version;
+-----------+
| @@version |
+-----------+
| 8.0.32    |
+-----------+
1 row in set (0.00 sec)
[23 May 2023 12:27] MySQL Verification Team
Hi,

Sorry, but what you wrote in your last comment contradicts what you wrote in your opening comment ....
[23 May 2023 13:41] John Jove
Sorry, I may misunderstand the sentence “No, in the first case, none of the values satisfy the conditions, because DECIMAL type can not have values of '0v' or 'a' or similar .......”. Does this mean that the SELECT in the first case should not return values?
[23 May 2023 14:03] MySQL Verification Team
That depends ......

If there are not sufficient rows in the table, so that index can not be used, then there would be no search by index and different results would return.

If there are sufficient rows for the index to be used , then you should get no result.

Number of sufficient rows is calculated by the optimiser and depend on the column type and width.
[23 May 2023 14:27] John Jove
Thanks. I may get this point. The approximate value of 0 and `a` in this case is the same, but their exact values are different. When using an index, it compares their exact values, so no result is returned. And the optimiser is free to match the values either using an index or not use. In this way, the first case is acceptable. That's when compare 0 and `a`, it does not use the index and compares their approximate values, but for 0 and `0v`, it uses the index. Do I understand correctly?
[26 May 2023 1:44] John Jove
By the way, I try to evaluate the same predicate in the followings.

CREATE TABLE t0(c0 TEXT);
CREATE TABLE t1(c0 DECIMAL) ;
CREATE INDEX i0 ON t1(c0);
INSERT INTO t0(c0) VALUES('0v'), ('a');
INSERT INTO t1(c0) VALUES (0), (-1);
SELECT (t0.c0) = (t1.c0) FROM t0, t1; -- {0,0,1,1}
SELECT * FROM t0, t1 WHERE (t0.c0) = (t1.c0); -- {(a|0)}

The SELECT returns two 1s, which is different from the number of results in case 1. Is this case sill acceptable?
[26 May 2023 12:31] MySQL Verification Team
Yes, it is OK as a result, since you have forced a Cartesian product.