Bug #120384 Unexpected Error for Encoding \xD7\xFA
Submitted: 30 Apr 13:53 Modified: 30 Apr 14:40
Reporter: JINSHENG BA Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S1 (Critical)
Version:8.4.9 OS:Any
Assigned to: CPU Architecture:Any

[30 Apr 13:53] JINSHENG BA
Description:
Please see this test case.

```sql
CREATE TABLE gbk (id int, info varchar(10)) DEFAULT CHARSET = gbk;
CREATE TABLE utf (id int, info varchar(10)) DEFAULT CHARSET = utf8mb4;

INSERT INTO gbk VALUES (1, '?'), (2, 0xD7FA);
INSERT INTO utf VALUES (3, '?');

SELECT temp.id, temp.info
FROM (
    SELECT g.id, u.info  FROM gbk AS g JOIN utf AS u
      ON g.info = u.info ) AS temp
WHERE temp.id < 2;
```

Note that `0xD7FA` is invalid in GBK.

The SQL above outputs:

```
+------+------+
| id   | info |
+------+------+
|    1 | ?    |
+------+------+
```

If we disable the following two optimization options:

```sql
SET optimizer_switch='derived_condition_pushdown=off,derived_merge=off';
```

Executing the same SELECT would return this error:
```
ERROR 3854 (HY000): Cannot convert string '\xD7\xFA' from gbk to utf8mb4
```

Query optimizations are expected to affect the performance only, not the result. When invalid bytes for legacy encodings are involved, we get inconsistent results for the same query under different optimizations.

We also noticed that `SELECT CONVERT(0xD7FA USING utf8mb4)` would return NULL with an encoding-related warning.

How to repeat:
$ sudo docker run -it -p 13306:3306 -e MYSQL_ROOT_PASSWORD=123456 mysql:8.4.9
$ mysql --host 127.0.0.1 -u root --password=123456 --port 13306

Then create a database and type the above test case.
[30 Apr 14:40] Roy Lyseng
Thank you for the bug report.

However, this is not a bug.

The optimizer is free to eliminate expressions that may cause errors for one query plan, but not for another. Thus, a query may terminate with an error for a non-optimal plan, but execute successfully for a more optimal plan, when the erroneous expression is not reached.