Bug #109833 UTF-8 strings are broken when comparing to UNIONs
Submitted: 30 Jan 2023 3:41 Modified: 31 Jan 2023 4:21
Reporter: Tim Lundqvist Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:8.0.32 OS:Ubuntu
Assigned to: CPU Architecture:Any

[30 Jan 2023 3:41] Tim Lundqvist
Description:
After updating to 8.0.32 VIEWs containing UNIONs started returning empty result sets when filtered against a string value containing non-ASCII-7 characters. This happens if the filter is applied using a subquery or by having the UNION be part of a VIEW. You receive a warning (1300) "Invalid utf8mb4 character string" when this happens.

Using the same query directly on the table containing the wanted row will result in an expected result set, the error is only encountered when the UNION is involved.

JOINing in a value to match against works as expected, so the only workaround I've been able to find is to change
```
SELECT k FROM (SELECT k FROM a UNION ALL SELECT k FROM b) AS r
WHERE r.k = 'Å'
```
to something similar to this
```
SELECT k FROM (SELECT k FROM a UNION ALL SELECT k FROM b) AS r
LEFT OUTER JOIN a ON a.k = r.k
LEFT OUTER JOIN b ON b.k = r.k
WHERE a.k = 'Å' OR b.k = 'Å'
```

This might be the same underlying bug as #109736.

How to repeat:
```sql

CREATE TABLE `vals0` (
  `k` VARCHAR(8) CHARSET utf8mb4 COLLATE utf8mb4_bin NOT NULL PRIMARY KEY
);
INSERT INTO `vals0` VALUES ('OK');

CREATE TABLE `vals1` (
  `k` VARCHAR(8) CHARSET utf8mb4 COLLATE utf8mb4_bin NOT NULL PRIMARY KEY
);
INSERT INTO `vals1` VALUES ('ERRÖR');

-- this works as expected
SELECT * FROM `vals1` WHERE `k` = 'ERRÖR';

-- this works as expected
SELECT * FROM `view_vals` WHERE `k` = 'OK';

-- this returns empty result set, and a warning about converting D6 to utf8
SELECT * FROM (SELECT `v0`.`k` AS `k` FROM `vals0` AS `v0` UNION ALL SELECT `v1`.`k` AS `k` FROM `vals1` as `v1`) AS `a` WHERE `a`.`k` = 'ERRÖR';

```

```
mysql> SELECT * FROM (SELECT `v0`.`k` AS `k` FROM `vals0` AS `v0` UNION ALL SELECT `v1`.`k` AS `k` FROM `vals1` as `v1`) AS `a` WHERE `a`.`k` = 'ERRÖR';
Empty set, 3 warnings (0.00 sec)

mysql> SHOW WARNINGS;
+---------+------+---------------------------------------------------------+
| Level   | Code | Message                                                 |
+---------+------+---------------------------------------------------------+
| Warning | 1300 | Invalid utf8mb4 character string: 'D652'                |
| Warning | 1366 | Incorrect string value: '\xD6R' for column 'k' at row 1 |
| Warning | 1366 | Incorrect string value: '\xD6R' for column 'k' at row 1 |
+---------+------+---------------------------------------------------------+
3 rows in set (0.00 sec)
```
[30 Jan 2023 5:59] MySQL Verification Team
Hello Tim Lundqvist,

Thank you for the report and feedback.

regards,
Umesh
[30 Jan 2023 8:06] Chaithra Marsur Gopala Reddy
Hi Tim Lundqvist,

Thanks for the bug report. While we work on resolving the issue, the immediate workaround for the problem would be to set the optimizer_switch "derived_condition_pushdown" to off i.e
set optimizer_switch="derived_condition_pushdown=off";
[31 Jan 2023 4:21] MySQL Verification Team
Hello Tim Lundqvist,

Internal discussion confirmed that this is a duplicate of Bug #109699, please see Bug #109699. Thank you.

Sincerely,
Umesh