Bug #107787 Incorrect work of query with union in inner select
Submitted: 6 Jul 2022 17:58 Modified: 23 Sep 2022 18:48
Reporter: Evgeny Rogozhnikov Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S1 (Critical)
Version:8.0.29 OS:Windows
Assigned to: CPU Architecture:Any
Tags: regression

[6 Jul 2022 17:58] Evgeny Rogozhnikov
Description:
Problem is reproduced only on MySQL 8.0.29

Checked also in 8.0.26 and 5.7.12-log. and it is worked as expected.

Used select with inner select. In inner select it is used union.
for outer select it is used filtering by text column with COLLATE ucs2_unicode_ci
but it is compared with literal of utf8

if in inner select it is used only one select without union then qury is worked as expected

How to repeat:
I create minimal script where problem is reproduced

CREATE TABLE `a1` (
    `nId` INT AUTO_INCREMENT  PRIMARY KEY,
	`wstrText` VARCHAR(200) UNICODE COLLATE ucs2_unicode_ci
	)ENGINE=InnoDB DEFAULT CHARACTER SET `ascii` COLLATE `ascii_general_ci`
GO

CREATE TABLE `a2` (
    `nId` INT AUTO_INCREMENT  PRIMARY KEY,
	`wstrText` VARCHAR(200) UNICODE COLLATE ucs2_unicode_ci
	)ENGINE=InnoDB DEFAULT CHARACTER SET `ascii` COLLATE `ascii_general_ci`
GO

insert into a1(wstrText) values (_utf8'Patch');
GO

SELECT  `als`.`wstrText`

FROM (

SELECT DISTINCT  `a2`.`wstrText` AS wstrText
FROM `a2`   
WHERE `a2`.`wstrText` LIKE _utf8'Patc%' escape _utf8'\\'   

UNION ALL  SELECT DISTINCT  `a1`.`wstrText` AS wstrText FROM `a1` 
WHERE (`a1`.`wstrText` LIKE _utf8'Patc%' escape _utf8'\\')  

) als  
WHERE `als`.`wstrText` LIKE _utf8'P%' escape _utf8'\\'

last select return empty result

but it is should return 1 record

Suggested fix:
query is worked as expected if modify WHERE clause 

WHERE `als`.`wstrText` LIKE _utf8'P%' collate utf8_general_ci escape _utf8'\\'
[7 Jul 2022 5:39] MySQL Verification Team
Hello Evgeny,

Thank you for the report and test case.

regards,
Umesh
[23 Sep 2022 18:48] Jon Stephens
Documented fix as follows in the MySQL 8.0.32 changelog:

    A condition pushdown into a UNION of queries having LIKE clauses
    did not preserve the correct character set, leading to an
    (erroneous) empty result.

    We solve this problem in two parts:

    1. By refactoring resolution of LIKE expressions, in which
    character set determination and propagation were previously
    performed in two separate blocks of the code that were not
    always consistent with one another.

    2. By adding, in the internal parse_expression() function, a
    character set prefix to any literal character string that is
    cloned.

Closed.