Bug #104571 Strings with ASCII charset and UCS2 don't match in join statement in MySQL 8
Submitted: 9 Aug 2021 9:37 Modified: 30 Aug 2021 17:10
Reporter: Andrey Zaytcev Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Charsets Severity:S2 (Serious)
Version:MySQL 8, 8.0.26 OS:Windows
Assigned to: CPU Architecture:x86

[9 Aug 2021 9:37] Andrey Zaytcev
Description:
In MySQL 8 strings with ASCII charset (collate ascii_general_ci) and UCS2 (collate ucs2_general_ci) don't match in join statement. 

How to repeat:
Here is small steps to repeat it:

1. Create two tables with varchar(64) fields and ascii (collate ascii_general_ci) and ucs2 (collate ucs2_general_ci) charsets respectively:
DROP TEMPORARY TABLE IF EXISTS `__ttmp_unicode`;
CREATE TEMPORARY TABLE IF NOT EXISTS `__ttmp_unicode` (
   `wstrName` VARCHAR(256) CHARACTER SET ucs2 COLLATE ucs2_general_ci DEFAULT NULL)
   -- `wstrName` VARCHAR(256) CHARACTER SET ucs2 COLLATE ucs2_unicode_ci DEFAULT NULL)
ENGINE=InnoDB;

DROP TEMPORARY TABLE IF EXISTS `__ttmp_ascii`;
CREATE TEMPORARY TABLE IF NOT EXISTS `__ttmp_ascii` (
    `strName` VARCHAR(64)
) 
ENGINE=InnoDB DEFAULT CHARACTER SET `ascii` COLLATE `ascii_general_ci`;

2. Insert same values in tables:
INSERT INTO `__ttmp_unicode`(wstrName)
VALUES('9337a53f-cfd6-4a68-a1fc-e2d84eba8079');
INSERT INTO `__ttmp_ascii`(strName)
VALUES('9337a53f-cfd6-4a68-a1fc-e2d84eba8079');

3. Check inserted values:
SELECT * FROM `__ttmp_unicode`;
SELECT * FROM `__ttmp_ascii`;

4. Try to join them:
SELECT `wstrName`, `strName`
FROM `__ttmp_unicode`
INNER JOIN `__ttmp_ascii` ON `__ttmp_unicode`.`wstrName` = `__ttmp_ascii`.`strName`;

With MySQL 8 the result is empty table. But with MySQL 5.7 join statement works well and the result is non-empty table:
# wstrName,                             strName
'9337a53f-cfd6-4a68-a1fc-e2d84eba8079', '9337a53f-cfd6-4a68-a1fc-e2d84eba8079'

I also find out that replacing "ucs2_general_ci" collate with "ucs2_unicode_ci" makes join works well like MySQL 5.7. You can see this by uncommenting the line "`wstrName` VARCHAR(256) CHARACTER SET ucs2 COLLATE ucs2_unicode_ci DEFAULT NULL)" in table `__ttmp_unicode` definition
[9 Aug 2021 10:40] Andrey Zaytcev
Change severity to S2
[9 Aug 2021 10:49] MySQL Verification Team
Hello Andrey Zaytcev,

Thank you for the report and test case.

regards,
Umesh
[30 Aug 2021 17:10] Jon Stephens
Documented fix as follows in the MySQL 8.0.27 changelog:

    Otherwise identical strings, using, respectively, the ASCII
    (collation ascii_general_ci) and UCS2 (collation
    ucs2_general_ci) character sets did not match as expected in
    join conditions.

Closed.