Bug #109699 utf8mb4 identifiers stopped working since 8.0.32 release
Submitted: 19 Jan 2023 10:34 Modified: 2 Feb 2023 17:51
Reporter: Michal Vorisek Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:8.0.32 OS:Any
Assigned to: CPU Architecture:Any
Tags: regression, utf8mb4, wrong result

[19 Jan 2023 10:34] Michal Vorisek
Description:
The query below is expected to return 1 row:

['❤' => 'žlutý_😀']

this result was/is returned with MySQL 8.0.31 (and the latest MySQL 5.6)

but as of MySQL 8.0.32, no row it returned.

How to repeat:
https://pastebin.com/qghkzwTu

CREATE TABLE `employee` (
  `id` INT UNSIGNED AUTO_INCREMENT NOT NULL,
  `name` VARCHAR(255) DEFAULT NULL,
  `surname` VARCHAR(255) DEFAULT NULL,
  `retired` TINYINT(1) DEFAULT NULL,
  PRIMARY KEY(`id`)
) DEFAULT CHARACTER SET utf8mb4 COLLATE `utf8mb4_unicode_ci` ENGINE = InnoDB;

insert into `employee` (
  `id`, `name`, `surname`, `retired`
)
values
  (1, 'Oliver', 'Smith', 0);

insert into `employee` (
  `id`, `name`, `surname`, `retired`
)
values
  (2, 'Jack', 'Williams', 1);

insert into `employee` (
  `id`, `name`, `surname`, `retired`
)
values
  (3, 'Harry', 'Taylor', 1);

insert into `employee` (
  `id`, `name`, `surname`, `retired`
)
values
  (4, 'Charlie', 'Lee', 0);

select
  *
from
  (
    select
      'žlutý_😀' `❤`
  ) `🚀`
where
  `❤` = 'žlutý_😀'
group by
  `🚀`.`❤`
having
  (`❤` = 'žlutý_😀');
[19 Jan 2023 15:59] MySQL Verification Team
Hello Michal Vorisek,

Thank you for the report and feedback.

regards,
Umesh
[31 Jan 2023 4:21] MySQL Verification Team
Bug #109833 marked as a duplicate of this one
[2 Feb 2023 17:51] Jon Stephens
Documented fix as follows in the MySQL 8.0.33 changelog:

    When cloning a condition to push down to a derived table,
    characters in strings representing conditions were converted to
    utfmb4 correctly only for values less than 128 (the ASCII
    subset), and code points outside the ASCII subset were converted
    to invalid characters, causing the resulting character strings
    to become invalid. For derived tables without UNIONS, this led
    to problems when a column name from the derived table used
    characters outside the ASCII subset, and was used in the WHERE
    condition. For derived tables with UNIONS it created problems
    when a character outside the ASCII subset was present in a WHERE
    condition.

    We fix these issues by initializing the string used in such
    cases for representing the condition to the connection character
    set.

Closed.
[10 Feb 2023 9:30] MySQL Verification Team
Bug #110005 marked as duplicate of this one
[17 Feb 2023 3:57] MySQL Verification Team
Bug #110102 marked as duplicate of this one
[17 Feb 2023 3:57] MySQL Verification Team
Bug #110104 marked as duplicate of this one
[6 Mar 2023 4:40] MySQL Verification Team
Bug #110228 marked as duplicate of this one
[15 Mar 2023 11:31] MySQL Verification Team
Bug #110375 marked as duplicate of this one
[15 Mar 2023 11:31] MySQL Verification Team
Bug #110376 marked as duplicate of this one
[15 Mar 2023 11:32] MySQL Verification Team
Bug #110376 marked as duplicate of this one
[15 Mar 2023 11:41] Chaithra Marsur Gopala Reddy
Posted by developer:
 
As a workaround, set derived condition pushdown to off like this:

set optimizer_switch="derived_condition_pushdown=off";
[28 Mar 2023 7:02] MySQL Verification Team
Bug #110521 marked as duplicate of this
[14 Apr 2023 12:16] MySQL Verification Team
Bug #110696 marked as duplicate of this one
[18 Apr 2023 9:10] MySQL Verification Team
Bug #110709 marked as duplicate of this one
[9 May 2023 6:39] MySQL Verification Team
Bug #110955 marked as duplicate of this one
[29 May 2023 9:20] MySQL Verification Team
Bug #111176 marked as duplicate of this one
[12 Jun 2023 9:59] MySQL Verification Team
Bug #111363 marked as duplicate of this one