Bug #102500 Illegal mix of collation on 8.0.22 and 8.0.23
Submitted: 5 Feb 2021 22:02 Modified: 23 Jul 2021 4:15
Reporter: Leonardo Fernandes Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:8.0.22, 8.0.23 OS:Any
Assigned to: CPU Architecture:Any
Tags: regression

[5 Feb 2021 22:02] Leonardo Fernandes
Description:
The query below works on 8.0.21, but throws illegal mix of collations error on 8.0.22/8.0.23:

8.0.21:
mysql> select * from mysql.user u where u.host=SUBSTRING_INDEX(CURRENT_USER(),'@',-1);
[...]
7 rows in set (0.00 sec)

8.0.22/8.0.23:
mysql>  select * from mysql.user u where u.host=SUBSTRING_INDEX(CURRENT_USER(),'@',-1);
ERROR 1267 (HY000): Illegal mix of collations (ascii_general_ci,IMPLICIT) and (utf8_general_ci,SYSCONST) for operation '='

How to repeat:
Execute the following query on 8.0.22/8.0.23:
select * from mysql.user u where u.host=SUBSTRING_INDEX(CURRENT_USER(),'@',-1);
[7 Feb 2021 5:26] MySQL Verification Team
Hello Leonardo,

Thank you for the report and feedback.

regards,
Umesh
[17 Mar 2021 5:54] Jon Zan
i get the error when calling a stored procedure - this behaviour occured on 8.0.21 as well. So I upgraded to 8.0.23 but still same.
As the procedure call returns an error, this bug should have a higher severity ...
-----------------------------------------------------
Error Code: 1267. 
Illegal mix of collations (utf8mb4_unicode_ci,IMPLICIT) and (utf8mb4_0900_ai_ci,IMPLICIT) for operation '='	0.000 sec
[17 Mar 2021 8:21] Roy Lyseng
This problem was introduced in 8.0.22, so you may have hit a different problem.
Was it a regression in 8.0.21? Do you have a test case?
[17 Mar 2021 19:40] Jon Zan
no regression.
On test machine I upgraded MySQL from version 6 to 8.21 first. Then upgrade to .23
The database and tables have utf8mb4 and utf8mb4_unicode_ci as cs and collation
But when I go to alter schema I see utf8mb4 and utf8mb4_0900_ai_ci
I run alter table (as below), refresh and still above is shown
ALTER TABLE `db`.`tbl` 
COLLATE = utf8mb4_unicode_ci ;
I try to change to utf8mb4_general_ci and after refresh still same ..
So when running SP, I get mentioned error
[17 Mar 2021 20:12] Roy Lyseng
FWIW, the new collations, such as utf8mb4_0900_ai_ci, are much faster than the old collations for utf8mb4.

Notice that ALTER TABLE table COLLATE = utf8mb4_unicode_ci does NOT change the columns to the mentioned collation. It is only the "default" collation for the table, which will be used when creating new columns, that will be changed.

To change the collations of columns, you need to use the command ALTER TABLE table ALTER COLUMN.

Can you share the exact error message that you get (it should mention two conflicting collations)?
[18 Mar 2021 19:59] Jon Zan
All databases, tables and columns use same collation which is utf8mb4_unicode_ci.
not sure where it finds the utf8mb4_0900_ai_ci because I verified every field ...
In Workbench (Windows 10) when choosing "Alter Schema", on the drop-down shows utf8mb4_0900_ai_ci instead ... not sure why because in "Table Inspector" it shows utf8mb4_unicode_ci ... this applies for database and tables
. The columns collation is correct

Here is the error when calling a SP:
12:50:17	CALL `dbA`.`sel_Abbr`('CAN')	Error Code: 1267. Illegal mix of collations (utf8mb4_unicode_ci,IMPLICIT) and (utf8mb4_0900_ai_ci,IMPLICIT) for operation '='	0.078 sec
[18 Mar 2021 21:50] Roy Lyseng
It is still a bit difficult to give proper advice since I don't know
which equality operation that gives a problem.
You may try to set the connection collation to utf8mb4_unicode_ci.
Check also if the stored procedure is using a different collation,
so that a comparison to a stored procedure variable becomes invalid.
Sorry for not being more precise...
[22 Mar 2021 9:56] Kamil Holubicki
I've investigated the issue the here are my findings:

TL;DR
-----------------

It looks like 8.0.21 behavior was caused by the following side effect. 

Let's consider the following scenario which exposes the problem even on 8.0.21 (username containing utf8 character):

CREATE USER 'Jürgen'@'%' IDENTIFIED BY 'zaq12WSX';
GRANT ALL ON *.* TO 'Jürgen'@'%';

connect as Jürgen user...

mysql> select current_user();
+----------------+
| current_user() |
+----------------+
| Jürgen@%       |
+----------------+

mysql> select * from mysql.user u where u.host=SUBSTRING_INDEX(CURRENT_USER(),'@',1);
ERROR 1267 (HY000): Illegal mix of collations (ascii_general_ci,IMPLICIT) and (utf8_general_ci,SYSCONST) for operation '='

I know that the above query makes no sense, but it shows that the user part of CURRENT_USER() cannot be safely converted to ASCII. If the username contains only ASCII characters, the query does not produce the error.

mysql> select * from mysql.user u where u.host=SUBSTRING_INDEX(CURRENT_USER(),'@',1);
Empty set (0.00 sec) 

However, we can easily adjust to have the same side effect visible in 8.0.22 (read below)

 

Long version:
-------------------

mysql> show create table mysql.user;
CREATE TABLE `user` (
  `Host` char(255) CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL DEFAULT '',
  `User` char(32) COLLATE utf8_bin NOT NULL DEFAULT '',
....
Host is ASCII

User is utf8

As the consequence CURRENT_USER() always returns a string containing ASCII only characters in the host part. But in general, the string is utf8.

 

We've got the following Item object tree (only the interesting part - 'where'):

Item_func_eq (item_bool_func2)     (1)
    |-Item_field                                           (2) ascii / DERIVIATION_IMPLICIT
    |-Item_func_substr_index                (3) utf8 / DERIVIATION_SYSCONST  / INNER_TABLE_BIT ((all inherited from current_user)
        |-Item_func_current_user            (4) utf8/ DERIVIATION_SYSCONST/ INNER_TABLE_BIT
        |-Item_string                                    (5)
        |-Item_func_neg                              (6)
            |-Item_Int                                       (7)

 

Why it doesn't work?
---------------------

Item_bool_func2::resolve_type() (1) attempts to determine charset for comparison of 2 strings (this is Item_func_eq object) (call to agg_arg_charsets_for_comparison() which internally sets cmp.cmp_collation which is collation that will be used for comparison). Type is resolved between (2) and (3).
By default the collation of (2) is used (agg_item_collations()), then the function checks if we can convert to collation of (3) (call to DTCollation::aggregate()
DTCollation::aggregate() implements the logic that decides considering actual collation (repertoire and derivation). Collation of (3) is the superset of the collation of (2), but the derivation of (3) is weaker, so (2) wins.
So we do a comparison using ASCII
After deciding which collation will be used for comparison, agg_item_charsets() attempts to create item's converters to convert items to needed collation (call to agg_item_set_converter())
(2) does not need conversion, so a converter is not created
(3) needs converter, so we try to create converter (call to agg_item_set_converter(), Item_func_conv_charset object creation)
(3) cannot be immediately evaluated, because it has INNER_TABLE_BIT flag set (flag originates from (4) and is propagated to (3) during fixing phase), and query execution is not marked as 'started' yet.
As we cannot evaluate immediately, we decide that there is no possible safe conversion utf8 -> ASCII and we raise the error.

Why it worked before?
---------------------

INNER_TABLE_BIT was not set in (4). As the consequence, during converter creation we tried to immediately do the conversion. As the conversion is possible because the host part always contains only ASCII characters, the converter was created, and an error was not raised.
Please note that "it works" only because of the side effect described at the beginning.

How can we fix/adjust:
----------------------

The most straightforward fix seems to be removing get_initial_pseudo_tables() method from Item_func_user class. It was added in 8.0.22 with the commit related to WL#9384. Doing this we will force the software to work exactly as it was before.
We can also allow the creation of a converter in agg_item_set_converter() if (3) collation is MY_REPERTOIRE_UNICODE30 (mix of ASCII and UTF8) not MY_REPERTOIRE_ASCII as it is now. Works, however, I'm not sure about side effects.
We can also adjust the logic of left_is_superset() which decides basing on argument's collation derivation. This code works from ages and I'm not sure about side effects again.
[6 Apr 2021 12:30] Simon Mudd
bug#103223 is related: IFNULL() failed for the same reason and looking at the docs to figure out why and what expected behaviour should be was not at all clear.
[6 Apr 2021 20:38] Roy Lyseng
INNER_TABLE_BIT cannot be removed, since in 8.0 it is the property that determines  a value that is const during an execution, but may be different from one execution to the next. True const values behave differently, as they are converted to the character set of the other argument during resolving, hence if that conversion is successful, the comparison will also succeed.
We try to fix this problem by converting the const-for-execution value dynamically
when needed, instead of issuing the "Illegal mix of collations" error.
This way, functionality should return to how it was before 8.0.22.
[23 Jul 2021 4:15] MySQL Verification Team
Internally this is marked as duplicate of Bug #101891, which is closed with the below change log.

Documented fix as follows in the MySQL 8.0.24 changelog:

    Selecting from the Performace Schema replication_group_members
    table raised error 1267 -Illegal mix of collations ...-. This
    was due to a change in MySQL 8.0.22 in the handling of
    comparisons between a column value and a system variable.

Closed.