Bug #92741 Wrong collation chosen for stored procedures (Illegal mix of collations)
Submitted: 10 Oct 2018 22:12 Modified: 25 Oct 2018 16:51
Reporter: Frederic Steinfels Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:8.0.12 OS:Any
Assigned to: CPU Architecture:Any
Tags: collation

[10 Oct 2018 22:12] Frederic Steinfels
Description:
This error did not occur under 5.7.23
This error does occur under 8.0.12
The error message is: 

ERROR 1267 (HY000): Illegal mix of collations (utf8mb4_unicode_ci,IMPLICIT) and (utf8mb4_0900_ai_ci,IMPLICIT) for operation '='

The database and everything else is set to utf8mb4_unicode_ci.

The stored function returns LONGTEXT with utf8mb4.

When changing RETURN(dststr) to RETURN(dststr COLLATE utf8mb4_unicode_ci) I will get this error:

ERROR 1253 (42000): COLLATION 'utf8mb4_unicode_ci' is not valid for CHARACTER SET 'utf8'

When changing RETURN(dststr) to RETURN(dststr COLLATE utf8_unicode_ci) I will get no error. However as utf8mb4 is the character set chosen for deststr, the error should occur in this case and not in the previous one.

How to repeat:
Install the Function from the sql file and run it:

select NODOUBLEFIELDS('a,b,c,d,b,c',',');

result should be "a,b,c,d" and no error.

Do some experimentation with the return value as described above.
[10 Oct 2018 22:17] Frederic Steinfels
DELIMITER $$
CREATE DEFINER=`root`@`localhost` FUNCTION `NODOUBLEFIELDS`(`srcstr` LONGTEXT CHARSET utf8mb4, `sep` CHAR(1) CHARSET utf8mb4) RETURNS longtext CHARSET utf8mb4
    NO SQL
    DETERMINISTIC
BEGIN
    DECLARE sfs SMALLINT;
    DECLARE sfc SMALLINT;
    DECLARE dststr LONGTEXT;
    DECLARE srcstri LONGTEXT;
    DECLARE dststri LONGTEXT;
    DECLARE spossrc INT;
    DECLARE sposdst INT;
    DECLARE ipossrc INT;
    DECLARE iposdst INT;
    DECLARE occ INT;

	SET spossrc=1;
    SET sposdst=1;
    SET ipossrc=0;
    SET iposdst=0;
    SET occ=0;
	SET dststr="";
    
	WHILE ipossrc<=CHAR_LENGTH(srcstr) DO
    	SET sfs=0;
		WHILE ipossrc<=CHAR_LENGTH(srcstr) AND sfs=0 DO
			SET ipossrc=ipossrc+1;
			IF SUBSTR(srcstr,ipossrc,1)=sep THEN
				SET sfs=1;
			END IF;
		END WHILE;
	    SET srcstri=SUBSTR(srcstr,spossrc,ipossrc-spossrc);   
        SET spossrc=ipossrc+1;
        SET iposdst=0;
        SET sposdst=1;
        
        SET occ=0;
		WHILE iposdst<=CHAR_LENGTH(dststr) DO
        	SET sfc=0;
			WHILE iposdst<=CHAR_LENGTH(dststr) AND sfc=0 DO
				SET iposdst=iposdst+1;
				IF SUBSTR(dststr,iposdst,1)=sep THEN
					SET sfc=1;
				END IF;
			END WHILE;
	    	SET dststri=SUBSTR(dststr,sposdst,iposdst-sposdst);
			SET sposdst=iposdst+1;
            IF (dststri=srcstri) THEN 
            	SET occ=occ+1;
            END IF;
        END WHILE;
        IF (occ=0) THEN
        	IF (dststr='') THEN
            	SET dststr=srcstri;
            ELSE
            	SET dststr=CONCAT(dststr,',',srcstri);
            END IF;
        END IF;
	END WHILE;
	RETURN(dststr COLLATE utf8_unicode_ci );
END$$
DELIMITER ;
[15 Oct 2018 13:30] Sinisa Milivojevic
Hi,

Thank you for your bug report.

You observed a difference in behaviour between 5.7 and 8.0 when it comes to the character sets and collations.  This is fully expected behaviour, since a default character set and their collations have been changed from utf-8 to utfmb4. 

You are asking us to debug your stored procedure, which we can not do, since this is not a forum for free support. However, we shall provide you with couple of advices.

First of all, you should declare all of your parameters and all LONGTEXT local variables to have a collation that is the same as the default one. Second, you can list all the default character sets and associated collations and correct them be identical.

Not a bug.
[15 Oct 2018 17:06] Frederic Steinfels
The database charset was indeed on utf8 instead of utf8mb4. I have changed this. However this just changed the error message slightly, the problem remains.

There is no way to set a collation for an input value.

I checked the config variables:

show variables like "%collat%";
+-------------------------------+--------------------+
| Variable_name                 | Value              |
+-------------------------------+--------------------+
| collation_connection          | utf8mb4_unicode_ci |
| collation_database            | utf8mb4_unicode_ci |
| collation_server              | utf8mb4_unicode_ci |
| default_collation_for_utf8mb4 | utf8mb4_0900_ai_ci |
+-------------------------------+--------------------+

show variables like "%character%";
+--------------------------+--------------------------------+
| Variable_name            | Value                          |
+--------------------------+--------------------------------+
| character_set_client     | utf8mb4                        |
| character_set_connection | utf8mb4                        |
| character_set_database   | utf8mb4                        |
| character_set_filesystem | binary                         |
| character_set_results    | utf8mb4                        |
| character_set_server     | utf8mb4                        |
| character_set_system     | utf8                           |
| character_sets_dir       | /usr/share/mysql-8.0/charsets/ |
+--------------------------+--------------------------------+

I tried to change default_collation_for_utf8mb4  to utf8mb4_unicode_ci. 

set @@default_collation_for_utf8mb4=utf8mb4_unicode_ci;
ERROR 3721 (HY000): Invalid default collation utf8mb4_unicode_ci: utf8mb4_0900_ai_ci or utf8mb4_general_ci expected

Ok, does not work.

The problem seems to be this: An input value in a stored procedure (at least for LONGTEXT) seems to assume the collation stored in default_collation_for_utf8mb4.

A variable in stored procedure delcared like:
DECLARE dststr LONGTEXT CHARSET utf8mb4
Seems to assume the utf8mb4_0900_ai_ci COLLATION. If I use

DECLARE dststr LONGTEXT CHARSET utf8mb4 COLLATE utf8mb4_unicode_ci

I will get

ERROR 1267 (HY000): Illegal mix of collations (utf8mb4_unicode_ci,IMPLICIT) and (utf8mb4_0900_ai_ci,IMPLICIT) for operation '='

So it seems impossible to use any COLLATION besides of utf8mb4_0900_ai_ci in a stored procedure. Is this really expected behaviour? I mean why can I use DECLARE <sth> COLLATE <sth> when it is impossible to use any collation besides the default one? I consider this a bug.
[16 Oct 2018 12:09] Sinisa Milivojevic
Sorry, but this is not a bug.

First of all, you have to set a collation for every parameter and local variables in all stored routines. The other option is to use introducers in the expressions within your stored routine. This is all described in our manual.

Last, but not least, you should look at the I_S and see how are character sets and collations defined for each different system module. Then, you can change it in your configuration.
[16 Oct 2018 17:38] Frederic Steinfels
Sorry, I have to write again because I still think it is a bug.
I checked the information schema. I checked all tables, rows, columns and the databse. Everything is ok. Everything is utf8_unicode_ci.
However for some unknown reason 
Furthermore you are saying I should be able to use introducers. I do not think this will work. I can also not declare the collation of a variable, only the character set. All of these instructions will fail with syntax error:

    DECLARE dststri LONGTEXT CHARSET utf8mb4 COLLATE utf8mb4_unicode_ci;
    DECLARE dststri LONGTEXT CHARSET utf8mb4 COLLATION utf8mb4_unicode_ci;
    DECLARE dststri LONGTEXT COLLATE utf8mb4_unicode_ci;
    DECLARE dststri LONGTEXT COLLATION utf8mb4_unicode_ci;
[17 Oct 2018 11:50] Sinisa Milivojevic
OK ......

Send us the output of all the tables involved, including collations for all CHAR derived column domains. Send us the collations for each of your databases and for each of the system modules.

We can not determine whether it is a bug, unless we have all the info.
[17 Oct 2018 12:00] Sinisa Milivojevic
One more detail.

You wrote that introducers do not work, but you have not provided the example of how you tried the introducers and what errors were returned. You just provided the definition of collations for the local variables.
[25 Oct 2018 16:51] Frederic Steinfels
You might be right, it might be my problem. However I am not so sure but I was unable to recreate the error so you can consider this closed.
[26 Oct 2018 11:42] Sinisa Milivojevic
Thank you for the feedback.

Closed ......