Bug #54278 Illegal mix of collations on downgrading views based on 'utf8mb4' tables
Submitted: 7 Jun 2010 8:58
Reporter: Nidhi Shrotriya Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.5.4-m3, 5.0 OS:Any
Assigned to: Assigned Account CPU Architecture:Any

[7 Jun 2010 8:58] Nidhi Shrotriya
Description:
Test fails while dump downgrade from mysql-trunk(5.5.5-m3) -> 5.5.4-m3, due to view (based on utf8mb4 table) not existing.
On investigating it seems that loading dump after downgrading server fails with the below error-

ERROR 1267 (HY000) at line 139: Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8mb4_unicode_ci,IMPLICIT) for operation 'like'

From dump:
------------
/*!50001 SET character_set_client      = utf8 */;
/*!50001 SET character_set_results     = utf8 */;
/*!50001 SET collation_connection      = utf8_general_ci */;
line 139 -> /*!50001 CREATE ALGORITHM=UNDEFINED */
/*!50013 DEFINER=`root`@`localhost` SQL SECURITY DEFINER */
/*!50001 VIEW `v2_initial` AS select `t4_initial`.`subject` AS `subject`,`t4_initial`.`p` AS `p` from `t4_initial` where (convert(`t4_initial`.`subject` using utf8mb4) like (select `t4_initial1`.`subject` AS `subject` from `t4_initial1` where (`t4_initial1`.`subject` = 'aaa'))) */;

How to repeat:
Some portions might not be required to reproduce -
In 5.5.4-m3
-------------------
SET NAMES utf8;
drop table if exists t4_initial;
drop table if exists t4_initial1;
CREATE TABLE t4_initial (
  subject varchar(255) character set utf8 collate utf8_unicode_ci,
  p varchar(15) character set utf8
) DEFAULT CHARSET=latin1;
CREATE TABLE t4_initial1 (
  subject varchar(255) character set utf8 collate utf8_unicode_ci,
  p varchar(15) character set utf8
) DEFAULT CHARSET=latin1;
INSERT INTO t4_initial VALUES ('aaa','bbb');
INSERT INTO t4_initial(subject) VALUES (0x616263F0909080646566); # try inserting 4 byte
INSERT INTO t4_initial1 VALUES ('aaa','bbb');
INSERT INTO t4_initial1(subject) VALUES (0x616263F0909080646566); # try inserting 4 byte

CREATE VIEW v1_initial AS SELECT * FROM t4_initial WHERE subject like 'aaa';
CREATE VIEW v2_initial AS SELECT * FROM t4_initial WHERE subject like (SELECT subject from t4_initial1 WHERE subject='aaa');

Do mysqldump-
bin/mysqldump --user=root --socket=/tmp/mysql.sock --databases test_nidhi > dump_554m3
Stop server.

After Upgrade to 5.5.5-m3.
-------------------------
Start 5.5.5-m3 server.
Load the dump.
-bash-3.2$ bin/mysql --user=root --socket=/tmp/mysql.sock < $PWD/../mysql_554m3/dump_554m3
Run mysql_upgrade
-bash-3.2$ bin/mysql_upgrade --user=root --socket=/tmp/mysql.sock 
Stop, start the server.
Run below command. 

SET NAMES utf8;
# Alter old table, add index
ALTER TABLE t4_initial ADD INDEX (subject);
ALTER TABLE t4_initial1 ADD INDEX (subject);

# Alter old table to new 'utf8mb4'
ALTER TABLE t4_initial
  DEFAULT CHARACTER SET utf8mb4,
  MODIFY subject varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
  MODIFY p varchar(255) CHARACTER SET utf8;
ALTER TABLE t4_initial1
  DEFAULT CHARACTER SET utf8,
  MODIFY subject varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
  MODIFY p varchar(255) CHARACTER SET utf8;

# Insert 4 byte characters
INSERT INTO t4_initial(subject) VALUES(0xF0909080); # 4byte character
INSERT INTO t4_initial(subject) VALUES(0x616263F0909080646566); # mix of 3 byte & 4 byte
INSERT INTO t4_initial1(subject) VALUES(0xF0909080); # 4byte character
INSERT INTO t4_initial1(subject) VALUES(0x616263F0909080646566); # mix of 3 byte & 4 byte

# Alter converted 'utf8mb4' table back to 'utf8'
ALTER TABLE t4_initial
  DEFAULT CHARACTER SET utf8,
  MODIFY subject varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci;

Do mysqldump:
bash-3.2$ bin/mysqldump --user=root --socket=/tmp/mysql.sock --databases test_nidhi > dump_trunk
Stop server. 

After downgrade in 5.5.4-m3.
-----------------------------
Start server. 
Try Loading dump.
-bash-3.2$ bin/mysql --user=root --socket=/tmp/mysql.sock < $PWD/../mysql-trunk/dump_trunk
ERROR 1267 (HY000) at line 139: Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8mb4_unicode_ci,IMPLICIT) for operation 'like'
[21 Jun 2010 9:45] Nidhi Shrotriya
dump_trunk

Attachment: dump_trunk (application/octet-stream, text), 5.97 KiB.

[23 Jun 2010 6:44] Alexander Barkov
The bug is reproducible with 5.5-4.m3 alone, without upgrade/downgrade:

SET NAMES utf8;

DROP TABLE IF EXISTS t4_initial;
DROP TABLE IF EXISTS t4_initial1;
DROP VIEW IF EXISTS v2_initial;

CREATE TABLE t4_initial (
  subject varchar(255) character set utf8 collate utf8_unicode_ci,
  p varchar(15) character set utf8
) DEFAULT CHARSET=latin1; 

CREATE TABLE t4_initial1 (
  subject varchar(255) character set utf8 collate utf8_unicode_ci,
  p varchar(15) character set utf8
) DEFAULT CHARSET=latin1;

INSERT INTO t4_initial VALUES ('aaa','bbb'); 
INSERT INTO t4_initial(subject) VALUES (0x616263F0909080646566); # try inserting 4 byte 
INSERT INTO t4_initial1 VALUES ('aaa','bbb');
INSERT INTO t4_initial1(subject) VALUES (0x616263F0909080646566); # try inserting 4 byte

CREATE VIEW v2_initial AS SELECT * FROM t4_initial
WHERE subject like (SELECT subject from t4_initial1 WHERE subject='aaa');

SELECT 'Original view' AS '';
SHOW CREATE VIEW v2_initial;
SELECT * FROM v2_initial;

ALTER TABLE t4_initial ADD INDEX (subject);
ALTER TABLE t4_initial1 ADD INDEX (subject);

SELECT 'After adding indexes' AS '';
SHOW CREATE VIEW v2_initial;
SELECT * FROM v2_initial;

# Alter old table to new 'utf8mb4'
ALTER TABLE t4_initial
  DEFAULT CHARACTER SET utf8mb4,
  MODIFY subject varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
 # ,MODIFY p varchar(255) CHARACTER SET utf8
  ;
   
ALTER TABLE t4_initial1
  DEFAULT CHARACTER SET utf8,
  MODIFY subject varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
  #, MODIFY p varchar(255) CHARACTER SET utf8
  ;
   
# Insert 4 byte characters
INSERT INTO t4_initial(subject) VALUES(0xF0909080); # 4byte character
INSERT INTO t4_initial(subject) VALUES(0x616263F0909080646566); # mix of 3 byte & 4 byte
INSERT INTO t4_initial1(subject) VALUES(0xF0909080); # 4byte character
INSERT INTO t4_initial1(subject) VALUES(0x616263F0909080646566); # mix of 3 byte & 4 byte

SELECT 'After conversion #1' AS '';
SHOW CREATE VIEW v2_initial;
SELECT * FROM v2_initial;   

# Alter converted 'utf8mb4' table back to 'utf8'
ALTER TABLE t4_initial
  # DEFAULT CHARACTER SET utf8,
  MODIFY subject varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci;

SELECT 'After conversion  #2' AS '';
SHOW CREATE VIEW v2_initial;
SELECT * FROM v2_initial;   

# Try to recreate 
DROP VIEW v2_initial;
CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost`
SQL SECURITY DEFINER
VIEW `v2_initial` AS
SELECT `t4_initial`.`subject` AS `subject`,`t4_initial`.`p` AS `p`
FROM `t4_initial`
WHERE (CONVERT(`t4_initial`.`subject` using utf8mb4)
LIKE (SELECT `t4_initial1`.`subject` FROM `t4_initial1`
where (`t4_initial1`.`subject` = 'aaa')));

The script ends with:

subject	p
aaa	bbb
ERROR 1267 (HY000) at line 73: Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8mb4_unicode_ci,IMPLICIT) for operation 'like'

So, SELECT from a view works fine, but an attempt to recreate
a view with the same definition fails.
[23 Jun 2010 7:11] Alexander Barkov
#
# The same big is reproducible with 5.0 and 5.5.4, with latin1 and utf8 pair.
#

SET NAMES utf8;

DROP TABLE IF EXISTS t4_initial;
DROP TABLE IF EXISTS t4_initial1;
DROP VIEW IF EXISTS v2_initial;

CREATE TABLE t4_initial (
  subject varchar(255) character set latin1 collate latin1_danish_ci,
  p varchar(15) character set latin1
) DEFAULT CHARSET=latin1;

CREATE TABLE t4_initial1 (
  subject varchar(255) character set latin1 collate latin1_danish_ci,
  p varchar(15) character set latin1
) DEFAULT CHARSET=latin1;

INSERT INTO t4_initial VALUES ('aaa','bbb'),('ccc','ddd');
INSERT INTO t4_initial1 VALUES ('aaa','bbb'),('ccc','ddd');

CREATE VIEW v2_initial AS SELECT * FROM t4_initial
WHERE subject like (SELECT subject from t4_initial1 WHERE subject='aaa');

SELECT 'Original view' AS '';
SHOW CREATE VIEW v2_initial;
SELECT * FROM v2_initial;

ALTER TABLE t4_initial ADD INDEX (subject);
ALTER TABLE t4_initial1 ADD INDEX (subject);   

SELECT 'After adding indexes' AS '';
SHOW CREATE VIEW v2_initial;
SELECT * FROM v2_initial;

# Alter table to 'utf8'
ALTER TABLE t4_initial 
  MODIFY subject varchar(255) CHARACTER SET utf8 COLLATE utf8_danish_ci;

ALTER TABLE t4_initial1
  MODIFY subject varchar(255) CHARACTER SET utf8 COLLATE utf8_danish_ci;

SELECT 'After conversion #1' AS '';
SHOW CREATE VIEW v2_initial;
SELECT * FROM v2_initial;   

# Alter converted 'utf8' table back to 'latin1'
ALTER TABLE t4_initial
  MODIFY subject varchar(255) CHARACTER SET latin1 COLLATE latin1_danish_ci;

SELECT 'After conversion  #2' AS '';
SHOW CREATE VIEW v2_initial;
SELECT * FROM v2_initial;   

SELECT 'Trying to recreate the view' AS '';
DROP VIEW v2_initial;
CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER
VIEW `v2_initial` AS
select `t4_initial`.`subject` AS `subject`,`t4_initial`.`p` AS `p`
from `t4_initial`
where (convert(`t4_initial`.`subject` using utf8)
      like
      (select `t4_initial1`.`subject` AS `subject` from `t4_initial1`
       where (`t4_initial1`.`subject` = 'aaa')));

exit;

The output is:

subject	p
aaa	bbb
Trying to recreate the view
ERROR 1267 (HY000) at line 59: Illegal mix of collations (utf8_general_ci,IMPLICIT) and (utf8_danish_ci,IMPLICIT) for operation 'like'

Conclusions:

There are some long-standing bug since at least 5.0:

1. Inconsistency between SELECT and dump/restore
- A SELECT query from a view can return some rows without problems
- but an attempt to recreate this view from its definition produces an error
  (for example, when doing dump followed by restore)

  I think that SELECT should fail too.

2. Missing collation definition in case of automatic character set conversion.
This part of the view is not correct in respect of collation:

where (convert(`t4_initial`.`subject` using utf8)
      like
      (select `t4_initial1`.`subject` AS `subject` from `t4_initial1`
       where (`t4_initial1`.`subject` = 'aaa')));

Possible solutions:
a. Do not to print CONVERT() expression at SHOW (or dump) time.
Let the server loading the dump insert the converter itself.
It will use the proper collation.

b. Display the COLLATE clause together with the convert() function:

where (convert(`t4_initial`.`subject` using utf8) COLLATE utf8_danish_ci
      like
      (select `t4_initial1`.`subject` AS `subject` from `t4_initial1`
       where (`t4_initial1`.`subject` = 'aaa')));

But this is still not exactly what happens internally, because it will
set collation derivation to EXPLICIT. We need IMPLICIT here!
It seems we will need a new syntax here, to specify collation derivation:

  COLLATE IMPLICIT utf8_danish_ci