Bug #54278 | Illegal mix of collations on downgrading views based on 'utf8mb4' tables | ||
---|---|---|---|
Submitted: | 7 Jun 2010 8:58 | ||
Reporter: | Nidhi Shrotriya | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S3 (Non-critical) |
Version: | 5.5.4-m3, 5.0 | OS: | Any |
Assigned to: | Assigned Account | CPU Architecture: | Any |
[7 Jun 2010 8:58]
Nidhi Shrotriya
[21 Jun 2010 9:45]
Nidhi Shrotriya
dump_trunk
Attachment: dump_trunk (application/octet-stream, text), 5.97 KiB.
[23 Jun 2010 6:44]
Alexander Barkov
The bug is reproducible with 5.5-4.m3 alone, without upgrade/downgrade: SET NAMES utf8; DROP TABLE IF EXISTS t4_initial; DROP TABLE IF EXISTS t4_initial1; DROP VIEW IF EXISTS v2_initial; CREATE TABLE t4_initial ( subject varchar(255) character set utf8 collate utf8_unicode_ci, p varchar(15) character set utf8 ) DEFAULT CHARSET=latin1; CREATE TABLE t4_initial1 ( subject varchar(255) character set utf8 collate utf8_unicode_ci, p varchar(15) character set utf8 ) DEFAULT CHARSET=latin1; INSERT INTO t4_initial VALUES ('aaa','bbb'); INSERT INTO t4_initial(subject) VALUES (0x616263F0909080646566); # try inserting 4 byte INSERT INTO t4_initial1 VALUES ('aaa','bbb'); INSERT INTO t4_initial1(subject) VALUES (0x616263F0909080646566); # try inserting 4 byte CREATE VIEW v2_initial AS SELECT * FROM t4_initial WHERE subject like (SELECT subject from t4_initial1 WHERE subject='aaa'); SELECT 'Original view' AS ''; SHOW CREATE VIEW v2_initial; SELECT * FROM v2_initial; ALTER TABLE t4_initial ADD INDEX (subject); ALTER TABLE t4_initial1 ADD INDEX (subject); SELECT 'After adding indexes' AS ''; SHOW CREATE VIEW v2_initial; SELECT * FROM v2_initial; # Alter old table to new 'utf8mb4' ALTER TABLE t4_initial DEFAULT CHARACTER SET utf8mb4, MODIFY subject varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci # ,MODIFY p varchar(255) CHARACTER SET utf8 ; ALTER TABLE t4_initial1 DEFAULT CHARACTER SET utf8, MODIFY subject varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci #, MODIFY p varchar(255) CHARACTER SET utf8 ; # Insert 4 byte characters INSERT INTO t4_initial(subject) VALUES(0xF0909080); # 4byte character INSERT INTO t4_initial(subject) VALUES(0x616263F0909080646566); # mix of 3 byte & 4 byte INSERT INTO t4_initial1(subject) VALUES(0xF0909080); # 4byte character INSERT INTO t4_initial1(subject) VALUES(0x616263F0909080646566); # mix of 3 byte & 4 byte SELECT 'After conversion #1' AS ''; SHOW CREATE VIEW v2_initial; SELECT * FROM v2_initial; # Alter converted 'utf8mb4' table back to 'utf8' ALTER TABLE t4_initial # DEFAULT CHARACTER SET utf8, MODIFY subject varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci; SELECT 'After conversion #2' AS ''; SHOW CREATE VIEW v2_initial; SELECT * FROM v2_initial; # Try to recreate DROP VIEW v2_initial; CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `v2_initial` AS SELECT `t4_initial`.`subject` AS `subject`,`t4_initial`.`p` AS `p` FROM `t4_initial` WHERE (CONVERT(`t4_initial`.`subject` using utf8mb4) LIKE (SELECT `t4_initial1`.`subject` FROM `t4_initial1` where (`t4_initial1`.`subject` = 'aaa'))); The script ends with: subject p aaa bbb ERROR 1267 (HY000) at line 73: Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8mb4_unicode_ci,IMPLICIT) for operation 'like' So, SELECT from a view works fine, but an attempt to recreate a view with the same definition fails.
[23 Jun 2010 7:11]
Alexander Barkov
# # The same big is reproducible with 5.0 and 5.5.4, with latin1 and utf8 pair. # SET NAMES utf8; DROP TABLE IF EXISTS t4_initial; DROP TABLE IF EXISTS t4_initial1; DROP VIEW IF EXISTS v2_initial; CREATE TABLE t4_initial ( subject varchar(255) character set latin1 collate latin1_danish_ci, p varchar(15) character set latin1 ) DEFAULT CHARSET=latin1; CREATE TABLE t4_initial1 ( subject varchar(255) character set latin1 collate latin1_danish_ci, p varchar(15) character set latin1 ) DEFAULT CHARSET=latin1; INSERT INTO t4_initial VALUES ('aaa','bbb'),('ccc','ddd'); INSERT INTO t4_initial1 VALUES ('aaa','bbb'),('ccc','ddd'); CREATE VIEW v2_initial AS SELECT * FROM t4_initial WHERE subject like (SELECT subject from t4_initial1 WHERE subject='aaa'); SELECT 'Original view' AS ''; SHOW CREATE VIEW v2_initial; SELECT * FROM v2_initial; ALTER TABLE t4_initial ADD INDEX (subject); ALTER TABLE t4_initial1 ADD INDEX (subject); SELECT 'After adding indexes' AS ''; SHOW CREATE VIEW v2_initial; SELECT * FROM v2_initial; # Alter table to 'utf8' ALTER TABLE t4_initial MODIFY subject varchar(255) CHARACTER SET utf8 COLLATE utf8_danish_ci; ALTER TABLE t4_initial1 MODIFY subject varchar(255) CHARACTER SET utf8 COLLATE utf8_danish_ci; SELECT 'After conversion #1' AS ''; SHOW CREATE VIEW v2_initial; SELECT * FROM v2_initial; # Alter converted 'utf8' table back to 'latin1' ALTER TABLE t4_initial MODIFY subject varchar(255) CHARACTER SET latin1 COLLATE latin1_danish_ci; SELECT 'After conversion #2' AS ''; SHOW CREATE VIEW v2_initial; SELECT * FROM v2_initial; SELECT 'Trying to recreate the view' AS ''; DROP VIEW v2_initial; CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `v2_initial` AS select `t4_initial`.`subject` AS `subject`,`t4_initial`.`p` AS `p` from `t4_initial` where (convert(`t4_initial`.`subject` using utf8) like (select `t4_initial1`.`subject` AS `subject` from `t4_initial1` where (`t4_initial1`.`subject` = 'aaa'))); exit; The output is: subject p aaa bbb Trying to recreate the view ERROR 1267 (HY000) at line 59: Illegal mix of collations (utf8_general_ci,IMPLICIT) and (utf8_danish_ci,IMPLICIT) for operation 'like' Conclusions: There are some long-standing bug since at least 5.0: 1. Inconsistency between SELECT and dump/restore - A SELECT query from a view can return some rows without problems - but an attempt to recreate this view from its definition produces an error (for example, when doing dump followed by restore) I think that SELECT should fail too. 2. Missing collation definition in case of automatic character set conversion. This part of the view is not correct in respect of collation: where (convert(`t4_initial`.`subject` using utf8) like (select `t4_initial1`.`subject` AS `subject` from `t4_initial1` where (`t4_initial1`.`subject` = 'aaa'))); Possible solutions: a. Do not to print CONVERT() expression at SHOW (or dump) time. Let the server loading the dump insert the converter itself. It will use the proper collation. b. Display the COLLATE clause together with the convert() function: where (convert(`t4_initial`.`subject` using utf8) COLLATE utf8_danish_ci like (select `t4_initial1`.`subject` AS `subject` from `t4_initial1` where (`t4_initial1`.`subject` = 'aaa'))); But this is still not exactly what happens internally, because it will set collation derivation to EXPLICIT. We need IMPLICIT here! It seems we will need a new syntax here, to specify collation derivation: COLLATE IMPLICIT utf8_danish_ci