Bug #74920 New tables ignore the default collate when a default charset is set
Submitted: 18 Nov 2014 19:03 Modified: 18 Nov 2014 21:05
Reporter: Emiliano Perez Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.6.19-log OS:Linux (Amazon RDS)
Assigned to: CPU Architecture:Any
Tags: charset, collate, utf8mb4_general_ci, utf8mb4_unicode_ci

[18 Nov 2014 19:03] Emiliano Perez
Description:
I have the variables for my MySQL server set to use utf8mb4_unicode_ci anywhere possible...

MySQL [test]> show variables like '%collat%';
+----------------------+--------------------+
| Variable_name        | Value              |
+----------------------+--------------------+
| collation_connection | utf8mb4_unicode_ci |
| collation_database   | utf8mb4_unicode_ci |
| collation_server     | utf8mb4_unicode_ci |
+----------------------+--------------------+
3 rows in set (0.00 sec)

-------------------------------------------------------------------------------

I have a database created with the default charset and collate for utf8mb4_unicode_ci...

MySQL [test]> CREATE DATABASE emi_testing DEFAULT CHARACTER SET utf8mb4 DEFAULT COLLATE utf8mb4_unicode_ci;

-------------------------------------------------------------------------------

Now, I've created a table there, which I was assuming that it would be in the default collation which is utf8mb4_unicode_ci...

MySQL [emi_testing]> CREATE TABLE emi_testing.fb_test (post_id varchar(64) not null) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-------------------------------------------------------------------------------

As it turns out, it's not...

MySQL [emi_testing]> show table status like 'fb_test' \G
*************************** 1. row ***************************
           Name: fb_test
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 0
 Avg_row_length: 0
    Data_length: 16384
Max_data_length: 0
   Index_length: 0
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2014-11-18 18:49:23
    Update_time: NULL
     Check_time: NULL
      Collation: utf8mb4_general_ci
       Checksum: NULL
 Create_options: 
        Comment: 
1 row in set (0.00 sec)

Why was it created as *utf8mb4_general_ci* when I've clearly set the default collate as *utf8mb4_unicode_ci* ?

-------------------------------------------------------------------------------

This causes my stored procedures (which are created with the default collation) to collapse by causing an illegal correlation of types:

ERROR 1267 (HY000): Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8mb4_unicode_ci,IMPLICIT) for operation '='

It's possible to "work-around it" by adding a COLLATE on all the SELECTs on my SP, but this causes the query not to use the varchar index, thus making the procedure unusably slow.

-------------------------------------------------------------------------------

The only way I got to make the table use the same default collate as the database was to force it on the table creation like so...

MySQL [emi_testing]> CREATE TABLE emi_testing.fb_test (post_id varchar(64) not null) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 DEFAULT COLLATE=utf8mb4_unicode_ci;
Query OK, 0 rows affected (0.02 sec)

MySQL [emi_testing]> show table status like 'fb_test' \G
*************************** 1. row ***************************
           Name: fb_test
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 0
 Avg_row_length: 0
    Data_length: 16384
Max_data_length: 0
   Index_length: 0
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2014-11-18 18:58:09
    Update_time: NULL
     Check_time: NULL
      Collation: utf8mb4_unicode_ci
       Checksum: NULL
 Create_options: 
        Comment: 
1 row in set (0.00 sec)

And now the table is on the correct collation (utf8mb4_unicode_ci). 

-------------------------------------------------------------------------------

It also works if I don't set a DEFAULT CHARSET on the table...

MySQL [emi_testing]> CREATE TABLE emi_testing.fb_test (post_id varchar(64) not null) ENGINE=InnoDB;
Query OK, 0 rows affected (0.03 sec)

MySQL [emi_testing]> show table status like 'fb_test' \G
*************************** 1. row ***************************
           Name: fb_test
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 0
 Avg_row_length: 0
    Data_length: 16384
Max_data_length: 0
   Index_length: 0
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2014-11-18 18:59:52
    Update_time: NULL
     Check_time: NULL
      Collation: utf8mb4_unicode_ci
       Checksum: NULL
 Create_options: 
        Comment: 
1 row in set (0.00 sec)

How to repeat:
(just read the description to repeat the issue)

Suggested fix:
Make the default collate be the one set on the database when it's not set on the table creation (instead of just guessing the right one).
[18 Nov 2014 21:04] Emiliano Perez
Sorry I've checked with a colleague and it turns out it's "a feature and not a bug", since by setting the default charset it will check the default collate of that charset and not the one setup on the database.

It would be more intuitive though that if I set the same charset on the table as the default charset on the database, the default collate on the database gets prioritized over the one belonging to the charset.

Although this could break backwards compatibility so... I'll just save you some time and close the bug.
[18 Nov 2014 21:05] Emiliano Perez
I explain this change on the first comment here. 
It seems that "it's a feature and not a bug".