Bug #58686 CREATE TABLE .. CHARSET=<...> without a COLLATE using case-insensitive collation
Submitted: 3 Dec 2010 0:16 Modified: 15 Dec 2010 23:39
Reporter: Greg Kemnitz Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:5.1.43sp1-enterprise-gpl OS:Any
Assigned to: CPU Architecture:Any
Tags: case insensitive, case sensitive, charset, collate, CREATE TABLE

[3 Dec 2010 0:16] Greg Kemnitz
Description:
If you do 

CREATE TABLE XX (A VARCHAR(20)) ENGINE=INNODB CHARSET=LATIN1;

and the settings for (conn/server/db) collation all are latin1_general_cs, you will still get a case-insensitive collation.  Note that if you create the table without an explicit character set declaration, you _will_ correctly get the latin1 character set with a case sensitive collation.

Below is a log showing the behavior:

mysql>show variables like '%collation%';

+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | latin1_general_cs |
| collation_database   | latin1_general_cs |
| collation_server     | latin1_general_cs |
+----------------------+-------------------+
3 rows in set (0.44 sec)

create table tt (a varchar(20)) charset=latin1 engine=innodb;
Query OK, 0 rows affected (0.00 sec)

mysql>insert into tt values ('aa');
Query OK, 1 row affected (0.00 sec)

mysql>insert into tt values ('AA');
Query OK, 1 row affected (0.00 sec)

mysql>select * from tt where a = 'aa';
+------+
| a    |
+------+
| aa   |
| AA   |
+------+
2 rows in set (0.00 sec)

mysql>drop table tt;
Query OK, 0 rows affected (0.01 sec)

mysql>create table tt (a varchar(20)) engine=innodb;
Query OK, 0 rows affected (0.00 sec)

mysql>insert into tt values ('AA');
Query OK, 1 row affected (0.00 sec)

mysql>insert into tt values ('aa');
Query OK, 1 row affected (0.00 sec)

mysql>select * from tt where a = 'aa';
+------+
| a    |
+------+
| aa   |
+------+
1 row in set (0.00 sec)

We were able to repair this and make our database case-sensitive by doing ALTER TABLE ... CHARSET=DEFAULT, but since are "table rebuild" ALTERs, it took several hours on our production database.

How to repeat:
See above log

Suggested fix:
The suggested fix is to have the CREATE TABLE use the default collation as long as it is compatible with the character set, or if not (ie, using UTF8), it should use a case-sensitive collation if that is the default.

The default should NOT be to a case-insensitive collation; that is definitely not expected behavior.
[3 Dec 2010 0:18] Greg Kemnitz
The "suggested fix" applies to CREATE TABLE with CHARSET defined but not COLLATE defined.  Since this is ambiguous, it would also be reasonable to simply produce an error if CHARSET is defined without COLLATE - instead of producing a potentially problematic default.
[15 Dec 2010 23:39] Sveta Smirnova
Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://dev.mysql.com/doc/ and the instructions on
how to report a bug at http://bugs.mysql.com/how-to-report.php

If CHARACTER SET X was specified without COLLATE, then character set X and its default
collation are used. Default collation for latin1 is latin1_swedish_ci. See also http://dev.mysql.com/doc/refman/5.1/en/charset-charsets.html and output of query show character set;