Bug #43307 default collation will be ignored
Submitted: 2 Mar 2009 13:25 Modified: 5 Mar 2009 22:45
Reporter: Susanne Ebrecht Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S1 (Critical)
Version:5.1, 6.0 OS:Any
Assigned to: Paul DuBois CPU Architecture:Any
Tags: regression

[2 Mar 2009 13:25] Susanne Ebrecht
Description:
DROP DATABASE IF EXISTS bug;
CREATE DATABASE bug CHARSET UTF8 COLLATE UTF8_BIN;
SET NAMES UTF8;
USE bug;
CREATE TABLE t (a integer, b char(5) CHARSET UTF8, PRIMARY KEY(a,b))ENGINE=MYISAM;

-- Now try to insert two rows:

INSERT INTO t VALUES (23, 0xC39654C5A13261);
INSERT INTO t VALUES (23, 0xC39674C39F32C385);

MySQL 5.0 works fine, MySQL 5.1 and 6.0 get:
ERROR 1062 (23000): Duplicate entry '23-Ötß2Å' for key 'PRIMARY'

You only have the chance not to get a duplicate key error in MySQL 5.1 and 6.0 when you give the collation to every single row:

CREATE TABLE t (a integer, b char(5) CHARSET UTF8 COLLATE UTF8_BIN, PRIMARY KEY(a,b))ENGINE=MYISAM;

That the default collation of the database/table is utf8_bin is not enough here.

I tested with 5.0.79, 5.1.33, 6.0.11

How to repeat:
See above

Suggested fix:
Let columns accept the default table collation.
[3 Mar 2009 5:20] Alexander Barkov
There are no bugs here.

Default character set and collation are copied from a higher
level to a lower level ONLY if the lower level does not have
neither CHARACTER SET nor COLLATE clause.

As soon as the lower level specifies CHARACTER SET with no COLLATE clause,
then the upper level character set and collation are completely ignored, and
the DEFAULT collation for the given character set is used (which is
utf8_general_ci in the given example).

This behavior was implemented according to the standard,
and is documented here:

http://dev.mysql.com/doc/refman/5.1/en/charset-table.html

I can't see any bugs here.
[3 Mar 2009 6:48] Susanne Ebrecht
This is a duplicate of bug #21085.

Anyway our documentation is not very clear here. It is not really understandable in documentation what happens in given examples below.

Also it is very confusing that when you have a default table collation it won't be taken.

So I will set this as documentation bug.

My documentation suggestion:

Please add some examples:

CREATE TABLE t(v VARCHAR(100)) CHARACTER SET utf8 COLLATE utf8_bin;
=> Collation utf8_general_ci is taken for column v

CREATE TABLE t(v VARCHAR(100) CHARACTER SET utf8) CHARACTER SET utf8 COLLATE utf8_bin;
=> Collation utf8_general_ci is taken for column v

CREATE TABLE t(v VARCHAR(00) CHARACTER SET utf8) COLLATE utf8_bin;
=> Collation utf8_general_ci is taken for column v

CREATE TABLE t(v varchar(100) CHARACTER SET utf8 COLLATE utf8_bin);
=> Only here collation utf8_bin is taken

Additional please provide a list of default collations for every single charset linked from charset_table.html, charset_column.html.
[5 Mar 2009 22:45] Paul DuBois
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant products.

Added examples to column-charset section. Also added note that you can use SHOW COLLATION to determine the default collation for all character sets.