Bug #71939 ALTER TABLE..CONVER TO modifies columns already in the correct collation.
Submitted: 5 Mar 2014 12:30 Modified: 10 Jul 2014 15:23
Reporter: Daniël van Eeden (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version:5.6.16 OS:Any
Assigned to: Paul DuBois CPU Architecture:Any

[5 Mar 2014 12:30] Daniël van Eeden
Description:
The ALTER TABLE..CONVERT TO CHARACTER SET.. statement modifies the collations for all columns, not only the ones for which the character set must be changed.

How to repeat:
Statements:

CREATE TABLE `l1` (
  `name_cs` varchar(10) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
  `name_ci` varchar(10) CHARACTER SET latin1 COLLATE latin1_general_ci DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE l1 CONVERT TO CHARACTER SET utf8;

Actual behaviour: both columns are changed to utf8/latin1_general_ci.
Expected bhaviour: Only non-utf8 columns are changed to utf8 (including collation).

Statements:

CREATE TABLE `l1` (
  `name_cs` varchar(10) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
  `name_ci` varchar(10) CHARACTER SET latin1 COLLATE latin1_general_ci DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE l1 CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

Actual behaviour: both columns are changed to utf8/latin1_general_ci.
Expected bhaviour: Same as actual behaviour

Suggested fix:
ALTER TABLE l1 CONVERT TO CHARACTER SET utf8;
This should only impact columns which are not utf8.

ALTER TABLE l1 CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
This should only impact columns which are not utf8 and/or not utf8_general_ci.
[5 Mar 2014 12:42] Daniël van Eeden
This is required to work so I can do this:
1. convert all latin1/latin1_general_cs columns to utf8/utf8_bin (no utf8_general_cs yet, Bug #65830)
ALTER TABLE..MODIFY..CHARACTER SET

2. Convert all other columns to utf8.
ALTER TABLE .. CONVERT TO CHARACTER SET utf8

If I can't use the procedure above I would have to convert the default charset/collation for each table and convert the charset/collation *AND* datatype for each column.
[6 Mar 2014 5:43] MySQL Verification Team
Hello Daniel,

Thank you for the bug report and test case.
Verified as described.

Thanks,
Umesh
[31 Mar 2014 10:14] Georgi Kodinov
Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://dev.mysql.com/doc/ and the instructions on
how to report a bug at http://bugs.mysql.com/how-to-report.php

Collation in mysql is related to charset. When you change the charset it does reset the collation too.
http://dev.mysql.com/doc/refman/5.6/en/charset-charsets.html specifies "utf8_general_ci" as a default collation for utf8.
I'm moving this bug to a documentation bug to clarify this dependency better.

I tried the following against a recent 5.6: 

mysql> CREATE TABLE `l1` (
    ->   `name_cs` varchar(10) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,

    ->   `name_ci` varchar(10) CHARACTER SET latin1 COLLATE latin1_general_ci DE
FAULT NULL
    -> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
--------------
CREATE TABLE `l1` (
  `name_cs` varchar(10) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
  `name_ci` varchar(10) CHARACTER SET latin1 COLLATE latin1_general_ci DEFAULT N
ULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
--------------

Query OK, 0 rows affected (0.05 sec)

mysql> ALTER TABLE l1 CONVERT TO CHARACTER SET utf8;
--------------
ALTER TABLE l1 CONVERT TO CHARACTER SET utf8
--------------

Query OK, 0 rows affected (0.14 sec)
Records: 0  Duplicates: 0  Warnings: 0

This resulted in : 

mysql> select * from information_schema.columns where table_name = 'l1'\G
--------------
select * from information_schema.columns where table_name = 'l1'
--------------

*************************** 1. row ***************************
           TABLE_CATALOG: def
            TABLE_SCHEMA: test
              TABLE_NAME: l1
             COLUMN_NAME: name_cs
        ORDINAL_POSITION: 1
          COLUMN_DEFAULT: NULL
             IS_NULLABLE: YES
               DATA_TYPE: varchar
CHARACTER_MAXIMUM_LENGTH: 10
  CHARACTER_OCTET_LENGTH: 30
       NUMERIC_PRECISION: NULL
           NUMERIC_SCALE: NULL
      DATETIME_PRECISION: NULL
      CHARACTER_SET_NAME: utf8
          COLLATION_NAME: utf8_general_ci
             COLUMN_TYPE: varchar(10)
              COLUMN_KEY:
                   EXTRA:
              PRIVILEGES: select,insert,update,references
          COLUMN_COMMENT:
*************************** 2. row ***************************
           TABLE_CATALOG: def
            TABLE_SCHEMA: test
              TABLE_NAME: l1
             COLUMN_NAME: name_ci
        ORDINAL_POSITION: 2
          COLUMN_DEFAULT: NULL
             IS_NULLABLE: YES
               DATA_TYPE: varchar
CHARACTER_MAXIMUM_LENGTH: 10
  CHARACTER_OCTET_LENGTH: 30
       NUMERIC_PRECISION: NULL
           NUMERIC_SCALE: NULL
      DATETIME_PRECISION: NULL
      CHARACTER_SET_NAME: utf8
          COLLATION_NAME: utf8_general_ci
             COLUMN_TYPE: varchar(10)
              COLUMN_KEY:
                   EXTRA:
              PRIVILEGES: select,insert,update,references
          COLUMN_COMMENT:
2 rows in set (0.02 sec)
[9 Jul 2014 3:32] Jon Stephens
Corrected status.
[10 Jul 2014 15:16] Paul DuBois
re: "Actual behaviour: both columns are changed to utf8/latin1_general_ci."

How can you have a column with the utf8 character set and a *latin1* collation?
[10 Jul 2014 15:23] Paul DuBois
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly.

The ALTER TABLE page already indicates that CONVERT TO supports a COLLATE clause and uses the character set default collation if that clause is not present. However, I'll modify this someone to point out explicitly that the statement changes the collation as well as the character set:

The statement also changes the collation of all character columns. If
you specify no COLLATE clause to indicate which collation to use, the
statement uses default collation for the character set. If this
collation is inappropriate for the intended table use (for example,
if it would change from a case-sensitive collation to a
case-insensitive collation), specify a collation explicitly.