Bug #71940 better per-column character set conversion
Submitted: 5 Mar 2014 14:37
Reporter: Daniël van Eeden (OCA) Email Updates:
Status: Open Impact on me:
None 
Category:MySQL Server: Charsets Severity:S4 (Feature request)
Version:5.6.16 OS:Any
Assigned to: CPU Architecture:Any

[5 Mar 2014 14:37] Daniël van Eeden
Description:
Per column character set conversion is needed if the per table character set conversion can't be used (per table: ALTER TABLE..CONVERT TO CHARACTER SET)

To convert one column:
ALTER TABLE table_name MODIFY column_name column_definition

Example:
ALTER TABLE t1 MODIFY c1 VARCHAR(100) CHARACTER SET utf8;

This doesn't take the data type in account they way CONVERT TO on the table does.

Also it the column definition needs to be complete, not just the changed properties.

The information_schema.columns table provides some info:

mysql> show create table x1\G
*************************** 1. row ***************************
       Table: x1
Create Table: CREATE TABLE `x1` (
  `c1` varchar(10) CHARACTER SET utf8 DEFAULT NULL,
  `c2` varchar(10) DEFAULT 'foo' COMMENT 'foofoo'
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> select * from information_schema.columns where table_name='x1'\G
*************************** 1. row ***************************
           TABLE_CATALOG: def
            TABLE_SCHEMA: test
              TABLE_NAME: x1
             COLUMN_NAME: c1
        ORDINAL_POSITION: 1
          COLUMN_DEFAULT: NULL
             IS_NULLABLE: YES
               DATA_TYPE: varchar
CHARACTER_MAXIMUM_LENGTH: 10
  CHARACTER_OCTET_LENGTH: 30
       NUMERIC_PRECISION: NULL
           NUMERIC_SCALE: NULL
      DATETIME_PRECISION: NULL
      CHARACTER_SET_NAME: utf8
          COLLATION_NAME: utf8_general_ci
             COLUMN_TYPE: varchar(10)
              COLUMN_KEY: 
                   EXTRA: 
              PRIVILEGES: select,insert,update,references
          COLUMN_COMMENT: 
*************************** 2. row ***************************
           TABLE_CATALOG: def
            TABLE_SCHEMA: test
              TABLE_NAME: x1
             COLUMN_NAME: c2
        ORDINAL_POSITION: 2
          COLUMN_DEFAULT: foo
             IS_NULLABLE: YES
               DATA_TYPE: varchar
CHARACTER_MAXIMUM_LENGTH: 10
  CHARACTER_OCTET_LENGTH: 10
       NUMERIC_PRECISION: NULL
           NUMERIC_SCALE: NULL
      DATETIME_PRECISION: NULL
      CHARACTER_SET_NAME: latin1
          COLLATION_NAME: latin1_swedish_ci
             COLUMN_TYPE: varchar(10)
              COLUMN_KEY: 
                   EXTRA: 
              PRIVILEGES: select,insert,update,references
          COLUMN_COMMENT: foofoo
2 rows in set (0.00 sec)

It's however quite hard to use this information to create the commands to mass update tables:
SELECT CONCAT(
'ALTER TABLE `', TABLE_SCHEMA, '`.`', TABLE_NAME,
' MODIFY `', COLUMN_NAME,'` ',
COLUMN_TYPE, ' CHARACTER SET utf8 COLLATE utf8_bin',
IF(COLUMN_DEFAULT IS NULL,'', CONCAT(' DEFAULT \'', COLUMN_DEFAULT, '\'')),
IF(COLUMN_COMMENT='', '',  CONCAT(' COMMENT \'', COLUMN_COMMENT, '\'')),
';')
FROM information_schema.columns
WHERE NOT TABLE_SCHEMA IN('mysql', 'information_schema', 'performance_schema');

This query is quite hard to get right: quoting, escaping, etc.

The above query does not deal with nullability, column format and storage settings which can be set per column.
It also doesn't filter on *%char and %text data types.

How to repeat:
See description

Suggested fix:
Make it possible to easily convert the character set and collation on a per column basis.

Something like:
ALTER TABLE t1 CONVERT COLUMN c1 CHARACTER SET utf8;

And then *only* change the character set and collation, nothing else.