Bug #52145 Problem switching charset from latin1 to utf8
Submitted: 17 Mar 2010 16:18 Modified: 18 Mar 2010 14:07
Reporter: Feronia P Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: DDL Severity:S3 (Non-critical)
Version:MySQL 5.1.45 + InnoDB Plugin 1.0.6 OS:Linux
Assigned to: CPU Architecture:Any
Tags: 1.0.6, 5.1.45, charset, InnoDB plugin, utf8
Triage: Needs Triage: D2 (Serious)

[17 Mar 2010 16:18] Feronia P
Description:
When i create a table with charset=latin1 but then try to alter it to utf8 the operation completes but when i then try to insert multibyte unicode character i get the warning:

Incorrect string value: '\xC4\x8Dl\xC3\xA1n...' for column 'Val' at row 1

(from sample case code below)

and the string is not correctly inserted (characters are replaced with defaults).

How to repeat:
drop table if exists EncodingText01;
drop table if exists EncodingText02;

create table EncodingText01 (
  ID int unsigned not null auto_increment primary key,
  Val varchar(250) not null
) engine=innodb charset=latin1;

-- insert latin 1 string
insert into EncodingText01 (val) values ('äöü');

-- change to utf8
alter table EncodingText01 charset=utf8;

-- insert utf8 string
insert into EncodingText01 (val) values ('článku');
-- i get warnings

-- Compare to:
create table EncodingText02 (
  ID int unsigned not null auto_increment primary key,
  Val varchar(250) not null
) engine=innodb charset=utf8;

-- insert same string
insert into EncodingText02 (val) values ('äöü');
insert into EncodingText02 (val) values ('článku');
-- no warnings for me

Suggested fix:
Workaround:
Create the table from the beginning with the right charset OR
dump the whole database and switching the charset in the table create statements and then reimport.
[18 Mar 2010 4:11] Miguel Solorzano
Thank you for the bug report. Outputs with and without plugin:

[miguel@tikal ~]$ dbs/5.1/bin/mysql -uroot test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.1.46-debug-log Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show variables like "%version%";
+-------------------------+---------------------+
| Variable_name           | Value               |
+-------------------------+---------------------+
| innodb_version          | 1.0.6               |
| protocol_version        | 10                  |
| version                 | 5.1.46-debug-log    |
| version_comment         | Source distribution |
| version_compile_machine | x86_64              |
| version_compile_os      | unknown-linux-gnu   |
+-------------------------+---------------------+
6 rows in set (0.00 sec)

mysql> drop table if exists EncodingText01;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> drop table if exists EncodingText02;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> 
mysql> create table EncodingText01 (
    ->   ID int unsigned not null auto_increment primary key,
    ->   Val varchar(250) not null
    -> ) engine=innodb charset=latin1;
Query OK, 0 rows affected (0.11 sec)

mysql> 
mysql> -- insert latin 1 string
mysql> insert into EncodingText01 (val) values ('äöü');
Query OK, 1 row affected (0.00 sec)

mysql> alter table EncodingText01 charset=utf8;
Query OK, 1 row affected (0.22 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> insert into EncodingText01 (val) values ('článku');
Query OK, 1 row affected, 1 warning (0.00 sec)

mysql> show warnings;
+---------+------+---------------------------------------------------------------------------+
| Level   | Code | Message                                                                   |
+---------+------+---------------------------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xC4\x8Dl\xC3\xA1n...' for column 'Val' at row 1 |
+---------+------+---------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> create table EncodingText02 (
    ->   ID int unsigned not null auto_increment primary key,
    ->   Val varchar(250) not null
    -> ) engine=innodb charset=utf8;
Query OK, 0 rows affected (0.09 sec)

mysql> 
mysql> -- insert same string
mysql> insert into EncodingText02 (val) values ('äöü');
Query OK, 1 row affected (0.00 sec)

mysql> insert into EncodingText02 (val) values ('článku');
Query OK, 1 row affected (0.00 sec)

mysql> 

[miguel@tikal ~]$ dbs/5.1/bin/mysqladmin -uroot shutdown
[miguel@tikal ~]$ dbs/5.1/bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.1.46-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> drop table if exists EncodingText01;
Query OK, 0 rows affected (0.05 sec)

mysql> drop table if exists EncodingText02;
Query OK, 0 rows affected (0.04 sec)

mysql> 
mysql> create table EncodingText01 (
    ->   ID int unsigned not null auto_increment primary key,
    ->   Val varchar(250) not null
    -> ) engine=innodb charset=latin1;
Query OK, 0 rows affected (0.10 sec)

mysql> 
mysql> -- insert latin 1 string
mysql> insert into EncodingText01 (val) values ('äöü');
Query OK, 1 row affected (0.04 sec)

mysql> 
mysql> -- change to utf8
mysql> alter table EncodingText01 charset=utf8;
Query OK, 1 row affected (0.27 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> 
mysql> -- insert utf8 string
mysql> insert into EncodingText01 (val) values ('článku');
Query OK, 1 row affected (0.04 sec)

mysql> show variables like "%version%";
+-------------------------+---------------------+
| Variable_name           | Value               |
+-------------------------+---------------------+
| protocol_version        | 10                  |
| version                 | 5.1.46-debug        |
| version_comment         | Source distribution |
| version_compile_machine | x86_64              |
| version_compile_os      | unknown-linux-gnu   |
+-------------------------+---------------------+
5 rows in set (0.00 sec)

mysql>
[18 Mar 2010 13:44] Marko Mäkelä
Sorry, this bug does not appear to be in InnoDB, but in MySQL.

The problem is that the

alter table EncodingText01 charset=utf8;

does not re-code the column Val1 to UTF-8. After the command, I still see this:

select hex(val) from EncodingText01\G
*************************** 1. row ***************************
hex(val): E4F6FC

That is, the column still is in latin1. After the insert, the inserted string has been converted to latin1 (the missing code point converted to question mark):

insert into EncodingText01 (val) values ('článku');

select hex(val) from EncodingText01\G
*************************** 1. row ***************************
hex(val): E4F6FC
*************************** 2. row ***************************
hex(val): 3F6CE16E6B75

Note: when you test this bug, ensure that your mysql client connection uses UTF-8, for example by

show variables like '%character_set%';

and that your terminal emulator uses UTF-8, in the shell:

echo äöü|od -t x1
0000000 c3 a4 c3 b6 c3 bc 0a
0000007
[18 Mar 2010 14:04] Harrison Fisk
This doesn't look like a bug to me.

The issue is with the statement:

alter table EncodingText01 charset=utf8;

This only changed the default character set for the table, not for existing columns, so they are still latin1 from your original definition.  Remember that character sets are a per column setting.

Instead you want to use:

ALTER TABLE EncodingText01 CONVERT TO utf8;

This convert all existing columns and data to be in utf8.

root@test~> create table EncodingText01 (
    ->   ID int unsigned not null auto_increment primary key,
    ->   Val varchar(250) not null
    -> ) engine=innodb charset=latin1;
Query OK, 0 rows affected (0.20 sec)

root@test~> -- insert latin 1 string
root@test~> insert into EncodingText01 (val) values ('äöü');
Query OK, 1 row affected (0.00 sec)

root@test~> -- change to utf8
root@test~> alter table EncodingText01 charset=utf8;
Query OK, 1 row affected (0.18 sec)
Records: 1  Duplicates: 0  Warnings: 0

root@test~> show create table EncodingText01\G
*************************** 1. row ***************************
       Table: EncodingText01
Create Table: CREATE TABLE `EncodingText01` (
  `ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `Val` varchar(250) CHARACTER SET latin1 NOT NULL,
  PRIMARY KEY (`ID`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf

Notice the column is still latin1 above.
[18 Mar 2010 14:07] Harrison Fisk
Forgot the CHARACTER SET keyword, it should be:

ALTER TABLE EncodingText01 CONVERT TO CHARACTER SET utf8;