| Bug #52145 | Problem switching charset from latin1 to utf8 | ||
|---|---|---|---|
| Submitted: | 17 Mar 2010 16:18 | Modified: | 18 Mar 2010 14:07 |
| Reporter: | Feronia P | Email Updates: | |
| Status: | Not a Bug | Impact on me: | |
| Category: | MySQL Server: DDL | Severity: | S3 (Non-critical) |
| Version: | MySQL 5.1.45 + InnoDB Plugin 1.0.6 | OS: | Linux |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | 1.0.6, 5.1.45, charset, InnoDB plugin, utf8 | ||
[18 Mar 2010 4:11]
MySQL Verification Team
Thank you for the bug report. Outputs with and without plugin:
[miguel@tikal ~]$ dbs/5.1/bin/mysql -uroot test
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.1.46-debug-log Source distribution
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show variables like "%version%";
+-------------------------+---------------------+
| Variable_name | Value |
+-------------------------+---------------------+
| innodb_version | 1.0.6 |
| protocol_version | 10 |
| version | 5.1.46-debug-log |
| version_comment | Source distribution |
| version_compile_machine | x86_64 |
| version_compile_os | unknown-linux-gnu |
+-------------------------+---------------------+
6 rows in set (0.00 sec)
mysql> drop table if exists EncodingText01;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> drop table if exists EncodingText02;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql>
mysql> create table EncodingText01 (
-> ID int unsigned not null auto_increment primary key,
-> Val varchar(250) not null
-> ) engine=innodb charset=latin1;
Query OK, 0 rows affected (0.11 sec)
mysql>
mysql> -- insert latin 1 string
mysql> insert into EncodingText01 (val) values ('äöü');
Query OK, 1 row affected (0.00 sec)
mysql> alter table EncodingText01 charset=utf8;
Query OK, 1 row affected (0.22 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> insert into EncodingText01 (val) values ('článku');
Query OK, 1 row affected, 1 warning (0.00 sec)
mysql> show warnings;
+---------+------+---------------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+---------------------------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xC4\x8Dl\xC3\xA1n...' for column 'Val' at row 1 |
+---------+------+---------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> create table EncodingText02 (
-> ID int unsigned not null auto_increment primary key,
-> Val varchar(250) not null
-> ) engine=innodb charset=utf8;
Query OK, 0 rows affected (0.09 sec)
mysql>
mysql> -- insert same string
mysql> insert into EncodingText02 (val) values ('äöü');
Query OK, 1 row affected (0.00 sec)
mysql> insert into EncodingText02 (val) values ('článku');
Query OK, 1 row affected (0.00 sec)
mysql>
[miguel@tikal ~]$ dbs/5.1/bin/mysqladmin -uroot shutdown
[miguel@tikal ~]$ dbs/5.1/bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.1.46-debug Source distribution
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> drop table if exists EncodingText01;
Query OK, 0 rows affected (0.05 sec)
mysql> drop table if exists EncodingText02;
Query OK, 0 rows affected (0.04 sec)
mysql>
mysql> create table EncodingText01 (
-> ID int unsigned not null auto_increment primary key,
-> Val varchar(250) not null
-> ) engine=innodb charset=latin1;
Query OK, 0 rows affected (0.10 sec)
mysql>
mysql> -- insert latin 1 string
mysql> insert into EncodingText01 (val) values ('äöü');
Query OK, 1 row affected (0.04 sec)
mysql>
mysql> -- change to utf8
mysql> alter table EncodingText01 charset=utf8;
Query OK, 1 row affected (0.27 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql>
mysql> -- insert utf8 string
mysql> insert into EncodingText01 (val) values ('článku');
Query OK, 1 row affected (0.04 sec)
mysql> show variables like "%version%";
+-------------------------+---------------------+
| Variable_name | Value |
+-------------------------+---------------------+
| protocol_version | 10 |
| version | 5.1.46-debug |
| version_comment | Source distribution |
| version_compile_machine | x86_64 |
| version_compile_os | unknown-linux-gnu |
+-------------------------+---------------------+
5 rows in set (0.00 sec)
mysql>
[18 Mar 2010 13:44]
Marko Mäkelä
Sorry, this bug does not appear to be in InnoDB, but in MySQL.
The problem is that the
alter table EncodingText01 charset=utf8;
does not re-code the column Val1 to UTF-8. After the command, I still see this:
select hex(val) from EncodingText01\G
*************************** 1. row ***************************
hex(val): E4F6FC
That is, the column still is in latin1. After the insert, the inserted string has been converted to latin1 (the missing code point converted to question mark):
insert into EncodingText01 (val) values ('článku');
select hex(val) from EncodingText01\G
*************************** 1. row ***************************
hex(val): E4F6FC
*************************** 2. row ***************************
hex(val): 3F6CE16E6B75
Note: when you test this bug, ensure that your mysql client connection uses UTF-8, for example by
show variables like '%character_set%';
and that your terminal emulator uses UTF-8, in the shell:
echo äöü|od -t x1
0000000 c3 a4 c3 b6 c3 bc 0a
0000007
[18 Mar 2010 14:04]
Harrison Fisk
This doesn't look like a bug to me.
The issue is with the statement:
alter table EncodingText01 charset=utf8;
This only changed the default character set for the table, not for existing columns, so they are still latin1 from your original definition. Remember that character sets are a per column setting.
Instead you want to use:
ALTER TABLE EncodingText01 CONVERT TO utf8;
This convert all existing columns and data to be in utf8.
root@test~> create table EncodingText01 (
-> ID int unsigned not null auto_increment primary key,
-> Val varchar(250) not null
-> ) engine=innodb charset=latin1;
Query OK, 0 rows affected (0.20 sec)
root@test~> -- insert latin 1 string
root@test~> insert into EncodingText01 (val) values ('äöü');
Query OK, 1 row affected (0.00 sec)
root@test~> -- change to utf8
root@test~> alter table EncodingText01 charset=utf8;
Query OK, 1 row affected (0.18 sec)
Records: 1 Duplicates: 0 Warnings: 0
root@test~> show create table EncodingText01\G
*************************** 1. row ***************************
Table: EncodingText01
Create Table: CREATE TABLE `EncodingText01` (
`ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
`Val` varchar(250) CHARACTER SET latin1 NOT NULL,
PRIMARY KEY (`ID`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf
Notice the column is still latin1 above.
[18 Mar 2010 14:07]
Harrison Fisk
Forgot the CHARACTER SET keyword, it should be: ALTER TABLE EncodingText01 CONVERT TO CHARACTER SET utf8;

Description: When i create a table with charset=latin1 but then try to alter it to utf8 the operation completes but when i then try to insert multibyte unicode character i get the warning: Incorrect string value: '\xC4\x8Dl\xC3\xA1n...' for column 'Val' at row 1 (from sample case code below) and the string is not correctly inserted (characters are replaced with defaults). How to repeat: drop table if exists EncodingText01; drop table if exists EncodingText02; create table EncodingText01 ( ID int unsigned not null auto_increment primary key, Val varchar(250) not null ) engine=innodb charset=latin1; -- insert latin 1 string insert into EncodingText01 (val) values ('äöü'); -- change to utf8 alter table EncodingText01 charset=utf8; -- insert utf8 string insert into EncodingText01 (val) values ('článku'); -- i get warnings -- Compare to: create table EncodingText02 ( ID int unsigned not null auto_increment primary key, Val varchar(250) not null ) engine=innodb charset=utf8; -- insert same string insert into EncodingText02 (val) values ('äöü'); insert into EncodingText02 (val) values ('článku'); -- no warnings for me Suggested fix: Workaround: Create the table from the beginning with the right charset OR dump the whole database and switching the charset in the table create statements and then reimport.