Bug #52145 | Problem switching charset from latin1 to utf8 | ||
---|---|---|---|
Submitted: | 17 Mar 2010 16:18 | Modified: | 18 Mar 2010 14:07 |
Reporter: | Feronia P | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: DDL | Severity: | S3 (Non-critical) |
Version: | MySQL 5.1.45 + InnoDB Plugin 1.0.6 | OS: | Linux |
Assigned to: | CPU Architecture: | Any | |
Tags: | 1.0.6, 5.1.45, charset, InnoDB plugin, utf8 |
[17 Mar 2010 16:18]
Feronia P
[18 Mar 2010 4:11]
MySQL Verification Team
Thank you for the bug report. Outputs with and without plugin: [miguel@tikal ~]$ dbs/5.1/bin/mysql -uroot test Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 2 Server version: 5.1.46-debug-log Source distribution Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> show variables like "%version%"; +-------------------------+---------------------+ | Variable_name | Value | +-------------------------+---------------------+ | innodb_version | 1.0.6 | | protocol_version | 10 | | version | 5.1.46-debug-log | | version_comment | Source distribution | | version_compile_machine | x86_64 | | version_compile_os | unknown-linux-gnu | +-------------------------+---------------------+ 6 rows in set (0.00 sec) mysql> drop table if exists EncodingText01; Query OK, 0 rows affected, 1 warning (0.00 sec) mysql> drop table if exists EncodingText02; Query OK, 0 rows affected, 1 warning (0.00 sec) mysql> mysql> create table EncodingText01 ( -> ID int unsigned not null auto_increment primary key, -> Val varchar(250) not null -> ) engine=innodb charset=latin1; Query OK, 0 rows affected (0.11 sec) mysql> mysql> -- insert latin 1 string mysql> insert into EncodingText01 (val) values ('äöü'); Query OK, 1 row affected (0.00 sec) mysql> alter table EncodingText01 charset=utf8; Query OK, 1 row affected (0.22 sec) Records: 1 Duplicates: 0 Warnings: 0 mysql> insert into EncodingText01 (val) values ('článku'); Query OK, 1 row affected, 1 warning (0.00 sec) mysql> show warnings; +---------+------+---------------------------------------------------------------------------+ | Level | Code | Message | +---------+------+---------------------------------------------------------------------------+ | Warning | 1366 | Incorrect string value: '\xC4\x8Dl\xC3\xA1n...' for column 'Val' at row 1 | +---------+------+---------------------------------------------------------------------------+ 1 row in set (0.00 sec) mysql> create table EncodingText02 ( -> ID int unsigned not null auto_increment primary key, -> Val varchar(250) not null -> ) engine=innodb charset=utf8; Query OK, 0 rows affected (0.09 sec) mysql> mysql> -- insert same string mysql> insert into EncodingText02 (val) values ('äöü'); Query OK, 1 row affected (0.00 sec) mysql> insert into EncodingText02 (val) values ('článku'); Query OK, 1 row affected (0.00 sec) mysql> [miguel@tikal ~]$ dbs/5.1/bin/mysqladmin -uroot shutdown [miguel@tikal ~]$ dbs/5.1/bin/mysql -uroot test Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 1 Server version: 5.1.46-debug Source distribution Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> drop table if exists EncodingText01; Query OK, 0 rows affected (0.05 sec) mysql> drop table if exists EncodingText02; Query OK, 0 rows affected (0.04 sec) mysql> mysql> create table EncodingText01 ( -> ID int unsigned not null auto_increment primary key, -> Val varchar(250) not null -> ) engine=innodb charset=latin1; Query OK, 0 rows affected (0.10 sec) mysql> mysql> -- insert latin 1 string mysql> insert into EncodingText01 (val) values ('äöü'); Query OK, 1 row affected (0.04 sec) mysql> mysql> -- change to utf8 mysql> alter table EncodingText01 charset=utf8; Query OK, 1 row affected (0.27 sec) Records: 1 Duplicates: 0 Warnings: 0 mysql> mysql> -- insert utf8 string mysql> insert into EncodingText01 (val) values ('článku'); Query OK, 1 row affected (0.04 sec) mysql> show variables like "%version%"; +-------------------------+---------------------+ | Variable_name | Value | +-------------------------+---------------------+ | protocol_version | 10 | | version | 5.1.46-debug | | version_comment | Source distribution | | version_compile_machine | x86_64 | | version_compile_os | unknown-linux-gnu | +-------------------------+---------------------+ 5 rows in set (0.00 sec) mysql>
[18 Mar 2010 13:44]
Marko Mäkelä
Sorry, this bug does not appear to be in InnoDB, but in MySQL. The problem is that the alter table EncodingText01 charset=utf8; does not re-code the column Val1 to UTF-8. After the command, I still see this: select hex(val) from EncodingText01\G *************************** 1. row *************************** hex(val): E4F6FC That is, the column still is in latin1. After the insert, the inserted string has been converted to latin1 (the missing code point converted to question mark): insert into EncodingText01 (val) values ('článku'); select hex(val) from EncodingText01\G *************************** 1. row *************************** hex(val): E4F6FC *************************** 2. row *************************** hex(val): 3F6CE16E6B75 Note: when you test this bug, ensure that your mysql client connection uses UTF-8, for example by show variables like '%character_set%'; and that your terminal emulator uses UTF-8, in the shell: echo äöü|od -t x1 0000000 c3 a4 c3 b6 c3 bc 0a 0000007
[18 Mar 2010 14:04]
Harrison Fisk
This doesn't look like a bug to me. The issue is with the statement: alter table EncodingText01 charset=utf8; This only changed the default character set for the table, not for existing columns, so they are still latin1 from your original definition. Remember that character sets are a per column setting. Instead you want to use: ALTER TABLE EncodingText01 CONVERT TO utf8; This convert all existing columns and data to be in utf8. root@test~> create table EncodingText01 ( -> ID int unsigned not null auto_increment primary key, -> Val varchar(250) not null -> ) engine=innodb charset=latin1; Query OK, 0 rows affected (0.20 sec) root@test~> -- insert latin 1 string root@test~> insert into EncodingText01 (val) values ('äöü'); Query OK, 1 row affected (0.00 sec) root@test~> -- change to utf8 root@test~> alter table EncodingText01 charset=utf8; Query OK, 1 row affected (0.18 sec) Records: 1 Duplicates: 0 Warnings: 0 root@test~> show create table EncodingText01\G *************************** 1. row *************************** Table: EncodingText01 Create Table: CREATE TABLE `EncodingText01` ( `ID` int(10) unsigned NOT NULL AUTO_INCREMENT, `Val` varchar(250) CHARACTER SET latin1 NOT NULL, PRIMARY KEY (`ID`) ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf Notice the column is still latin1 above.
[18 Mar 2010 14:07]
Harrison Fisk
Forgot the CHARACTER SET keyword, it should be: ALTER TABLE EncodingText01 CONVERT TO CHARACTER SET utf8;