Description:
Using the ALTER TABLE command to change tables from latin1 to utf8 doesn't work properly on TEXT-type columns. According to the documentation for the ALTER TABLE command:
BEGIN QUOTE
if you want to change all character columns (CHAR, VARCHAR, TEXT) to a new character set, use a statement like this:
ALTER TABLE tbl_name CONVERT TO CHARACTER SET charset_name;
...
Warning: The preceding operation will convert column values between the character sets.
END QUOTE
However, in my test, this command worked properly on VARCHAR columns, but, when run on a table with a TEXT column, the column's character set was changed but the actual data was not modified. In other words, the values were reinterpreted as UTF8 instead of being converted.
How to repeat:
CREATE DATABASE test CHARACTER SET utf8;
USE test;
CREATE TABLE test (word VARCHAR(255) NOT NULL) CHARACTER SET latin1;
INSERT INTO test VALUES ('mére');
SELECT word,hex(word),length(word),char_length(word) FROM test\G
Output:
*************************** 1. row ***************************
word: mére
hex(word): 6DE97265
length(word): 4
char_length(word): 4
1 row in set (0.00 sec)
(As expected)
ALTER TABLE test CONVERT TO CHARACTER SET utf8;
SELECT word,hex(word),length(word),char_length(word) FROM test\G
Output:
*************************** 1. row ***************************
word: mére
hex(word): 6DC3A97265
length(word): 5
char_length(word): 4
1 row in set (0.00 sec)
(As expected; note that the value has been converted to proper UTF8)
DROP TABLE test;
CREATE TABLE test (word TEXT NOT NULL) CHARACTER SET latin1;
INSERT INTO test VALUES ('mére');
SELECT word,hex(word),length(word),char_length(word) FROM test\G
Output:
*************************** 1. row ***************************
word: mére
hex(word): 6DE97265
length(word): 4
char_length(word): 4
1 row in set (0.00 sec)
(As expected)
ALTER TABLE test CONVERT TO CHARACTER SET utf8;
SELECT word,hex(word),length(word),char_length(word) FROM test\G
Output:
*************************** 1. row ***************************
word: m▒re
hex(word): 6DE97265
length(word): 4
char_length(word): 4
1 row in set (0.00 sec)
(Expected same result as second output listing; hex() output shows that column's physical value has not changed)
For reference:
SHOW VARIABLES LIKE '%char%';
Output:
*************************** 1. row ***************************
Variable_name: character_set_client
Value: utf8
*************************** 2. row ***************************
Variable_name: character_set_connection
Value: utf8
*************************** 3. row ***************************
Variable_name: character_set_database
Value: utf8
*************************** 4. row ***************************
Variable_name: character_set_results
Value: utf8
*************************** 5. row ***************************
Variable_name: character_set_server
Value: latin1
*************************** 6. row ***************************
Variable_name: character_set_system
Value: utf8
*************************** 7. row ***************************
Variable_name: character_sets_dir
Value: /usr/local/mysql-standard-4.1.7-pc-linux-i686/share/mysql/charsets/
7 rows in set (0.00 sec)
Description: Using the ALTER TABLE command to change tables from latin1 to utf8 doesn't work properly on TEXT-type columns. According to the documentation for the ALTER TABLE command: BEGIN QUOTE if you want to change all character columns (CHAR, VARCHAR, TEXT) to a new character set, use a statement like this: ALTER TABLE tbl_name CONVERT TO CHARACTER SET charset_name; ... Warning: The preceding operation will convert column values between the character sets. END QUOTE However, in my test, this command worked properly on VARCHAR columns, but, when run on a table with a TEXT column, the column's character set was changed but the actual data was not modified. In other words, the values were reinterpreted as UTF8 instead of being converted. How to repeat: CREATE DATABASE test CHARACTER SET utf8; USE test; CREATE TABLE test (word VARCHAR(255) NOT NULL) CHARACTER SET latin1; INSERT INTO test VALUES ('mére'); SELECT word,hex(word),length(word),char_length(word) FROM test\G Output: *************************** 1. row *************************** word: mére hex(word): 6DE97265 length(word): 4 char_length(word): 4 1 row in set (0.00 sec) (As expected) ALTER TABLE test CONVERT TO CHARACTER SET utf8; SELECT word,hex(word),length(word),char_length(word) FROM test\G Output: *************************** 1. row *************************** word: mére hex(word): 6DC3A97265 length(word): 5 char_length(word): 4 1 row in set (0.00 sec) (As expected; note that the value has been converted to proper UTF8) DROP TABLE test; CREATE TABLE test (word TEXT NOT NULL) CHARACTER SET latin1; INSERT INTO test VALUES ('mére'); SELECT word,hex(word),length(word),char_length(word) FROM test\G Output: *************************** 1. row *************************** word: mére hex(word): 6DE97265 length(word): 4 char_length(word): 4 1 row in set (0.00 sec) (As expected) ALTER TABLE test CONVERT TO CHARACTER SET utf8; SELECT word,hex(word),length(word),char_length(word) FROM test\G Output: *************************** 1. row *************************** word: m▒re hex(word): 6DE97265 length(word): 4 char_length(word): 4 1 row in set (0.00 sec) (Expected same result as second output listing; hex() output shows that column's physical value has not changed) For reference: SHOW VARIABLES LIKE '%char%'; Output: *************************** 1. row *************************** Variable_name: character_set_client Value: utf8 *************************** 2. row *************************** Variable_name: character_set_connection Value: utf8 *************************** 3. row *************************** Variable_name: character_set_database Value: utf8 *************************** 4. row *************************** Variable_name: character_set_results Value: utf8 *************************** 5. row *************************** Variable_name: character_set_server Value: latin1 *************************** 6. row *************************** Variable_name: character_set_system Value: utf8 *************************** 7. row *************************** Variable_name: character_sets_dir Value: /usr/local/mysql-standard-4.1.7-pc-linux-i686/share/mysql/charsets/ 7 rows in set (0.00 sec)