Bug #6479 ALTER TABLE ... changing charset fails for TEXT columns
Submitted: 6 Nov 2004 19:38 Modified: 8 Nov 2004 10:41
Reporter: Christopher Head Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:4.1.7-standard OS:Linux (Gentoo Linux, 2.6.9 kernel)
Assigned to: Alexander Barkov CPU Architecture:Any

[6 Nov 2004 19:38] Christopher Head
Description:
Using the ALTER TABLE command to change tables from latin1 to utf8 doesn't work properly on TEXT-type columns. According to the documentation for the ALTER TABLE command:

BEGIN QUOTE
if you want to change all character columns (CHAR, VARCHAR, TEXT) to a new character set, use a statement like this:

ALTER TABLE tbl_name CONVERT TO CHARACTER SET charset_name;
...
Warning: The preceding operation will convert column values between the character sets.
END QUOTE

However, in my test, this command worked properly on VARCHAR columns, but, when run on a table with a TEXT column, the column's character set was changed but the actual data was not modified. In other words, the values were reinterpreted as UTF8 instead of being converted.

How to repeat:
CREATE DATABASE test CHARACTER SET utf8;
USE test;
CREATE TABLE test (word VARCHAR(255) NOT NULL) CHARACTER SET latin1;
INSERT INTO test VALUES ('mére');
SELECT word,hex(word),length(word),char_length(word) FROM test\G
Output:
*************************** 1. row ***************************
             word: mére
        hex(word): 6DE97265
     length(word): 4
char_length(word): 4
1 row in set (0.00 sec)

(As expected)

ALTER TABLE test CONVERT TO CHARACTER SET utf8;
SELECT word,hex(word),length(word),char_length(word) FROM test\G
Output:
*************************** 1. row ***************************
             word: mére
        hex(word): 6DC3A97265
     length(word): 5
char_length(word): 4
1 row in set (0.00 sec)

(As expected; note that the value has been converted to proper UTF8)

DROP TABLE test;
CREATE TABLE test (word TEXT NOT NULL) CHARACTER SET latin1;
INSERT INTO test VALUES ('mére');
SELECT word,hex(word),length(word),char_length(word) FROM test\G
Output:
*************************** 1. row ***************************
             word: mére
        hex(word): 6DE97265
     length(word): 4
char_length(word): 4
1 row in set (0.00 sec)

(As expected)

ALTER TABLE test CONVERT TO CHARACTER SET utf8;
SELECT word,hex(word),length(word),char_length(word) FROM test\G
Output:
*************************** 1. row ***************************
             word: m▒re
        hex(word): 6DE97265
     length(word): 4
char_length(word): 4
1 row in set (0.00 sec)

(Expected same result as second output listing; hex() output shows that column's physical value has not changed)

For reference:
SHOW VARIABLES LIKE '%char%';
Output:
*************************** 1. row ***************************
Variable_name: character_set_client
        Value: utf8
*************************** 2. row ***************************
Variable_name: character_set_connection
        Value: utf8
*************************** 3. row ***************************
Variable_name: character_set_database
        Value: utf8
*************************** 4. row ***************************
Variable_name: character_set_results
        Value: utf8
*************************** 5. row ***************************
Variable_name: character_set_server
        Value: latin1
*************************** 6. row ***************************
Variable_name: character_set_system
        Value: utf8
*************************** 7. row ***************************
Variable_name: character_sets_dir
        Value: /usr/local/mysql-standard-4.1.7-pc-linux-i686/share/mysql/charsets/
7 rows in set (0.00 sec)
[8 Nov 2004 9:23] Alexander Keremidarski
Thank you for the excelent test case.