Bug #31291 ALTER TABLE CONVERT TO CHARACTER SET does not change some data types
Submitted: 29 Sep 2007 3:48 Modified: 28 Jan 2009 22:00
Reporter: Kolbe Kegel Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Charsets Severity:S2 (Serious)
Version:5.0.44-sp1 OS:Any
Assigned to: Sergei Glukhov CPU Architecture:Any

[29 Sep 2007 3:48] Kolbe Kegel
Description:
ALTER TABLE ... CONVERT TO CHARACTER SET will turn a TEXT column into MEDIUMTEXT.

This can cause significant problems, specifically, when using the MERGE storage engine. Beyond that, it is simply a side effect that should not be occurring.

If you are being affected by this problem, you should change the column type back in a compound ALTER TABLE statement, like this:

alter table t1 convert to character set utf8, modify t text;

How to repeat:
create table t1 (t text) character set latin1;
show create table t1\G
alter table t1 convert to character set utf8;
show create table t1\G

mysql> create table t1 (t text) character set latin1;
Query OK, 0 rows affected (0.07 sec)

mysql> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `t` text
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> alter table t1 convert to character set utf8;
Query OK, 0 rows affected (0.12 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `t` mediumtext
) ENGINE=MyISAM DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

Suggested fix:
ALTER TABLE ... CONVERT TO CHARACTER SET should not change the data type of any column.
[8 Oct 2007 4:06] Alexander Barkov
This is by design.

ALTER TABLE creates a minimal data type which is able to store old strings.
If type conversion didn't happen, then ALTER TABLE would
lead to data truncation:

Strings in a latin1 TEXT column can be up to 64k characters long.
Strings in a utf8 column which is able to store 64k characters
can use up to 3*64K bytes. Minimal data type which is able
to store strings of this length is MEDIUMTEXT.

Setting to "documentung". 
To doc team: please mention this behavior in the manual,
and then close as "not a bug".
[9 Oct 2007 21:41] Kolbe Kegel
Interesting. The documentation at http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html makes it sound like TEXT column length restrictions are based on "characters" rather than "bytes". For example, the storage requirements of "TEXT" columns are defined as "L characters + 2 bytes, where L <  2^16". Then, later in the page, "To calculate the number of bytes used to store a particular CHAR, VARCHAR, or TEXT column value, you must take into account the character set used for that column and whether the value contains multi-byte characters. In particular, when using the utf8 Unicode character set, you must keep in mind that not all utf8 characters use the same number of bytes and can require up to three bytes per character."

If the data storage requirements for TEXT columns are based on bytes rather than characters, that needs to be clarified in the documentation.
[9 Oct 2007 21:53] Kolbe Kegel
One more thing. You say "ALTER TABLE creates a minimal data type which is able to store old strings.", but that's not really true. In the test case for this bug report, there is no string at all, so the size of the "old string" is not an issue. It would be possible for ALTER TABLE to choose the smallest column type greater than or equal to the current specification that would hold existing strings.
[6 Nov 2007 19:49] Paul DuBois
I've added this to the ALTER TABLE section:

For a column that has a data type of VARCHAR or one of the TEXT
types, CONVERT TO CHARACTER SET will change the data type as
necessary to ensure that the new column is long enough to store as
many characters as the original column. For example, a TEXT column
has two length bytes, which store the byte-length of values in the
column, up to a maximum of 65,535. For a latin1 TEXT column, each 
character requires a single byte, so the column can store up to 
65,535 characters. If the column is converted to utf8, each character
might require up to 3 bytes, for a maximum possible length of 3 ×
65,535 = 196,605 bytes. That length will not fit in a TEXT column's
length bytes, so MySQL will convert the data type to MEDIUMTEXT,
which is the smallest string type for which the length bytes can
record a value of 196,605. Similarly, a VARCHAR column might be
converted to MEDIUMTEXT. 

To avoid data type changes of the type just described, do not use
CONVERT TO CHARACTER SET. Instead, use MODIFY to change individual
columns. For example: 

ALTER TABLE t MODIFY latin1_text_col TEXT CHARACTER SET utf8;
ALTER TABLE t MODIFY latin1_varchar_col VARCHAR(M) CHARACTER SET utf8;

I will also make some changes to the data type chapter to correct
the places where it says that the length bytes for variable-length
string types record the length in characters. That is incorrect
because they record the length in bytes, even for character string
types.
[23 Nov 2007 6:51] Alexander Barkov
As discussed earlier, these data types do not work correct:

- TINYTEXT does not change to TEXT - wrong

- MEDIUMTEXT does not change to BIGTEXT - wrong

Changing the bug title to better correspond the problem.
[26 Nov 2008 8:04] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/59868

2714 Sergey Glukhov	2008-11-26
      Bug#31291 ALTER TABLE CONVERT TO CHARACTER SET does not change some data types
      added ability for TINY[MEDIUM] text fields 
      to be converted to greater subtype during
      alter if necessary(altered charset)
[26 Nov 2008 10:37] Alexander Barkov
The patch http://lists.mysql.com/commits/59868 looks ok to push.
[9 Dec 2008 12:40] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/61054

2740 Sergey Glukhov	2008-12-09
      Bug#31291 ALTER TABLE CONVERT TO CHARACTER SET does not change some data types
      added ability for TINY[MEDIUM] text fields 
      to be converted to greater subtype during
      alter if necessary(altered charset)
[6 Jan 2009 13:56] Bugs System
Pushed into 5.0.76 (revid:joro@sun.com-20090105160414-8q9j4bi1klkfwiup) (version source revid:azundris@mysql.com-20081230114734-nmsc37ak330zlygn) (merge vers: 5.0.76) (pib:6)
[8 Jan 2009 21:25] Paul DuBois
Noted in 5.0.76 changelog.

ALTER TABLE CONVERT TO CHARACTER SET did not convert TINYTEXT or
MEDIUMTEXT columns to a longer text type if necessary when converting
the column to a different character set.

Setting report to NDI pending push into 5.1.x/6.0.x.
[15 Jan 2009 6:32] Bugs System
Pushed into 5.1.31 (revid:joro@sun.com-20090115053147-tx1oapthnzgvs1ro) (version source revid:azundris@mysql.com-20081230114838-cn52tu180wcrvh0h) (merge vers: 5.1.31) (pib:6)
[15 Jan 2009 16:16] Paul DuBois
Noted in 5.1.31 changelog.

Setting report to NDI pending push into 6.0.x.
[19 Jan 2009 11:21] Bugs System
Pushed into 5.1.31-ndb-6.2.17 (revid:tomas.ulin@sun.com-20090119095303-uwwvxiibtr38djii) (version source revid:tomas.ulin@sun.com-20090115073240-1wanl85vlvw2she1) (merge vers: 5.1.31-ndb-6.2.17) (pib:6)
[19 Jan 2009 12:59] Bugs System
Pushed into 5.1.31-ndb-6.3.21 (revid:tomas.ulin@sun.com-20090119104956-guxz190n2kh31fxl) (version source revid:tomas.ulin@sun.com-20090119104956-guxz190n2kh31fxl) (merge vers: 5.1.31-ndb-6.3.21) (pib:6)
[19 Jan 2009 14:44] Jon Stephens
Setting status back to NDI pending merge to 6.0 tree.
[19 Jan 2009 16:04] Bugs System
Pushed into 5.1.31-ndb-6.4.1 (revid:tomas.ulin@sun.com-20090119144033-4aylstx5czzz88i5) (version source revid:tomas.ulin@sun.com-20090119144033-4aylstx5czzz88i5) (merge vers: 5.1.31-ndb-6.4.1) (pib:6)
[20 Jan 2009 18:52] Bugs System
Pushed into 6.0.10-alpha (revid:joro@sun.com-20090119171328-2hemf2ndc1dxl0et) (version source revid:azundris@mysql.com-20081230114916-c290n83z25wkt6e4) (merge vers: 6.0.9-alpha) (pib:6)
[28 Jan 2009 22:00] Paul DuBois
Noted in 6.0.10 changelog.