Bug #31291 | ALTER TABLE CONVERT TO CHARACTER SET does not change some data types | ||
---|---|---|---|
Submitted: | 29 Sep 2007 3:48 | Modified: | 28 Jan 2009 22:00 |
Reporter: | Kolbe Kegel | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S2 (Serious) |
Version: | 5.0.44-sp1 | OS: | Any |
Assigned to: | Sergei Glukhov | CPU Architecture: | Any |
[29 Sep 2007 3:48]
Kolbe Kegel
[8 Oct 2007 4:06]
Alexander Barkov
This is by design. ALTER TABLE creates a minimal data type which is able to store old strings. If type conversion didn't happen, then ALTER TABLE would lead to data truncation: Strings in a latin1 TEXT column can be up to 64k characters long. Strings in a utf8 column which is able to store 64k characters can use up to 3*64K bytes. Minimal data type which is able to store strings of this length is MEDIUMTEXT. Setting to "documentung". To doc team: please mention this behavior in the manual, and then close as "not a bug".
[9 Oct 2007 21:41]
Kolbe Kegel
Interesting. The documentation at http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html makes it sound like TEXT column length restrictions are based on "characters" rather than "bytes". For example, the storage requirements of "TEXT" columns are defined as "L characters + 2 bytes, where L < 2^16". Then, later in the page, "To calculate the number of bytes used to store a particular CHAR, VARCHAR, or TEXT column value, you must take into account the character set used for that column and whether the value contains multi-byte characters. In particular, when using the utf8 Unicode character set, you must keep in mind that not all utf8 characters use the same number of bytes and can require up to three bytes per character." If the data storage requirements for TEXT columns are based on bytes rather than characters, that needs to be clarified in the documentation.
[9 Oct 2007 21:53]
Kolbe Kegel
One more thing. You say "ALTER TABLE creates a minimal data type which is able to store old strings.", but that's not really true. In the test case for this bug report, there is no string at all, so the size of the "old string" is not an issue. It would be possible for ALTER TABLE to choose the smallest column type greater than or equal to the current specification that would hold existing strings.
[6 Nov 2007 19:49]
Paul DuBois
I've added this to the ALTER TABLE section: For a column that has a data type of VARCHAR or one of the TEXT types, CONVERT TO CHARACTER SET will change the data type as necessary to ensure that the new column is long enough to store as many characters as the original column. For example, a TEXT column has two length bytes, which store the byte-length of values in the column, up to a maximum of 65,535. For a latin1 TEXT column, each character requires a single byte, so the column can store up to 65,535 characters. If the column is converted to utf8, each character might require up to 3 bytes, for a maximum possible length of 3 × 65,535 = 196,605 bytes. That length will not fit in a TEXT column's length bytes, so MySQL will convert the data type to MEDIUMTEXT, which is the smallest string type for which the length bytes can record a value of 196,605. Similarly, a VARCHAR column might be converted to MEDIUMTEXT. To avoid data type changes of the type just described, do not use CONVERT TO CHARACTER SET. Instead, use MODIFY to change individual columns. For example: ALTER TABLE t MODIFY latin1_text_col TEXT CHARACTER SET utf8; ALTER TABLE t MODIFY latin1_varchar_col VARCHAR(M) CHARACTER SET utf8; I will also make some changes to the data type chapter to correct the places where it says that the length bytes for variable-length string types record the length in characters. That is incorrect because they record the length in bytes, even for character string types.
[23 Nov 2007 6:51]
Alexander Barkov
As discussed earlier, these data types do not work correct: - TINYTEXT does not change to TEXT - wrong - MEDIUMTEXT does not change to BIGTEXT - wrong Changing the bug title to better correspond the problem.
[26 Nov 2008 8:04]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/59868 2714 Sergey Glukhov 2008-11-26 Bug#31291 ALTER TABLE CONVERT TO CHARACTER SET does not change some data types added ability for TINY[MEDIUM] text fields to be converted to greater subtype during alter if necessary(altered charset)
[26 Nov 2008 10:37]
Alexander Barkov
The patch http://lists.mysql.com/commits/59868 looks ok to push.
[9 Dec 2008 12:40]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/61054 2740 Sergey Glukhov 2008-12-09 Bug#31291 ALTER TABLE CONVERT TO CHARACTER SET does not change some data types added ability for TINY[MEDIUM] text fields to be converted to greater subtype during alter if necessary(altered charset)
[6 Jan 2009 13:56]
Bugs System
Pushed into 5.0.76 (revid:joro@sun.com-20090105160414-8q9j4bi1klkfwiup) (version source revid:azundris@mysql.com-20081230114734-nmsc37ak330zlygn) (merge vers: 5.0.76) (pib:6)
[8 Jan 2009 21:25]
Paul DuBois
Noted in 5.0.76 changelog. ALTER TABLE CONVERT TO CHARACTER SET did not convert TINYTEXT or MEDIUMTEXT columns to a longer text type if necessary when converting the column to a different character set. Setting report to NDI pending push into 5.1.x/6.0.x.
[15 Jan 2009 6:32]
Bugs System
Pushed into 5.1.31 (revid:joro@sun.com-20090115053147-tx1oapthnzgvs1ro) (version source revid:azundris@mysql.com-20081230114838-cn52tu180wcrvh0h) (merge vers: 5.1.31) (pib:6)
[15 Jan 2009 16:16]
Paul DuBois
Noted in 5.1.31 changelog. Setting report to NDI pending push into 6.0.x.
[19 Jan 2009 11:21]
Bugs System
Pushed into 5.1.31-ndb-6.2.17 (revid:tomas.ulin@sun.com-20090119095303-uwwvxiibtr38djii) (version source revid:tomas.ulin@sun.com-20090115073240-1wanl85vlvw2she1) (merge vers: 5.1.31-ndb-6.2.17) (pib:6)
[19 Jan 2009 12:59]
Bugs System
Pushed into 5.1.31-ndb-6.3.21 (revid:tomas.ulin@sun.com-20090119104956-guxz190n2kh31fxl) (version source revid:tomas.ulin@sun.com-20090119104956-guxz190n2kh31fxl) (merge vers: 5.1.31-ndb-6.3.21) (pib:6)
[19 Jan 2009 14:44]
Jon Stephens
Setting status back to NDI pending merge to 6.0 tree.
[19 Jan 2009 16:04]
Bugs System
Pushed into 5.1.31-ndb-6.4.1 (revid:tomas.ulin@sun.com-20090119144033-4aylstx5czzz88i5) (version source revid:tomas.ulin@sun.com-20090119144033-4aylstx5czzz88i5) (merge vers: 5.1.31-ndb-6.4.1) (pib:6)
[20 Jan 2009 18:52]
Bugs System
Pushed into 6.0.10-alpha (revid:joro@sun.com-20090119171328-2hemf2ndc1dxl0et) (version source revid:azundris@mysql.com-20081230114916-c290n83z25wkt6e4) (merge vers: 6.0.9-alpha) (pib:6)
[28 Jan 2009 22:00]
Paul DuBois
Noted in 6.0.10 changelog.