Bug #40201 | Data truncation while converting from latin1 to binary to utf8 | ||
---|---|---|---|
Submitted: | 21 Oct 2008 10:23 | Modified: | 22 Oct 2008 15:30 |
Reporter: | Rakesh Kumar | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S2 (Serious) |
Version: | 5.0.45,5.0.51,5.1.23-ndb-6.2.15-log | OS: | Linux |
Assigned to: | CPU Architecture: | Any | |
Tags: | characterset issue, data truncation |
[21 Oct 2008 10:23]
Rakesh Kumar
[21 Oct 2008 18:16]
Sveta Smirnova
Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://dev.mysql.com/doc/ and the instructions on how to report a bug at http://bugs.mysql.com/how-to-report.php BLOB fields don't have character set information, so MySQL has no knowledge which character set is used in the name column. You can see it if run SELECT query while name field is BLOB: CREATE TABLE `test` ( `id` int(10) DEFAULT NULL, `name` text )ENGINE=InnoDB DEFAULT CHARSET=utf8; insert into test values (1,"Abc® Core™2 processor with XYZ™ technology"); select * from test; id name 1 Abc® Core™2 processor with XYZ™ technology alter table test modify name text charset 'latin1'; select * from test; id name 1 Abc® Core™2 processor with XYZ™ technology alter table test modify name blob; select * from test; id name 1 Abc# Core#2 processor with XYZ# technology select id, convert(name using latin1) from test; id convert(name using latin1) 1 Abc® Core™2 processor with XYZ™ technology alter table test modify name text charset 'utf8'; Warnings: Warning 1366 Incorrect string value: '\xAE Core...' for column 'name' at row 1 select * from test; id name 1 Abc
[22 Oct 2008 10:21]
Rakesh Kumar
Thanks for the response. I already went through the documentation... However, this indeed looks like a bug. Your response talks about display part, whereas, I am talking about physical data truncation in table. Conversion from "blob" to "text charset 'utf8'" truncates data. Conversion from "blob" to "text charset 'latin1'" works just fine... In the same step to reproduce: mysql> alter table test modify name blob; mysql> alter table test modify name text charset 'latin1'; -- This works mysql> alter table test modify name blob; mysql> alter table test modify name text charset 'utf8'; --This does not work and truncates data mysql> show warnings; +---------+------+-------------------------------------------------------------------+ | Level | Code | Message | +---------+------+-------------------------------------------------------------------+ | Warning | 1366 | Incorrect string value: '\xAE Core...' for column 'name' at row 1 | +---------+------+-------------------------------------------------------------------+ 1 row in set (0.00 sec) Looking at the warning...\xAE is a valid character in both latin1 and utf8 character set. I understand that binary column does not have any character set information, so the conversion may not be successful in case of characters displayed using higher bit (ascii value > 127) being present in the original content. But why would the data get truncated? Thanks.
[22 Oct 2008 11:09]
Rakesh Kumar
To add further...when I alter the table to use "memory" storage engine and use varchar/varbinary instead of text/blob, the same conversion works just fine... Looks like the issue is only with Disc based tables (I tried with myisam and innodb...). Thanks
[22 Oct 2008 11:11]
Rakesh Kumar
Looks like the issue was closed prematurely...re-opening it. Thanks.
[22 Oct 2008 11:46]
Rakesh Kumar
My apologies...the issue comes on memory table also...please ignore my comment on issue not coming on memory tables and only on disc based tables.
[22 Oct 2008 15:30]
Sveta Smirnova
Thank you for the feedback. Please look at the output: alter table test modify name blob; select * from test; id name 1 Abc# Core#2 processor with XYZ# technology You can see field name is not "Abc® Core™2 processor with XYZ™ technology", but contains garbage characters. These characters are rejected when you convert this field to utf8. So I close the report as "Not a Bug" again.