Bug #26711 | Binary content 0x00 sometimes becomes 0x5C 0x00 after dump/load | ||
---|---|---|---|
Submitted: | 28 Feb 2007 5:40 | Modified: | 30 Jul 2007 10:44 |
Reporter: | Yuan WANG | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S2 (Serious) |
Version: | 5.0.30, 4.1, 5.1, 5.2-falcon | OS: | Linux (linux) |
Assigned to: | Alexander Barkov | CPU Architecture: | Any |
[28 Feb 2007 5:40]
Yuan WANG
[28 Feb 2007 9:01]
Sveta Smirnova
Please do not submit the same bug more than once. An existing bug report already describes this very problem. Even if you feel that your issue is somewhat different, the resolution is likely to be the same. Because of this, we hope you add your comments to the original bug instead. Thank you for your interest in MySQL. Duplicate of Bug #23619 Will be available in next release.
[5 Mar 2007 10:59]
Sveta Smirnova
test case
Attachment: bug23619.test (application/octet-stream, text), 395 bytes.
[5 Mar 2007 11:01]
Sveta Smirnova
Thank you for the additional information. Verified using attached test case.
[12 Mar 2007 10:31]
Alexander Barkov
This is a long known problem, the source of it is that the REVERSE SOLIDUS "\" (0x5C), which is used in MySQL as an escape character, can also be a part of a multibyte sequence, like 0xEE5C, which encodes a Chinese ideogram in GBK. When loading a dump, "mysql" misinterprets EE5C as as Chinese character, then loads "0" as a DIGIT ZERO 0x30, instead of the null character 0x00. There's a --hex-blob parameter which makes "mysqldump" use HEX encoding for BINARY, VARBINARY and BLOB datatypes. Please try "mysqldump --hex-blob" and give feedback if it works for you this way. This parameter is off by default. But anyway, although the --hex-blob workaround exists, this is still as a bug. "mysqldump" side must be fixed to produce a dump in a safe way, which should further be understood by "mysql" without misinterpretring the REVERS SOLIDUS character, even if it occasionally appears after a "multibyte head" byte, like 0xEE is.
[12 Mar 2007 10:56]
Yuan WANG
--hex-blob doesn't work when the -T option is used, i.e. the data is dumped into a data file, not INSERT statements in SQL.
[12 Mar 2007 11:17]
Alexander Barkov
Does it work with --hex-blob, without -T ?
[13 Mar 2007 5:57]
Yuan WANG
If --hex-blob is used and -T is not used, then no problems.
[9 Apr 2007 9:03]
Yuan WANG
Any plan for this bug?
[4 Jun 2007 13:06]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/28057 ChangeSet@1.2513, 2007-06-04 18:06:39+05:00, bar@mysql.com +6 -0 Bug#26711 "Binary content 0x00 sometimes becomes 0x5C 0x00 after dump/load" Problem: "SELECT INTO OUTFILE" created incorrect dumps for BLOBs, so "LOAD DATA" later incorrectly interpreted 0x5C as the second byte of a multi-byte sequence, instead of escape character. Fix: adding escaping of multi-byte heads.
[4 Jun 2007 13:12]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/28058 ChangeSet@1.2513, 2007-06-04 18:12:28+05:00, bar@mysql.com +6 -0 Bug#26711 "Binary content 0x00 sometimes becomes 0x5C 0x00 after dump/load" Problem: "SELECT INTO OUTFILE" created incorrect dumps for BLOBs, so "LOAD DATA" later incorrectly interpreted 0x5C as the second byte of a multi-byte sequence, instead of escape character. Fix: adding escaping of multi-byte heads.
[7 Jun 2007 7:06]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/28257 ChangeSet@1.2513, 2007-06-07 12:05:43+05:00, bar@mysql.com +6 -0 Bug#26711 "Binary content 0x00 sometimes becomes 0x5C 0x00 after dump/load" Problem: "SELECT INTO OUTFILE" created incorrect dumps for BLOBs, so "LOAD DATA" later incorrectly interpreted 0x5C as the second byte of a multi-byte sequence, instead of escape character. Fix: adding escaping of multi-byte heads.
[7 Jun 2007 8:17]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/28268 ChangeSet@1.2513, 2007-06-07 13:16:49+05:00, bar@mysql.com +6 -0 Bug#26711 "Binary content 0x00 sometimes becomes 0x5C 0x00 after dump/load" Problem: "SELECT INTO OUTFILE" created incorrect dumps for BLOBs, so "LOAD DATA" later incorrectly interpreted 0x5C as the second byte of a multi-byte sequence, instead of escape character. Fix: adding escaping of multi-byte heads.
[7 Jun 2007 9:04]
Alexander Barkov
Pushed into 5.0.44-rpl Pushed into 5.1.20-rpl
[21 Jun 2007 20:11]
Bugs System
Pushed into 5.0.46
[21 Jun 2007 20:15]
Bugs System
Pushed into 5.1.20-beta
[23 Jun 2007 8:14]
Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release. If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at http://dev.mysql.com/doc/en/installing-source.html Bugfix documented as follows in the 5.0.46 and 5.1.20 changelogs: Binary content <literal>0x00</literal> in a <literal>BLOB</literal> column sometimes became <literal>0x5C 0x00</literal> following a dump and reload, which could cause problems with data using multi-byte character sets such as <literal>GBK</literal> (Chinese). This was due to a problem with <literal>SELECT INTO OUTFILE</literal> whereby <literal>LOAD DATA</literal> later incorrectly interpreted <literal>0x5C</literal> as the second byte of a multi-byte sequence rather than as the <literal>SOLIDUS</literal> (<quote>\</quote>) character, used by MySQL as the escape character. (Bug #26711)
[30 Jul 2007 7:34]
Sveta Smirnova
Bug is still repeatable with current versions.
[30 Jul 2007 7:35]
Sveta Smirnova
better test case
Attachment: bug26711.test (application/octet-stream, text), 381 bytes.
[30 Jul 2007 7:35]
Sveta Smirnova
Wrong result: drop table if exists t1; set character_set_database=gbk; show variables like 'char%'; Variable_name Value character_set_client latin1 character_set_connection latin1 character_set_database gbk character_set_filesystem binary character_set_results latin1 character_set_server latin1 character_set_system utf8 character_sets_dir /users/ssmirnova/src/mysql-5.0/sql/share/charsets/ create table `t1` ( `a` blob ) engine=myisam default charset=latin1; insert into t1 values(0xee00); select hex(a) from t1; hex(a) EE00 truncate table t1; load data local infile 'var/t1.txt' into table t1; select hex(a) from t1; hex(a) EE5C30
[30 Jul 2007 10:43]
Alexander Barkov
Sveta, this wrong result is expected where character set is latin1. You must fix the test to use GBK with mysqldump to get correct results: --exec $MYSQL_DUMP --default-character-set=gbk -T var test
[31 Jul 2007 13:26]
Alexander Barkov
> I am a bit confused why character set affects BLOB field. Especcially > if I don't use option -T with mysqldump no need to specify > default-character-set. Because MySQL "LOAD DATA INFILE" processor considers the entire file as having a single character set. It always un-escapes all values before inserting them into fields according to this character set, and field types do no matter. So, if you need to load a file into GBK database, then you must make sure that this file is GBK-compatible. I.e. when running "mysqldump -T" you must specify --default-character-set=gbk. > May be note in documentation needed? Yes, I agree, it should be noted.
[31 Jul 2007 15:49]
Paul DuBois
I have added this to the LOAD DATA INFILE section: LOAD DATA INFILE interprets all fields in the file as having the same character set, regardless of the data types of the columns into which field values are loaded. For proper interpretation of file contents, you must ensure that it was written with the correct character set. For example, if you write a data file with mysqldump -T or by issuing a SELECT ... INTO OUTFILE statement in mysql, be sure to use a --default-character-set option with mysqldump or mysql so that output is written in the character set to be used when the file is loaded with LOAD DATA INFILE.
[5 Dec 2007 15:43]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/39323 ChangeSet@1.2651, 2007-12-05 16:43:44+01:00, msvensson@pilot.mysql.com +1 -0 Remove disabling of ctype_big5, bug#26711 has nothing to do with it
[26 Mar 2008 19:00]
Bugs System
Pushed into 6.0.5-alpha
[31 Mar 2008 7:56]
Bugs System
Pushed into 5.1.24-rc