Bug #30946 | mysqldump silently ignores --default-character-set when used with --tab | ||
---|---|---|---|
Submitted: | 11 Sep 2007 7:19 | Modified: | 18 Aug 2009 18:11 |
Reporter: | Marek Matula | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: mysqldump Command-line Client | Severity: | S3 (Non-critical) |
Version: | 5.0.45 | OS: | Linux |
Assigned to: | Gleb Shchepa | CPU Architecture: | Any |
Tags: | character set, mysqldump, tab |
[11 Sep 2007 7:19]
Marek Matula
[18 Sep 2007 23:42]
MySQL Verification Team
Thank you for the bug report. Maybe a documentation bug.
[19 Sep 2007 6:30]
Marek Matula
I believe this is not a documentation bug. If it was, it would be not possible to dump (and restore) tables that have more than one charset in their fields!
[26 Sep 2007 17:23]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/34566 ChangeSet@1.2531, 2007-09-26 18:22:25+02:00, gshchepa@devsrv-b.mysql.com +5 -0 Fixed bug #30946. When mysqldump was used with both --default-character-set and --tab options the database content in tab-separated data files was always in its internal character set and the --default-character-set switch was ignored. The mysqldump client uses the "SELECT * INTO OUTFILE" statement to provide the --tab command line switch logic. It is supposed to use the symmetrical "LOAD DATA INFILE" statement to load a resulting data file back to a database. By the definition the "LOAD DATA INFILE" statement interprets all fields in the file as having the same character set, regardless of the data types of the columns into which field values are loaded. The character_set_database system variable or the "CHARACTER SET" closure of the "LOAD DATA INFILE" statement is used to define that character set. However, the "SELECT ... INTO OUTFILE" statement dumps table fields in its internal character sets, so "SELECT INTO OUTFILE" and "LOAD DATA INFILE" statements are not symmetrical. To not to modify current server semantic (only to extend it) the "SELECT ... INTO OUTFILE" statement syntax was extended with the "CHARACTER SET" statement like in the "LOAD DATA INFILE". Documentation should be updated: [ http://dev.mysql.com/doc/refman/5.1/en/select.html ] SELECT ... INTO OUTFILE is the complement of LOAD DATA INFILE; the syntax for the export_options part of the statement consists of the same CHARACTER SET, FIELDS and LINES clauses that are used with +^^^^^^^^^^^^^^ the LOAD DATA INFILE statement.
[6 Oct 2007 11:09]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/35051 ChangeSet@1.2582, 2007-10-06 16:10:17+05:00, gshchepa@gleb.loc +7 -0 Bug#30946: mysqldump silently ignores --default-character-set when used with --tab. Added optional CHARACTER SET clause to the SELECT ... INTO OUTFILE (to complement the same clause in LOAD DATA INFILE). mysqldump is updated to use it in --tab mode.
[26 Jan 2008 17:17]
Baron Schwartz
What does the following mean? "content in tab-separated data files was always in its internal character set" Whose internal character set? The MySQL server, mysqldump, the database, the table, the columns? It would be very helpful if the documentation answered the question, "What character set is used for SELECT INTO OUTFILE?"
[30 Jan 2008 12:33]
Gleb Shchepa
> What does the following mean? > "content in tab-separated data files was always in its internal character set" > Whose internal character set? The MySQL server, mysqldump, the database, > the table, the columns? Character set of each column (charset of whole row is a mix of column charsets, or in other words it is BINARY charset). > It would be very helpful if the documentation answered the question, > "What character set is used for SELECT INTO OUTFILE?" Current implementation uses BINARY charset. Bugfix will extend SELECT INTO OUTFILE syntax to allow data dumping with a special charset value.
[5 Feb 2008 2:38]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/41685 ChangeSet@1.2594, 2008-02-05 06:37:04+04:00, gshchepa@host.loc +10 -0 Fixed bugs #30946 and #31677. Bug#30946: mysqldump silently ignores --default-character-set when used with --tab. Added optional CHARACTER SET clause to the SELECT ... INTO OUTFILE (to complement the same clause in LOAD DATA INFILE). mysqldump is updated to use it in --tab mode. Added complete support for multibyte delimiter characters, support for UCS2 harsets and hexadecimal representation of malformed characters been added. Fixed bug with 8-bit separators of LOAD DATA INFILE statement. New error message has been provided: "Malformed ... delimiter value", where "..." is one of "FIELDS TERMINATED BY", "FIELDS ENCLOSED BY", "FIELDS ESCAPED BY", "LINES TERMINATED BY" or "LINES STARTING BY", to report errors on bad SELECT ... INTO OUTFILE separator values.
[19 Mar 2008 14:41]
Alexander Barkov
Hi Gleb, I applied your patch and found some problems: 1. Crash with ucs2 table and big5 output: CREATE TABLE t1 ( a varchar(10) default NULL, b varchar(20) default NULL ) ENGINE=MyISAM DEFAULT CHARSET=ucs2; delete from t1; insert into t1 values (0xFFFF,0xFFFF); select * into outfile 't1.txt' character set big5 from t1; ERROR 2013 (HY000): Lost connection to MySQL server during query 2. Wrong file with ucs2 table and default (binary) character set: drop table if exists t1; create table t1 (a varchar(10), b varchar(20)) character set ucs2; insert into t1 values ('a1','b1'); select * into outfile 't1.txt' from t1; select load_File('test/t1.txt'); +--------------------------+ | load_File('test/t1.txt') | +--------------------------+ | \0\0\01 \0\0\01 | +--------------------------+ This result is wrong. I would expect: \0a\01<tab>\0b\01<tab><nl> More comments coming
[26 Jun 2008 14:36]
Susanne Ebrecht
Sergei set bug #37610 as duplicate of this bug here. I made some comments and also debugging on that bug. Please read it also.
[23 Mar 2009 13:45]
Susanne Ebrecht
Bug #43288 is related to this bug here.
[23 Jun 2009 10:09]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/76888
[22 Jul 2009 7:24]
Alexander Barkov
Review comments sent by e-mail: http://lists.mysql.com/commits/79141
[23 Jul 2009 11:49]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/79147
[29 Jul 2009 18:36]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/79554
[31 Jul 2009 17:56]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/79790
[3 Aug 2009 19:27]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/79959 3052 Gleb Shchepa 2009-08-04 Additional post-commit fix (bug #30946): server version in a conditional commentary at the new SELECT ... INTO OUTFILE ... CHARACTER SET syntax has been updated to 5.1.38.
[4 Aug 2009 19:52]
Bugs System
Pushed into 5.4.4-alpha (revid:alik@sun.com-20090804194615-h40sa098mx4z49qg) (version source revid:gshchepa@mysql.com-20090803192436-r8nqv6zi0a6did5t) (merge vers: 5.4.4-alpha) (pib:11)
[4 Aug 2009 20:45]
Bugs System
Pushed into 5.1.38 (revid:davi.arnaut@sun.com-20090804204317-ggodqkik7de6nfpz) (version source revid:davi.arnaut@sun.com-20090804204317-ggodqkik7de6nfpz) (merge vers: 5.1.38) (pib:11)
[18 Aug 2009 18:11]
Paul DuBois
Noted in 5.1.38, 5.4.4 changelogs. Previously, SELECT ... INTO OUTFILE dumped column values without character set conversion, which could produce data files that cannot be imported without error if different columns used different character sets. A consequence of this is that mysqldump ignored the --default-character-set option if the --tab option was given (which causes SELECT ... INTO OUTFILE to be used to dump data.) INTO OUTFILE now can be followed by a CHARACTER SET clause indicating the character set to which dumped values should be converted. Also, mysqldump adds a CHARACTER SET clause to the SELECT ... INTO OUTFILE statement used to dump data, so that --default-character-set is no longer ignored if --tab is given. Other changes are that SELECT ... INTO OUTFILE enforces that ENCLOSED BY and ESCAPED BY arguments must be a single character, and SELECT ... INTO OUTFILE and LOAD DATA INFILE produce warnings if non-ASCII field or line separators are specified.
[1 Oct 2009 5:59]
Bugs System
Pushed into 5.1.39-ndb-6.3.28 (revid:jonas@mysql.com-20091001055605-ap2kiaarr7p40mmv) (version source revid:jonas@mysql.com-20091001055605-ap2kiaarr7p40mmv) (merge vers: 5.1.39-ndb-6.3.28) (pib:11)
[1 Oct 2009 7:25]
Bugs System
Pushed into 5.1.39-ndb-7.0.9 (revid:jonas@mysql.com-20091001072547-kv17uu06hfjhgjay) (version source revid:jonas@mysql.com-20091001071652-irejtnumzbpsbgk2) (merge vers: 5.1.39-ndb-7.0.9) (pib:11)
[1 Oct 2009 13:25]
Bugs System
Pushed into 5.1.39-ndb-7.1.0 (revid:jonas@mysql.com-20091001123013-g9ob2tsyctpw6zs0) (version source revid:jonas@mysql.com-20091001123013-g9ob2tsyctpw6zs0) (merge vers: 5.1.39-ndb-7.1.0) (pib:11)
[5 Oct 2009 10:50]
Bugs System
Pushed into 5.1.39-ndb-6.2.19 (revid:jonas@mysql.com-20091005103850-dwij2dojwpvf5hi6) (version source revid:jonas@mysql.com-20090930185117-bhud4ek1y0hsj1nv) (merge vers: 5.1.39-ndb-6.2.19) (pib:11)
[5 Oct 2009 18:19]
Paul DuBois
This fix now has been pushed into 5.4.2.
[22 Apr 2010 22:07]
Ryan Mack
I tracked a new bug back to this one (http://bugs.mysql.com/bug.php?id=53088). It appears the new charset conversion code truncates all converted text/blob columns to MAX_FIELD_WIDTH (766) bytes. I am not sure whether it is desirable to convert binary format columns to the default charset at all.