Bug #12123 | mysqldump --tab results in text file which can't be imported | ||
---|---|---|---|
Submitted: | 22 Jul 2005 20:39 | Modified: | 31 Oct 2005 19:46 |
Reporter: | Peter Zaitsev (Basic Quality Contributor) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S3 (Non-critical) |
Version: | 4.1 | OS: | Any (all) |
Assigned to: | Bugs System | CPU Architecture: | Any |
[22 Jul 2005 20:39]
Peter Zaitsev
[22 Jul 2005 20:50]
Jorge del Conde
Thanks for your bug report Peter
[14 Oct 2005 23:49]
Patrick Galbraith
I can't seem to get a problem with this. I created a utf8 db, latin1 table, inserted two rows, one with regular US text, one with German, dumped, and reloaded, and the data is the same before and after. mysql> create database udb character set utf8; Query OK, 1 row affected (0.01 sec) mysql> connect udb; Connection id: 13 Current database: udb mysql> create table lat(a int, b varchar(32)) CHARACTER SET=latin1; ^[[AQuery OK, 0 rows affected (0.46 sec) mysql> insert into lat values (1, 'this is a test'); Query OK, 1 row affected (0.02 sec) mysql> insert into lat values (2, 'über nachfaßt'); Query OK, 1 row affected (0.01 sec) mysql> select * from lat; +------+----------------+ | a | b | +------+----------------+ | 1 | this is a test | | 2 | über nachfaßt | +------+----------------+ 2 rows in set (0.01 sec) mysql> \q Bye patg@krsna:~/mysql-build/mysql-4.1/client> ./mysqldump -uroot -S /tmp/mysqld-4.1-5552.sock --add-drop-database --databases udb > udb.dump patg@krsna:~/mysql-build/mysql-4.1/client> ./mysql -uroot -S /tmp/mysqld-4.1-5552.sock udb < udb.dump patg@krsna:~/mysql-build/mysql-4.1/client> ./mysql -u root -S /tmp/mysqld-4.1-5552.sock udb Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 16 to server version: 4.1.16-debug-log Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql> select * from lat; +------+----------------+ | a | b | +------+----------------+ | 1 | this is a test | | 2 | über nachfaßt | +------+----------------+ 2 rows in set (0.01 sec)
[15 Oct 2005 2:20]
Peter Zaitsev
Patrick, In your case database character is utf8 while table character is latin1. Latin1 letters typically have the same codes in utf8 this is why you're lucky. You also used the phrase which could be presented in latin1 character set which is always poor test. I would suggest you to create latin1 database and utf8 table and test something which can't be presented in latin1, for example russian - "Тестовая Строка" - if you need some sample text.
[15 Oct 2005 3:53]
Patrick Galbraith
Peter, In the report, you said "Populate it with some data with high latin1 characters". By "high latin", do you mean "cyrillic"? I'll give that a try. Question though: If a table is supposed to be latin1, is it suposed to transparently deal with inserting cyrillic characters, dumping, and reloading? What does making a table latin1 in a utf8 database mean - as far as moving data around? Thanks much!
[15 Oct 2005 4:02]
Peter Zaitsev
Patrick, Sorry I was unclear in this case. I just repeated this bug with 4.1.14. Here is full session: mysql> create database xx character set latin1; Query OK, 1 row affected (0.06 sec) mysql> use xx; Database changed mysql> create table tbl(c varchar(255)) character set utf8; Query OK, 0 rows affected (0.01 sec) mysql> set names utf8; Query OK, 0 rows affected (0.00 sec) mysql> insert into tbl values ("тест тест тест"); Query OK, 1 row affected (0.00 sec) mysql> select * from tbl; +----------------------------+ | c | +----------------------------+ | тест тест тест | +----------------------------+ 1 row in set (0.00 sec) mysql> select * from tbl into outfile '/tmp/bug'; Query OK, 1 row affected (0.00 sec) mysql> load data infile '/tmp/bug' into table tbl; Query OK, 1 row affected (0.01 sec) Records: 1 Deleted: 0 Skipped: 0 Warnings: 0 mysql> select * from tbl; +----------------------------------------------------------+ | c | +----------------------------------------------------------+ | тест тест тест | | теÑÑ‚ теÑÑ‚ теÑÑ‚ | +----------------------------------------------------------+ 2 rows in set (0.00 sec) As you can see the data which was loaded back in the table is different than what was offloaded from this table :)
[15 Oct 2005 4:26]
Patrick Galbraith
Cutting the cyrillic from the email just gives an emptry string, so I saved the file to disk, then used vim and cut and pasted between xterms: mysql> insert into lat values (3, '–¢–µ—~A—~B–æ–≤–∞—~O –°—~B—~@–æ–∫–∞'); Query OK, 1 row affected, 1 warning (0.01 sec) mysql> select * from lat; +------+----------------------------------+ | a | b | +------+----------------------------------+ | 1 | this is a test | | 2 | ¸ber nachfaflt | | 3 | –¢–µ—~A—~B–æ–≤–∞—~O –°—~B—~@–æ–∫ | +------+----------------------------------+ 3 rows in set (0.02 sec) mysql> \q Bye patg@krsna:~/mysql-build/mysql-4.1/client> mysqldump -uroot -S /tmp/mysqld-4.1-5552.sock --add-drop-database --databases udb > udb.dump patg@krsna:~/mysql-build/mysql-4.1/client> vi udb.dump patg@krsna:~/mysql-build/mysql-4.1/client> ./mysql -u root -S /tmp/mysqld-4.1-5552.sock udb < udb.dump patg@krsna:~/mysql-build/mysql-4.1/client> ./mysql -u root -S /tmp/mysqld-4.1-5552.sock udb Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 43 to server version: 4.1.16-debug-log Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql> select * from lat; +------+----------------------------------+ | a | b | +------+----------------------------------+ | 1 | this is a test | | 2 | ¸ber nachfaflt | | 3 | –¢–µ—~A—~B–æ–≤–∞—~O –°—~B—~@–æ–∫ | +------+----------------------------------+ 3 rows in set (0.01 sec) I don't know what the trick is to cutting cyrillic from an email and inserting it into the db via xterm. I do see that mysqldump handles what I pasted. thanks!
[15 Oct 2005 4:28]
Patrick Galbraith
So, looking at how you did it, perhaps I need to figure out how to type cyrillic. I've always used cut-n-paste, and don't know the key mappings.
[15 Oct 2005 4:59]
Peter Zaitsev
Here is the hint if you want to avoid typing: select unhex('D182D0B5D181D18220D182D0B5D181D18220D182D0BAD0B5D181D182'); Just make sure to do "set names utf8" and make sure your console is utf8
[15 Oct 2005 8:27]
Sergei Golubchik
It's a duplicate of BUG#6012. And in any case it's not a bug, the manual documents it as ==== As of MySQL 4.1, the character set indicated by the `character_set_database' system variable is used to interpret the information in the file. `SET NAMES' and the setting of `character_set_client' do not affect input interpretation. Note that it's currently not possible to load `UCS2' data files. ====
[15 Oct 2005 20:29]
Sergei Golubchik
Ok, Peter explained in email that the bug here is that mysqldump --tab/mysqlimport can result in incorrect data. So this backup strategy does not work anymore. This is trivial to fix - mysqlimport should do set @@character_set_database=binary; before loading the data.
[15 Oct 2005 20:39]
Sergei Golubchik
Test case: create table t1 (a text character set utf8, b text character set latin1); insert t1 values (0x4F736E616272C3BC636B, 0x4BF66C6E); mysqldump --tab=. test mysql < t1.sql mysqlimport test t1.txt
[15 Oct 2005 21:03]
Patrick Galbraith
I'll work on both of these bugs per your suggested fix.
[20 Oct 2005 0:38]
Patrick Galbraith
I made a typo on the bug commit, and so I can't find my patch in lists.mysql.com, however: ChangeSet 1.2453 05/10/19 17:27:02 patg@krsna.patg.net +1 -0 BUG# 12123 Made change to mysqlimport to set character_set_database to binary to make importing various charsets/columns work correctly. client/mysqlimport.c 1.52 05/10/19 17:25:56 patg@krsna.patg.net +6 -0 BUG# 12132 Added 'set @@character_set_database=binary' to make loading of tables with mixed charset types and non-latin characters load. # This is a BitKeeper patch. What follows are the unified diffs for the # set of deltas contained in the patch. The rest of the patch, the part # that BitKeeper cares about, is below these diffs. # User: patg # Host: krsna.patg.net # Root: /home/patg/mysql-build/mysql-4.1 --- 1.51/client/mysqlimport.c 2005-09-07 15:49:57 -07:00 +++ 1.52/client/mysqlimport.c 2005-10-19 17:25:56 -07:00 @@ -304,6 +304,12 @@ fprintf(stdout, "Loading data from SERVER file: %s into %s\n", hard_path, tablename); } + if (mysql_query(sock, "set @@character_set_database=binary;")) + { + db_error_with_table(sock, tablename); + DBUG_RETURN(1); + } + sprintf(sql_statement, "LOAD DATA %s %s INFILE '%s'", opt_low_priority ? "LOW_PRIORITY" : "", opt_local_file ? "LOCAL" : "", hard_path);
[25 Oct 2005 21:51]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/internals/31473
[31 Oct 2005 16:23]
Patrick Galbraith
fixed in 5.0.16 bk
[31 Oct 2005 19:46]
Paul DuBois
Noted in 5.0.16 changelog.