Bug #46819 | mysqldump without default-character-set made invalid string data. | ||
---|---|---|---|
Submitted: | 20 Aug 2009 7:26 | Modified: | 20 Aug 2009 8:11 |
Reporter: | Meiji KIMURA | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: mysqldump Command-line Client | Severity: | S3 (Non-critical) |
Version: | 5.0, 5.1, next bzr | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[20 Aug 2009 7:26]
Meiji KIMURA
[20 Aug 2009 7:36]
Meiji KIMURA
I'm wrong. Here is a right [suggest to fix] mysqldump use 'SET NAMES' based on skip-character-set-client-handshake. (it's determined by default-character-set).
[20 Aug 2009 8:11]
Sveta Smirnova
Thank you for the report. Verified as described. If not specify option -t mysqldump dumps everything correctly.
[21 Aug 2009 1:02]
Meiji KIMURA
As mentioned by sveta, it runs well without -t option. mysqldump --opt --lock-tables=FALSE -uroot ch_check t1 > t2.dmp; All string data are encoded in utf8.(E9968B). 000004E0 20 44 49 53 41 42 4C 45 20 4B 45 59 53 20 2A 2F DISABLE KEYS */ 000004F0 3B 0A 49 4E 53 45 52 54 20 49 4E 54 4F 20 60 74 ;.INSERT INTO `t 00000500 31 60 20 56 41 4C 55 45 53 20 28 27 E9 96 8B 27 1` VALUES ('...' 00000510 2C 27 E9 96 8B 27 29 3B 0A 2F 2A 21 34 30 30 30 ,'...');./*!4000
[8 Nov 2010 20:07]
Sheeri Cabral
This is actually a very big problem, because often people put non US-ASCII characters in latin1. When these characters are exported with mysqldump with a default of utf8, the characters are garbled, but there's no warnings or errors. This can lead to lost data, especially when upgrading a major series (e.g. 5.0 to 5.1) because you're supposed to export and import the data. When importing a backup of an InnoDB table, if there is an error with one of the parts of the INSERT, the whole INSERT statement rolls back. I have experienced major data loss because the garbled characters cause an error when INSERTed, and it causes perfectly fine data not to import because they're in the same INSERT statement as the garbled data. This is a huge issue for anyone who has upgraded from previous versions where mysqldump (and the server) were latin1 by default. Now their server is still latin1, but mysqldump is happening with utf8. Here's an example: First, set variables such on a MySQL server (5.0 or 5.1): mysql> show global variables like '%char%'; +--------------------------+----------------------------+ | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | latin1 | | character_set_connection | latin1 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | latin1 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ | +--------------------------+----------------------------+ 8 rows in set (0.00 sec) Then create these tables with data: CREATE TABLE `test1` ( `kwid` int(10) unsigned NOT NULL default '0', `keyword` varchar(80) NOT NULL default '' ) ENGINE=InnoDB DEFAULT CHARSET=utf8; INSERT INTO `test1` VALUES (1,'watching'),(2,'poet'),(3,'просмотра'),(4,'Поэту'); CREATE TABLE `test2` ( `kwid` int(10) unsigned NOT NULL default '0', `keyword` varchar(80) NOT NULL default '' ) ENGINE=InnoDB DEFAULT CHARSET=latin1; INSERT INTO `test2` VALUES (1,'watching'),(2,'poet'),(3,'просмотра'),(4,'Поэту'); Now compare: mysqldump -u root lj > test_export_utf8.sql mysqldump --default-character-set=latin1 -u root lj > test_export_latin1.sql Note that the test export with utf8 has garbled data whereas the export with latin1 does not. mysqldump does set the character_set_client but it is not enough. mysqldump should be changed to set the default character set to whatever the server is set to. There is no reason this bug has to exist.
[8 Nov 2010 20:07]
Sheeri Cabral
I would say that this is in fact a serious bug, not non-critical.
[8 Nov 2010 20:28]
Sheeri Cabral
I just confirmed that the same behavior in the test case is apparent even when the server charset variables are set to: mysql> show global variables like 'char%'; +--------------------------+--------------------------------------------------------+ | Variable_name | Value | +--------------------------+--------------------------------------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | utf8 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | utf8 | | character_set_system | utf8 | | character_sets_dir | /usr/local/mysql-5.1.50-osx10.6-x86_64/share/charsets/ | +--------------------------+--------------------------------------------------------+ 8 rows in set (0.00 sec) So I'm not sure why a server with utf8 as the default requires mysqldump to use latin1 to export properly, even for a utf8 table.