Bug #28969 mysqldump double encodes utf-8 innodb tables
Submitted: 8 Jun 2007 9:08 Modified: 11 Jun 2007 8:41
Reporter: Allan Jensen Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Backup Severity:S3 (Non-critical)
Version:5.0.37 OS:Linux (Kubuntu)
Assigned to: CPU Architecture:Any

[8 Jun 2007 9:08] Allan Jensen
Description:
All our tables are utf-8 InnoDB tables defined using:
"ENGINE=InnoDB DEFAULT CHARSET=utf8"

When using mysqldump with default encoding (supposedly utf-8), these tables end-up as double utf-8 (4 characters instead of 2 to encode a standard latin-1 character).

How to repeat:
1. Declare a table with VARCHAR data and parameters "ENGINE=InnoDB DEFAULT CHARSET=utf8". 
2. Insert utf8 data.
3. Call mysqldump with default options. Using pipes '>' or result-file '-r' makes no difference.

Importing the resulted dump now corrupt all non-ASCII characters.

Suggested fix:
Add a check so that mysqldump does not reencode tables that are already in the mysqldump default-character-set

Temporary work-around: If all your tables are in utf-8 you can get a valid dump by using -default-character-set=latin1. This will strangely enough preserve the utf8 encoding.
[8 Jun 2007 11:03] Sveta Smirnova
Thank you for the report.

Please indicate accurate version of MySQL you use.
[8 Jun 2007 11:24] Allan Jensen
$ mysql --version
mysql  Ver 14.12 Distrib 5.0.24a, for pc-linux-gnu (i486) using readline 5.2
[8 Jun 2007 11:37] Sveta Smirnova
Thank you for the feedback.

But version 5.0.24a is quite old. Additionally I can not repeat described beaviour with current 5.0.41 version.

Please upgrade to current version, try with it and if you can repeat error reopen the report and provide output of SHOW CREATE TABLE problem_table
[8 Jun 2007 12:26] Allan Jensen
The server version is actually 5.0.37. I've upgrade the client tools to 5.0.41 and the bug is still reproducable.

This is one of the many tables, that are dumped incorrectly:

mysql> SHOW CREATE TABLE `cms_data`;
+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table    | Create Table                                                                                                                                                                                                                            |
+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| cms_data | CREATE TABLE `cms_data` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `data` mediumtext NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Holds all texts that are translateable in the cms_system.' |
[9 Jun 2007 17:50] Sveta Smirnova
Thank you for the feedback.

I still can not repeat error with your table structure with following test:

set names utf8;

CREATE TABLE `t1` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `data` mediumtext NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Holds all texts that are translateable in
the cms_system.';
INSERT INTO t1 (data) VALUES ("Русский текст"),
("ещё");

--exec $MYSQL_DUMP test t1

Please provide one-two rows which can not be outputed by mysqldump correctly and indicate all options you pass to mysqldump
[11 Jun 2007 8:41] Allan Jensen
Okay it seems related to the weird 'set names' command.

I've been using a utf-8 capable console, and inserting in utf-8 tables, assuming that the connection was standard 8bit. 

It appears MySQL has an additional and needless middle conversion, which assumed I didn't have a utf-8 capable source, and was therefore extra-encoded everything.

Using "Set names 'utf8'" everywhere fixes the bug.
[7 Jul 2011 9:47] George Sun
I use the mysqldump to make a backup of my database.
My database was destroyed by a accident and now I want to restore it.
But the SQL file is double encoded by the bug. Is there any solution for my data to go back?
Thanks.
[6 Feb 2012 21:43] Kevin Franden
Problem:
As of today (Feb 6, 2012) mysqldump 5.1.36 still exhibits the faulty behavior (unable to successfully dump a MySQL database in UTF-8)

Workaround:
The original workaround (mysqldump --default-character-set=latin1) still works (ie: adding the flag successfully dumps database using UTF-8 symbols)
[8 Feb 2012 1:51] Kevin Franden
CORRECTION:
My above stmt is flat wrong.

I was able to successfully export then import a UTF8 database by doing the following:

EXPORTING:
Ensure all necessary tables are UTF8-ready:
    ALTER TABLE bar DEFAULT CHARACTER SET utf8;

Ensure database default charset it UTF8:
    ALTER DATABASE foo DEFAULT CHARACTER SET utf8;

Export the data in the database:
    mysqldump -uUSER -pPASSWD --default-character-set=utf8 --ignore-table=foo.bletch --databases foo -r OUTPUT_FILE

* It's important to specify the character (--default-character-set=utf8)
* It's important to capture the output directly into a file (-r OUTPUT_FILE)
* It was important for me to specify the "--databases" flag so my dumpfile would contain a "USE <dbname>;" statement

On a *NIX platform, you ought to be able to see the correct fonts on at least some of the data you just exported to a flat-file.  If you don't see fonts you recognize, your dump might not be restoreable.

RESTORATION:
Super-easy:
    mysql -uUSER -pPASSWD < OUTPUT_FILE

Note that I did not specify the database name on the restore.  Since I used "--databases", mysqldump automagically put the "USE foo;" statement in the dump itself for me.

Additionally, be wary of using tools that assume a char is 1 byte.  Using *NIX tools like sed and awk to fiddle with your dumpfile WILL break the encoding;