Bug #18560 Bad character handling, weird and strange characters when restoring
Submitted: 27 Mar 2006 22:07 Modified: 26 Jul 2006 13:55
Reporter: Manuel Navas Email Updates:
Status: Won't fix Impact on me:
None 
Category:MySQL Administrator Severity:S1 (Critical)
Version:1.1.9 OS:Windows (Windows XP SP2)
Assigned to: Mike Lischke CPU Architecture:Any
Tags: Backup

[27 Mar 2006 22:07] Manuel Navas
Description:
I have read bugs with ID numbers: 9023, 17938 but none of them supply a solution, and I think is the same problem, i have a database with special characters in spanish, like ñ, á, é, í, ó, ú. These characters are saved correctly in the database, but when I generate a Backup using MySQL Administrator 1.1.9 in the sql files that generates shows: JOSÉ instead of JOSÉ, so when I restore the backup file, it restores JOSÉ
What is wrong?

How to repeat:
create a table with a single column of type varchar and insert a word with a vowel with accent or ñ (alt + 164) and then perform a backup check the sql file created,  then restore the backup file, it will not restore it correctly.

Suggested fix:
Fix the Backup or Restore option in MySQL Administrator, or at least, please explain what we should do.
[28 Mar 2006 14:15] MySQL Verification Team
Thank you for the bug report.
[28 Mar 2006 19:30] Peter Laursen
To both of you:
I think it is important here to know about character set information.

What is the default for the server ([mysqld] section of configuration file)?
Same for the [client] section? Any specific character set declaration in database, table or column definition?

I think you are running a Spanish localized windows? (What Windows version?) If that is the case and if database is a Unicode character set (utf8 probably) I believe a solution would be to let MA issue as the very first statement this:

set sharacter_set_results = latin9;
(maybe latin1 would do too, but I don't know much about Spanish localisations.)
But that option is not available now.

It looks like the stored data are just saved on a byte-per-byte basis.  The É is encoded using two bytes in utf8 and if no set sharacter_set_results is done (or if 'set sharacter_set_results = uft8' or 'set sharacter_set_results = default' is done) with the export the É is also stored as two bytes.  And are read as 'É' by the OS.

That would at least solve the subset of characters that are shared between the Unicode charset of the server and the charset of the client machine.

Maybe also an import option to bypass the OS's character set handling and just import on a byte-per-basis would do the trick.  The SQL file would look weird, but the restored data would be OK.  That would also allow for unicode charcters that are not understand by the OS localisation.
[28 Mar 2006 19:48] Peter Laursen
@manuel ..

If I am right in this (that database is utf8 and that is the problem) then you could try to create a test databases with Latin9 as the character set.  I think you would be able to backup and restore data from this one!
[28 Mar 2006 22:49] Manuel Navas
Well, I have been looking for some answers and I still have not been able to find a solution, I even tried adding to my.ini file the line: default-character-set=utf8 and also tried: default-character-set=Latin9, but it does not work, I really do not what to do. Both lines give me a error "character set 'utf8' is not a compiled character set and it is not specified in the ...\index file".
Also, when restoring the backup file with MySQL Administrator, I changed the Character Option located in the General Tab, to some other options, and still get weird characters. Server version is 4.0.26 and I am backuping and restoring from the same server to different databases but the same tables.
[28 Mar 2006 23:07] Manuel Navas
Also, my character_set variable in my server is latin1, and I couldn´t change it to utf8.
This is the script in the sql file: (notice that in the insert lines are the weird characters, and when I restore the file, those exact weird characters are restored not á, é, ñ, that are the characters I want to restore and are actually the ones that are in my original table. everything is being made in the same machine)

-- MySQL Administrator dump 1.4
--
-- ------------------------------------------------------
-- Server version	4.0.26-debug

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;

/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;

--
-- Create schema dbtest
--

CREATE DATABASE /*!32312 IF NOT EXISTS*/ dbtest;
USE dbtest;

--
-- Table structure for table `dbtest`.`table1`
--

DROP TABLE IF EXISTS `table1`;
CREATE TABLE `table1` (
  `20` varchar(100) NOT NULL default ''
) TYPE=MyISAM;

--
-- Dumping data for table `dbtest`.`table1`
--

/*!40000 ALTER TABLE `table1` DISABLE KEYS */;
INSERT INTO `table1` (`20`) VALUES 
 ('á'),
 ('é'),
 ('ñ');
/*!40000 ALTER TABLE `table1` ENABLE KEYS */;

/*!40101 SET SQL_MODE=@OLD_SQL_MODE */;
/*!40014 SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS */;
/*!40014 SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS */;
/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
[29 Mar 2006 0:51] Peter Laursen
aha .. it is a 4.0 server.

I tried to test with 4.0.26 and MA 1.1.9 and Danish Characters (æøåÆØÅ), But I get error message (backup thread cannot connect to MySQL)

I can can run a similar backup (and restore) with third-party client SQLyog 5.02 without problems (no matter the character set).  Data in .sql file are with this test (as they should be):

insert into `tt` values 
(1,'æ'),
(2,'ø'),
(3,'å'),
(4,'æøå');

As your databse is not utf8 your letters are encoded use one byte in the database. However looks like MA utf8_encodes them.  

What is your MA version?
[29 Mar 2006 1:08] Peter Laursen
Actually the MA restore dialog displays:

If you are importing a SQL file that has not been crated with MA
you have to choose the character set of the file.  If you have created the backup
with MySQL Administrator the file was written in UTF-8

And I can verify this with MySQL 5.0.19 and Danish characters (æøåÆØÅ).  It works.

It looks like MA depends on the Server to convert the utf-8 encoded file to the charset used with the database?  MySQL 4.0.26 cannot and thus the error!
[29 Mar 2006 16:40] Manuel Navas
thank you Peter for answering me, so, what you are trying to tell me is that when I do a BackUp using MA 1.1.9, the MA uses utf8 to encode the sql file, but when MA 1.1.9 restores the file it won´t use utf8 instead it uses latin1, because that's the character set in the server, is that what are you trying to say? In simple words, MA encodes the file with utf8 but when it restores the backup, MA uses latin1 because latin1 is the character set in the server? If that is the case, well, I tried to set the default-character-set=utf8 in my.ini file but when I start shell>mysqld --console  it throws the following error:

"character set 'utf8' is not a compiled character set and it is not specified in the ..\share\charsets\index file"

And I read in another bug report that utf8 is inside mysqld, and that´s why it does not appear in the share\charsets directory, so why the error message?.

Anyways, I just want to restore my backup file (generated with MA 1.1.9) correctly, using MA 1.1.9, I mean I perform a Backup with MA, and I just want to restore it with MA.

Just to clarify some things about my scenario:

OS: WXP SP2
MA: 1.1.9
mysql server: 4.0.26
show variables like 'char%':
character_set  -->  latin1
character_sets --> latin1 big5 czech euc_kr gb2312 gbk latin1_de sjis tis620 ujis dec8 dos german1 hp8 koi8_ru latin2 swe7 usa7 cp1251 danish hebrew win1251 estonia hungarian koi8_ukr win1251ukr greek win1250 croat cp1257 latin5

Please tell me, what should I do, in order to sucessfully restore my sql file generated with MA 1.1.9 and apparently encoded with utf8. 

Thanks.
[29 Mar 2006 21:38] Peter Laursen
Well .. 

Miguel or some other MySQL GUI must answer in detail what is happening here.  But to me it looks like your backup is UTF-8 encoded.  MySQL 4.0 does not have that charset, so If MA does not have a charset conversion library of its own it can't be restored.  That is what I think.  However it surprises me how it is able to encode that way.  It can't be the server that does it!

The UTF-8 encoding is a variable byte length encoding.   The ASCII subrange of characters is one byte long, alphabetic writings 2 bytes long, Far Eastern writings 3 bytes long per character (and some rare historical writings are mapped using 4 bytes).

So in UTF-8 accented vowels, German/Swedish/Hungarian/Turkish 'Umlaut' characters (ä, ö, ü, ë etc), Spanish ñ and Danish special charcters (æ,ø and å) etc are all encoded using 2 bytes.  And that is what your file looks like!
[29 Mar 2006 21:44] Peter Laursen
To restore that file you can  'Search and replace' each two-byte sequence with a one-byte ANSI representation of of each charater in a text editor (the ANSI 'western' mapping corresponds to MySQL 'latin1').  I guess we are only talking about 5 or a little more characters?
[29 Mar 2006 22:11] Manuel Navas
Thanks Peter, by the way, my name is Manuel not Miguel :o) Actually there are 12 characters counting Capitalized chars. So, find and replace? mmm. Well, that´ll do it, but.... 
I'll keep looking for a solution, if I find it, I'll post it.
[29 Mar 2006 23:07] Manuel Navas
FINALLY!! a solution:
Thanks to Eduardo A. Romero Gómez who posted a comment in section "5.10.1. The Character Set Used for Data and Sorting" of MySQL Manual 3.23, 4.0 , 4.1.
He wrote about a program called iconv and I think he used it in Linux, but if you are a Windows user and speak spanish, go to this link an do the following:

http://gnuwin32.sourceforge.net/packages/libiconv.htm

or search for "iconv for windows" in yahoo then download and install the program, it is a DOS based program.
In a DOS shell, go to the directory where iconv.exe resides.
then write the following command:
shell>iconv -f UTF-8 -t ISO-8859-1 badfile.sql > goodfile.sql
and that's it, the file has been corrected and ready to restore it using MA.  I hope this can help.
[29 Mar 2006 23:24] Manuel Navas
Ho Well, I try to restore it, but MA does not have the ISO-8859-1 library.... so it can't be restored.
[30 Mar 2006 8:53] Peter Laursen
That ISO charset is used for HTML only.  It is descibed here 
http://www.natural-innovations.com/wa/doc-charset.html

I still think you can 'search and replace'.  Simply find out what two-string character sequence is used for representing 'ñ' for instance.  Copy those two to the clipboard and paste it into the 'search' field of the editor and put 'ñ' in 'replace'.  next 'replace all'.  Do so what all the non-ASCII characters involved.  Don't forget to have a backup of the original file.
[30 Mar 2006 19:43] Manuel Navas
Hello Peter, you know what, I remembered that some months ago I could successfully restore a sql file with MA but it was an older version, so I found MA version 1.0.26 in the MA's download page, so I decided to give it a shoot, guess what? MA Version 1.0.26 restored correctly my sql file!!   Could you please find out why? I mean, why version 1.0.26 does it well and 1.1.9 does not? or at least fix 1.1.9 so it can restore sql files like 1.0.26. I think it could help a lot of people, specially non english people.
Thanks.
[30 Mar 2006 19:54] Manuel Navas
File with special chars

Attachment: dbtest.sql (application/octet-stream, text), 1.57 KiB.

[30 Mar 2006 19:57] Manuel Navas
Peter, I am attaching the file dbtest.sql with á, é, í, ó, ú, etc... Please, would you be so kind to restore it using MA 1.1.9 and MA 1.0.26 so you could see for yourself that the oldest version restores it successfully and the newest does not.
Thanks a lot!
[30 Mar 2006 20:00] Manuel Navas
MY MISTAKE! The MA's older version is 1.0.21, not 1.0.26, sorry.
[30 Mar 2006 22:52] Peter Laursen
@manuel ...

Actually the MySQL download page tells that MA 1.0 is for use with 4.0 and MA 1.1 is for use with 4.1++ .  I can see that your latest attechment is NOT utf8-encoded. Not need to import to verify that!

what is your situation now?  Do you have a backup that you need to restore or was it just an 'exercise'?
[30 Mar 2006 23:13] Manuel Navas
File generated with MA 1.1.5

Attachment: dbtest2.sql (application/octet-stream, text), 1.57 KiB.

[30 Mar 2006 23:15] Manuel Navas
Peter, here is a file generated with MA 1.1.5 try to restoret with 1.1.9 and 1.0.21
[30 Mar 2006 23:19] Manuel Navas
if the file is not utf-8 then, I don't know, I am generating it with MA in WinXP. The thing is that I could successfully restore my backup file using MA 1.0.21.
I did not upload my real backup file because it was 200 MB all data using spanish chars, so I had to create test databases.
[31 Mar 2006 0:55] Peter Laursen
OK .. 

They won't restore.  None of the two you uploaded.  They are not uft8-encoded (I inspected them in a HEX-editor), but MA sets the wrong charset at import!

Just add this line to your file at the beginning - and it will import correctly with 1.1.9 too (tested with MySQL 4.0 and 5.1):

SET CHARACTER_SET_RESULTS = latin1;

before
--
-- Create schema dbtest
--

and you'll 'override' what MA does wrong.  I don't have MA 1.0 installed but works with 1.1.

I also have to delete the 'TYPE = MYISAM'  because MA 1.1.19 changes that to *ENGINE ...' what MySQL 4.0 does not understand.  But maybe that is no issue with  1.0. I really do not like that MA tries to be so 'smart' that it does different than what the file reads when importing.  This is bad!  I always use SQLyog for backup/restore (much more options and faster) and I did never encounter those MySQL Administrator issues for that reason.

I think this discussion is now not really a bug report but probably should rather have been in the Forums.  But as MySQL are silent as a grave, it is their own fault, I believe!
[31 Mar 2006 17:03] Manuel Navas
I think you're right Peter, this bug should be in the Forums :o).
Well, this is so funny, I added the line SET CHARACTER_SET_RESULTS = latin1;
where you told me to, and also erased all the TYPE=MyISAM entries and I restored the file using MA 1.1.9 and it didn't work! je je je so funny, then I restored the same exact file using MA 1.0.21 and IT DID WORK! It makes me smile really, one tool does it right, and the 'new' tool just doesn't.
Ok Peter, I acomplished what I wanted using MA 1.0.21, so please do not remove it from the download page :o). My recomendation for people that have a 4.0 server and want to restore sql files with special chars and it is generated with MA use MA 1.0.21 to restore it. 
Peter, thanks for your help and patience.
[25 Jun 2006 21:17] carlos vera
Manuel, can yoy please write to my email carlosveraq1@hotmail.com if some final fix have been found for this problem. If you write in spanish is better.

Thank you

Carlos
[26 Jul 2006 13:55] Mike Lischke
I admin using a newer MA with a 4.0 server is a bit problematic. Everything in the tool is made to handle strings as Unicode (sometimes UTF-8, sometimes UTF-16). If a latin1 file is used for restore it is sequentially converted to UTF-8. However this causes trouble with a 4.0 server, because it is unable to handle it.

The best step would be to officially stop support for 4.0 servers with our GUI tools, but we hesitated to announce that because of the large user base (and most of the time MA works well, also with a 4.0 server).

I'm sorry to say that, but we cannot fix this problem. It would require too many changes in the internal handling and we have much other work waiting.

Best regards, 
Mike Lischke
[22 Feb 2007 22:47] Daniel Albrecht
Hello all!

I use MySQL server 5.0 with latin1 charset and i had the same problem...so I tried
to set the charset from latin1 to utf8 (using the instance config wizzard) and
after that the backup and restore is worked for me perfect...its just a comment
So may be it doesn't help you...:)