Bug #25022 Encoding Issues
Submitted: 13 Dec 2006 0:18 Modified: 30 Apr 2007 9:47
Reporter: Arshad Khan Email Updates:
Status: Won't fix Impact on me:
None 
Category:MySQL Migration Toolkit Severity:S1 (Critical)
Version:1.1.6/1.1.7 OS:Windows (Windows XP)
Assigned to: CPU Architecture:Any
Tags: encoding, illegible characters, UTF-8, utf8

[13 Dec 2006 0:18] Arshad Khan
Description:
Hello, 

I was able to sucessfully migrate data from my MS Access database to MySQL however I am facing a problem. 

Some records contain in my Access database have special characters and they turn in to illegible characters once migrated to MySQL. For example, this record Ọyọ turns in to this ?y?. 

I have chosen utf-8 table collation for MySQL at the time of migration. 

Can anyone help? We have been working on this for the last three days and it is driving us crazy. 

Cheers, 
Ash

How to repeat:
Migrate data from Access using ODBC or Migration Toolkit. MS Access must contain some records which have sepcial characters e.g. Ọyọ, Provence-Alpes-Côte d'Azur, Şoldăneşti, Bayan-Ölgiy, Ọsun.

Compare records after migration.

Suggested fix:
No idea.
[13 Dec 2006 0:41] MySQL Verification Team
Thank you for the bug report. Could you please test with the latest release
version and if still you got the same result provide a small mdb file which
presents the behavior reported. Thanks in advance.
[13 Dec 2006 1:58] Arshad Khan
I have tested with the latest release version for both, Migration Toolkil 1.1.6 and the database server itself but the issue remains.

It appears that there is a bug in the connection string, please refer to the following thread and a posible resolution.

http://forums.mysql.com/read.php?104,124017,124790#msg-124790

In this case the user has migrated data from MS SQL to MySql and in my case I am my source database is Access. I am not good at writing connection strings, would you be kind enough to guide me how tio write a connection string for Acess so I can test this?

I have also uploaded a small mdb file which contains special characters for your reference and testing.

Regards,
Ash
[13 Dec 2006 2:00] Arshad Khan
Access Database Containing Special Characters

Attachment: db3.zip (application/x-zip-compressed, text), 27.68 KiB.

[13 Dec 2006 10:04] MySQL Verification Team
Thank you for the bug report. I am testing with version 1.1.7 (source
build) and at first glance I was not able to find a record with the
behavior reported, however to be sure point me an exactly record which
shows that problem among the 1500 rows which were migrated.
Regarding connection string please search our web site in the MyODBC
pages there are examples you can use. Thanks in advance.
[13 Dec 2006 10:12] Arshad Khan
Hello,

In Access database please look at record # 902, State record for Nigeria is Ọyọ. Then refer to your MySQL database if this record has turned into ?y? then the issue remains.

Please let me know of your finding. I am desparately waiting for a resultion on this.

Regards,
Arshad
[13 Dec 2006 10:53] MySQL Verification Team
Thank you for the feedback. I was able to repeat using multi-languages options
too, below a piece of the insert data script:

('NG', 'Nigeria', '?y?', 'Afijo L'),
('NG', 'Nigeria', '?y?', 'Akinyele'),
('NG', 'Nigeria', '?y?', 'Atiba'),
('NG', 'Nigeria', '?y?', 'Atigbo'),
[13 Dec 2006 11:03] Arshad Khan
Hello,

Thanks for the test. Yes, you have found the problem. In this case we have highlighted issue with one character, there may be many. 

It appears that during the migration process some characters are not being mapped properly to their original encoding.

Whilst you will work towards find a resolution for this, is there a short term/quick work-around I could use? Just becuase of this issue we are unable to launch a major website.

Many thanks for your help so far.
[13 Dec 2006 12:12] Arshad Khan
Hello,

I have an update. 

I exported all records from the Access database to MS SQLExpress without any issues and then exported records to MySQL using the Migration Toolkit, no issues were encountered. All records were imported into MySQL with their original encoding intact. So it appears that the encoding issue is only related to MS Access.

Cheers,
Arshad
[13 Dec 2006 12:56] MySQL Verification Team
Thank you for the feedbCK.
[28 Mar 2007 7:40] Freddy Kaiser
Hi, similar issues with MS-SQL when the DB doesn't have default Collation (Latin... but French_CI_AS).
I was able to resolve the migration with this:
  The default/implicit jdbc connection is:
jdbc:jtds:sqlserver://<ip>:1433/<db>;user=<userid>;password=<pwd>;charset=utf-8;domain=
  => With this it failed
  By using advanced on the Source Database I put this:
jdbc:jtds:sqlserver://<ip>:1433/<db>;user=<userid>;password=<pwd>;domain=
  => With this it was all fine (no other special option)
   Based on the MS JDBC documentation their driver will choose the right charset
Fix:
  Either adding a dropdown/entryfield where this could be defined like the other options or removing the implicit charset=utf-8
[30 Apr 2007 9:47] Michael G. Zinner
The problem with MS Access is, that it does not support charsets and whatever is stored inside the database depends on the local system settings.

When accessing the data via the JDBC/ODBC bridge there is no way to tell the encoding of the data stored in the MS Access database. Therefore we cannot correctly convert the data.

The workaround of going through the MS SQL server is a good one, as there charsets are handled as expected.

The charset handling in the connection dialog for the MS SQL server is limited due to the fact that the available list of charsets depends on the MS SQL server installation and is therefor custom. Manually stating the charset in the Advanced Option is the correct way to deal with this scenario.