Bug #43041 Migration of Sybase to MySQL project issue with Sybase "Text" datatype
Submitted: 20 Feb 2009 3:53 Modified: 20 Feb 2009 15:23
Reporter: Viswanath G
Status: Closed
Category:MySQL Migration Toolkit Severity:S3 (Non-critical)
Version:1.1.16 OS:Microsoft Windows (Win XP Professional)
Assigned to: CPU Architecture:Any
Tags: migration, sybase

[20 Feb 2009 3:53] Viswanath G
Description:
I'm involved in a project migrating Sybase to MySQL. We are trying to migrate using MySQL Migration Toolkit version 1.1.16. I'm unable to migrate "Text" datatype column data from a Sybase 15.0.2 database to "LongText" data. My Sybase database is set to charset latin1. My MySQL database (version 5.0.45) is set to charset utf8.

How to repeat:
I created a source database in Sybase named "testdb" with charset latin1, containing a table "testtext" with one column, "mytextdata", of datatype "Text". I inserted the following value into mytextdata: "Rechercher les pages en français Programmes de publicité".

Now, I'm using the MySQL Migration Toolkit to convert the Sybase database. After the conversion I saw that the data in the mytextdata column was not migrated properly. The text looks like "Rechercher les pages en fran�aisProgrammes de publicit�"

Note: I'm setting the charset of the MySQL database and of the column to utf8.
[20 Feb 2009 11:59] Susanne Ebrecht
Many thanks for writing a bug report. This does not look like a bug. The problem may just be that Windows is not able to display utf8 properly.

Rechercher les pages en fran�aisProgrammes de publicit�

I think this should be something like:
Rechercher les pages en françaisProgrammes de publicité

To rule out that it is just a display error, we need to know the output of:
SELECT LENGTH(your_column) FROM your_table;
SELECT HEX(your_column) FROM your_table;

For "Rechercher les pages en françaisProgrammes de publicité" stored as utf8, the length should be 57 bytes. If it is 55, then it is stored in latin1, cp850, or some other single-byte charset.
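
The 57 versus 55 figures can be checked outside the database. The following is a minimal Python sketch, not part of the original report; it assumes the stored string has no space between "français" and "Programmes", as in the text quoted above. MySQL's LENGTH() counts bytes, so the comparison is between the byte lengths of the two encodings.

```python
# Illustrative check of the byte lengths quoted above (not from the original report).
# \u00e7 is "ç" and \u00e9 is "é"; escapes are used so the file encoding does not matter.
text = "Rechercher les pages en fran\u00e7aisProgrammes de publicit\u00e9"

char_count = len(text)                    # number of characters
utf8_len = len(text.encode("utf-8"))      # what LENGTH() reports for utf8 storage
latin1_len = len(text.encode("latin-1"))  # what LENGTH() reports for latin1 storage

print("characters:", char_count)
print("utf8 bytes:", utf8_len)
print("latin1 bytes:", latin1_len)
```

The two extra bytes in utf8 come from "ç" and "é", which each take two bytes there and one byte in latin1.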

The SELECT HEX() output for these words in utf8 should be:

52656368657263686572206C657320706167657320656E206672616EC3A761697350726F677261
6D6D6573206465207075626C69636974C3A9

For a better overview, I have split it up here:

52 65 63 68 65 72 63 68 65 72 20 ("Rechercher ") (11 bytes, including space)
6C 65 73 20 ("les ") (4 bytes, including space)
70 61 67 65 73 20 ("pages ") (6 bytes, including space)
65 6E 20 ("en ") (3 bytes, including space)
66 72 61 6E C3 A7 61 69 73 ("français") (9 bytes, because ç needs 2 bytes in utf8)
50 72 6F 67 72 61 6D 6D 65 73 20 ("Programmes ") (11 bytes, including space)
64 65 20 ("de ") (3 bytes, including space)
70 75 62 6C 69 63 69 74 C3 A9 ("publicité") (10 bytes, because é needs 2 bytes in utf8)
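
The breakdown above can be reproduced with a short Python sketch. This is only an illustration, not something from the original thread; the list of pieces and the variable names are chosen here for the example, and the expected hex string is copied from the HEX() output quoted earlier in this comment.

```python
# Illustrative reproduction of the per-word byte breakdown (names are hypothetical).
pieces = ["Rechercher ", "les ", "pages ", "en ",
          "fran\u00e7ais", "Programmes ", "de ", "publicit\u00e9"]

# Expected HEX() output for utf8 storage, as quoted in the comment.
expected_hex = (
    "52656368657263686572206C657320706167657320656E206672616EC3A761697350726F677261"
    "6D6D6573206465207075626C69636974C3A9"
)

sizes = []
for piece in pieces:
    data = piece.encode("utf-8")
    sizes.append(len(data))
    # Print the bytes of each piece as space-separated uppercase hex.
    print(" ".join(f"{b:02X}" for b in data), repr(piece), len(data), "bytes")

actual_hex = "".join(pieces).encode("utf-8").hex().upper()
matches = actual_hex == expected_hex
print("total bytes:", sum(sizes), "matches expected hex:", matches)
```

If the column really holds utf8, SELECT HEX(your_column) should give the same string as actual_hex; a latin1 value would instead show a single byte E7 for ç and E9 for é.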
[20 Feb 2009 15:23] Viswanath G
Basically, I needed to use the same charset as Sybase in the JDBC connection string to overcome this issue. Since the Migration Toolkit took "utf8" by default, this text wasn't migrating properly.
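
A charset mismatch of this kind is easy to reproduce in isolation. The sketch below is an illustration added here, not the toolkit's actual code: it assumes the source bytes were Latin-1 and were decoded as UTF-8 on the reading side, which is one plausible way to get the "�" characters shown in the report.

```python
# Illustrative only: simulates reading Latin-1 bytes with the wrong charset.
original = "Rechercher les pages en fran\u00e7ais Programmes de publicit\u00e9"

# Bytes as a latin1 source database would hold them.
latin1_bytes = original.encode("latin-1")

# Wrong: treat the bytes as UTF-8. E7 and E9 are not valid on their own,
# so each becomes the replacement character U+FFFD.
misread = latin1_bytes.decode("utf-8", errors="replace")

# Right: decode with the source charset, then store as UTF-8.
correct = latin1_bytes.decode("latin-1")

print(misread)
print(correct)
```

Once the reader decodes with the source charset, the text round-trips unchanged and can be written to a utf8 column without loss.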