Bug #89048 International character not supported in postgres to mysql migration
Submitted: 24 Dec 2017 2:08 Modified: 15 Jun 2018 12:07
Reporter: Thiruvannamalai Narayanan
Status: Can't repeat
Category:MySQL Workbench: Migration Severity:S2 (Serious)
Version:6.3.10 OS:Windows (Microsoft Windows 10 Home Single Language)
Assigned to: CPU Architecture:Any
Tags: WBBugReporter

[24 Dec 2017 2:08] Thiruvannamalai Narayanan
Description:
Hi,

When I try to migrate a single database from PostgreSQL 9.2 to MySQL 5.6, I get errors like:

"ERROR:`mysql_migrate`.`visit`:Inserting Data: Incorrect string value: '\xC3?nis_...' for column 'initial_request' at row 35

ERROR: `mysql_migrate`.`visit`:Failed copying 1586025 rows
FINISHED"

I am fairly sure this error is related to character set encoding. I tried changing the character set from latin to utf8 and to utf8mb4, but it still throws the same error.

MySQL uses utf8 encoding (client and server).
PostgreSQL uses utf8 encoding (client and server).

I tried every available driver (Unicode, ANSI, and the default native driver); changing the driver made no difference.

Note: when I run an INSERT query manually, it accepts those characters.

I am unable to share the complete logs because they contain sensitive information.

How to repeat:
Source selection: selected each available driver one by one and tested.
Target selection: done as usual.
Schema list: fetched.
Selected a single schema.
Everything works until the bulk data transfer.
During the bulk transfer it throws the error " Incorrect string value: '\xC3?nis_...' for column 'initial_request' at row 35 " multiple times.

When the field "initial_user_agent" receives the value "\xC3?nis_Ben", it throws the error and stops executing, as shown above.
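To find how many values are affected before rerunning the bulk transfer, the source data could be scanned for byte strings that are not valid UTF-8. A minimal Python sketch, assuming the column values can be obtained as raw `bytes`; `find_invalid_utf8` and the sample rows are hypothetical (only the column names come from this report) and are not part of Workbench:

```python
from typing import Iterable, Iterator, Mapping, Optional, Tuple


def find_invalid_utf8(
    rows: Iterable[Mapping[str, Optional[bytes]]],
) -> Iterator[Tuple[int, str, int, str]]:
    """Yield (row_number, column, byte_offset, reason) for each value that is not valid UTF-8."""
    for row_number, row in enumerate(rows, start=1):  # 1-based row numbers
        for column, value in row.items():
            if value is None:  # SQL NULL: nothing to check
                continue
            try:
                value.decode("utf-8")
            except UnicodeDecodeError as exc:
                # exc.start is the offset of the first offending byte;
                # exc.reason is a short description such as "invalid continuation byte"
                yield row_number, column, exc.start, exc.reason


# Hypothetical sample rows; only the column names come from the report.
rows = [
    {"initial_request": b"/home", "initial_user_agent": b"Mozilla/5.0"},
    {"initial_request": b"\xc3?nis_Ben", "initial_user_agent": b"Ren\xc3\xa9"},
]
for problem in find_invalid_utf8(rows):
    print(problem)  # (2, 'initial_request', 0, 'invalid continuation byte')
```

`b"Ren\xc3\xa9"` is valid UTF-8 ("René") and is not reported; only the value whose 0xC3 is followed by '?' is.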

Suggested fix:
Please fix this issue as soon as possible.
[24 Dec 2017 2:10] Thiruvannamalai Narayanan
Attached log

Attachment: migrate_error.txt (text/plain), 7.68 KiB.

[26 Dec 2017 12:58] Daniël van Eeden
Incorrect string value: '\xC3?nis_...' for column 'initial_request'

0xC3 (decimal: 195, binary: 0b11000011) starts a two-byte UTF-8 character, but the second byte, '?' (0x3F), is not a valid continuation byte (continuation bytes have the form 0b10xxxxxx, i.e. 0x80 to 0xBF).

Python 3.6.3 (default, Oct  9 2017, 12:07:10) 
[GCC 7.2.1 20170915 (Red Hat 7.2.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> b'\xc3?nis'.decode('windows-1252')
'Ã?nis'
>>> b'\xC3?nis_...'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 0: invalid continuation byte
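The rule above can be checked byte by byte. A simplified Python sketch of UTF-8 sequence validation follows; the function names are illustrative, and it deliberately ignores the extra second-byte restrictions after 0xE0, 0xED, 0xF0 and 0xF4 that a full decoder (such as Python's) also enforces:

```python
from typing import Optional


def is_continuation_byte(b: int) -> bool:
    # Continuation bytes have the bit pattern 10xxxxxx, i.e. 0x80..0xBF.
    return 0x80 <= b <= 0xBF


def expected_length(lead: int) -> int:
    """Length of the UTF-8 sequence announced by `lead`, or 0 if `lead` cannot start one."""
    if lead <= 0x7F:
        return 1  # 0xxxxxxx: ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2  # 110xxxxx: two-byte sequence (0xC3 falls here)
    if 0xE0 <= lead <= 0xEF:
        return 3  # 1110xxxx: three-byte sequence
    if 0xF0 <= lead <= 0xF4:
        return 4  # 11110xxx: four-byte sequence
    return 0  # stray continuation byte, overlong lead 0xC0/0xC1, or above 0xF4


def first_invalid_offset(data: bytes) -> Optional[int]:
    """Offset of the first malformed sequence in `data`, or None if none is found."""
    i = 0
    while i < len(data):
        n = expected_length(data[i])
        if n == 0 or i + n > len(data):
            return i  # bad lead byte, or sequence cut off by the end of the data
        if not all(is_continuation_byte(b) for b in data[i + 1 : i + n]):
            return i  # e.g. 0xC3 followed by '?' (0x3F)
        i += n
    return None


print(first_invalid_offset(b"\xc3?nis_"))  # 0
print(first_invalid_offset("\u00c3nis".encode("utf-8")))  # None
```

The second call encodes "Ãnis" properly, giving `b"\xc3\x83nis"`: the same 0xC3 lead byte is fine when it is followed by a real continuation byte (0x83).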
[27 Dec 2017 4:20] Thiruvannamalai Narayanan
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 0: invalid continuation byte

Fine, if it can't be decoded, then what is the workaround? How can I fix it?

Kindly help: is there any way to work around this?
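One possible workaround, other than changing column types, is to repair the values before loading them. The sketch below assumes (a guess that the report does not confirm) that the bad values were written in Windows-1252, the encoding used in the Python session above; any byte string that is not valid UTF-8 is re-decoded that way and re-encoded as UTF-8. `to_valid_utf8` is a hypothetical helper, and the result is only as correct as that guess: `\xC3?nis` becomes "Ã?nis", which may not be the original text.

```python
def to_valid_utf8(value: bytes, fallback: str = "cp1252") -> bytes:
    """Return `value` unchanged if it is valid UTF-8; otherwise re-decode it
    with `fallback` (an ASSUMED legacy encoding) and re-encode it as UTF-8."""
    try:
        value.decode("utf-8")
        return value
    except UnicodeDecodeError:
        # errors="replace" turns bytes that `fallback` cannot map
        # (e.g. 0x81 in cp1252) into U+FFFD instead of raising.
        return value.decode(fallback, errors="replace").encode("utf-8")


print(to_valid_utf8(b"\xc3?nis_Ben"))  # b'\xc3\x83?nis_Ben'
```

Valid UTF-8 passes through untouched, so the function can be applied to every value in a column rather than only the rows that fail.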
[27 Dec 2017 19:40] Daniël van Eeden
You could store this in varbinary instead of varchar.
[28 Dec 2017 6:50] Thiruvannamalai Narayanan
Do I need to do this for each table one by one? I have N tables in the database.

Also, please advise how to change the data type: if I edit the data type in the Manual Editing section, the Bulk Data Transfer step throws an error.

If there are any scripts to do this, that would be very helpful.
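For many tables, the ALTER statements for the varbinary suggestion could be generated rather than written by hand. A sketch, assuming the varchar column metadata has been fetched from MySQL's `information_schema.COLUMNS` (the query below) with any DB-API driver. Each `VARCHAR(n)` is mapped to `VARBINARY` sized for the worst-case bytes per character of its character set, falling back to `MEDIUMBLOB` past VARBINARY's 65,535-byte limit. The helper names and sample lengths are hypothetical. `MODIFY COLUMN` replaces the whole column definition, so defaults and comments are not carried over, and wide VARBINARY columns still count toward MySQL's 65,535-byte row-size limit. Whether Workbench's bulk data transfer then succeeds against the altered tables is not established in this report.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Query to run against the target MySQL server (parameter: the schema name).
COLUMNS_QUERY = """
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_MAXIMUM_LENGTH, CHARACTER_SET_NAME, IS_NULLABLE
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = %s AND DATA_TYPE = 'varchar'
"""

# Worst-case bytes per character; unknown character sets default to 4.
BYTES_PER_CHAR = {"ascii": 1, "latin1": 1, "utf8": 3, "utf8mb3": 3, "utf8mb4": 4}
VARBINARY_MAX_BYTES = 65535

Row = Tuple[str, str, int, Optional[str], str]


def quote_ident(name: str) -> str:
    # Backtick-quote a MySQL identifier, doubling any embedded backticks.
    return "`" + name.replace("`", "``") + "`"


def varbinary_ddl(schema: str, rows: Iterable[Row]) -> Iterator[str]:
    """Yield one ALTER TABLE ... MODIFY COLUMN statement per varchar column."""
    for table, column, char_len, charset, is_nullable in rows:
        n_bytes = char_len * BYTES_PER_CHAR.get((charset or "").lower(), 4)
        col_type = f"VARBINARY({n_bytes})" if n_bytes <= VARBINARY_MAX_BYTES else "MEDIUMBLOB"
        not_null = " NOT NULL" if is_nullable == "NO" else ""  # keep NOT NULL constraints
        yield (
            f"ALTER TABLE {quote_ident(schema)}.{quote_ident(table)} "
            f"MODIFY COLUMN {quote_ident(column)} {col_type}{not_null};"
        )


# Hypothetical metadata rows; only the schema, table and column names come from the report.
sample = [
    ("visit", "initial_request", 255, "utf8", "YES"),
    ("visit", "initial_user_agent", 20000, "utf8mb4", "NO"),
]
for stmt in varbinary_ddl("mysql_migrate", sample):
    print(stmt)
```

A `VARCHAR(255)` in utf8 can hold up to 255 × 3 = 765 bytes, so it becomes `VARBINARY(765)`; the 20000-character utf8mb4 column would need 80,000 bytes and therefore becomes `MEDIUMBLOB`.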
[15 Jun 2018 12:07] Chiranjeevi Battula
Hello Narayanan,

Thank you for the bug report.
I could not repeat the issue on our end with MySQL Workbench 8.0.11 on Windows 10.
If you can provide more information, feel free to add it to this bug and change the status back to 'Open'.

Thank you for your interest in MySQL.

Thanks,
Chiranjeevi.