Bug #93565 Collation crash SQL Server
Submitted: 11 Dec 2018 22:10 Modified: 21 Dec 2018 13:53
Reporter: Joe Murray Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Workbench: Migration Severity:S1 (Critical)
Version:8.0.13 OS:Microsoft Windows (win10)
Assigned to: CPU Architecture:x86

[11 Dec 2018 22:10] Joe Murray
Description:
SQL Server to MySQL migration stuck due to collation related error. See gist: https://gist.github.com/JoeMurray/e27530b1635a105c670ceb6ae05e92d5 

There are numerous pages when googling for the error going back many years but none seem to actually solve it.

The database is a backup of a Raiser's Edge database in an old format restored to a new Win10 new SQL Server 2017 instance. MySQL Workbench 8.0.13. I've tried using three different ODBC drivers as System DSN and a File DSN but get the same error. The db has some nvarchar and some varchar columns. There is accented text data as well as bin (image) data in various fields. The error message doesn't indicate the table or column that is causing problems.

How to repeat:
Retry migration.

Suggested fix:
Put in better debugging in terms of identifying the table and column. Provide better documentation of how to setup collations for a standard migration from SQL Server 2017 to MySQL 8. 

I wonder if there may be key-value option(s) that can be added to ODBC File DSN that can alleviate this problem??
[11 Dec 2018 22:11] Joe Murray
The db is much bigger than 3MB and I'm not willing to make it available to everyone.
[11 Dec 2018 22:14] Joe Murray
Updating - only managed to connect to sql server from win10, not macOS
[12 Dec 2018 8:44] Miguel Solorzano
Thank you for the bug report. Are you able to provide a sql script to create a table with some insert data statements so we can create it on SQL Server and to try it's migration?. Thanks.
[19 Dec 2018 16:35] Joe Murray
So through further research I found that the source database I was restoring to SQL Server was using collation Latin1_General_CI_AS. I created a SQL Server database with that collation and restored into it. I retried migration wizard and got the same error as it fell back to a default UTF-16 character set.
....
  File "C:\Program Files\MySQL\MySQL Workbench 8.0 CE\modules\db_mssql_grt.py", line 135, in <lambda>
    con.add_output_converter(-150, lambda value: value if value is None else value.decode('utf-16'))
  File "C:\Program Files\MySQL\MySQL Workbench 8.0 CE\Python\Lib\encodings\utf_16.py", line 16, in decode
    return codecs.utf_16_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode byte 0x65 in position 22: truncated data

How can I tell  MySQL Workstation what character set and collation to use when connecting to MS SQL Server?

I'm willing to debug locally if I knew how to install a developer copy of MySQL Workstation and compile the py files.
[19 Dec 2018 17:27] Joe Murray
Is there a command you can provide that will indicate what table is causing the trouble? There are several hundred tables in the source database, and I am not familiar enough with T-SQL to be able to do the equivalent of mysqldump. Even SHOW CREATE TABLE isn't available in SQL Server.
[21 Dec 2018 13:51] Joe Murray
I was unable to figure out which table / row was causing error despite making a couple of educated guesses, and thus was not able to provide a sample db.

The original DB was in SQL_Latin1_General_CP1_CI_AS. After exploring numerous options, I converted the database to Latin1_General_CI_AS collation using https://stackoverflow.com/a/35180491/3005913. With this Windows rather than SQL collation Workbench successfully runs. I hope that helps someone else work around this Workbench bug.
[21 Dec 2018 13:53] Joe Murray
Giving up on getting help as I can't provide customer sensitive data and Workbench does not identify the table that is causing the error. I'd recommend improving the error message as this has been causing many people problems for over a decade.