Bug #87593 wbcopytables uses utf8 hardcoded and prevents migration of four-byte characters
Submitted: 30 Aug 2017 6:23 Modified: 30 Aug 2017 13:10
Reporter: Tobias Stock Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Workbench: Migration Severity:S2 (Serious)
Version:MySQL Workbench 6.3 CE, 8.0.17 OS:Windows
Assigned to: CPU Architecture:Any
Tags: charset wbcopytables utf8 utf8mb4

[30 Aug 2017 6:23] Tobias Stock
Description:
Hello,
we want to migration from SQLServer to MySQL by using the migration wizard within MySQL workbench.
The source database uses utf8 which also includes four byte characters.
The migration fails because of unknown characters (to mysql).

I figured out that in copytable.cpp, in method "MySQLCopyDataSource::MySQLCopyDataSource" the charset is set to hardcoded utf8.

Line 1217 in master:
std::string q = "SET NAMES 'utf8'";

Four byte characters will cause an excpetion during migration as normal utf8 in mysql doesnt support four-byte characters.
The target databse uses utf8mb4 but the connection is set fixed to utf8.

How to repeat:
Create table within SQLServer database with charset Latin1_General_100_CI_AS_SC

Add four byte characters (e.g. four-byte smiley https://apps.timwhitlock.info/emoji/tables/unicode) within a test table

start workbench and try to migrate this table including data to mysql

Suggested fix:
I suggest two possible solutions:

1) Remove lines 1217 ff in copytable.cpp and rely on the default charset (which can be configured within my.cnf)

2) replace hardcoded charset with configuration parameter 
   e.g. --charset utf8mb4
[30 Aug 2017 10:16] MySQL Verification Team
Thank you for the bug report. Please provide the sql statement for create table and insert data for SQL Server. Thanks.
[30 Aug 2017 11:21] Tobias Stock
Here is the SQL-Skript for SQL-Server:

CREATE DATABASE [testdb]
COLLATE Latin1_General_100_CI_AS_SC

use testdb

create table test
(
  id int primary key,
  txt nvarchar(max)
)

insert into test values (1, nchar(0x1F601))

When migrating this table the exception is:

`testdb`.`test`:Copying 2 columns of 1 rows from table [testdb].[dbo].[test]
ERROR: `testdb`.`test`:Inserting Data: Incorrect string value: '\xF0\x9F\x98\x81' for column 'txt' at row 1
ERROR: `testdb`.`test`:Failed copying 1 rows
[30 Aug 2017 13:10] MySQL Verification Team
Thank you for the feedback. Verified with SQL script provided.
[13 Mar 2019 18:30] Moshe Lampert
8.0.15: same problem, smilies also
[11 Sep 2019 7:03] MySQL Verification Team
Bug #96815 marked as duplicate of this one
[11 Sep 2019 7:20] Simon Scheib
Still an issue in v8.0.17

Are there any plans on fixing this issue to be able to migrate Databases?
[10 Oct 2019 7:29] Julia Puustinen
Also occurs when Workbench is on a Linux machine and using an ODBC FreeTDS driver to connect to the MS SQL Server. Version 8.0.17.
[24 Jan 2020 23:51] Ron Laughton
Upvoting this one. Experienced the same issue:

utf8mb4 columns being Migrated from MySQL 5.7.23 to 8.0.19 and ending up as utf8 content, rendering the emojis as question marks ?

Used MySQL workbench version 8.0.19

Folks, really surprised by the lack of utf8mb4 with the migration tool. Please educate me if I missed a flag.

Kind regards,

Ron
[17 Sep 2020 15:14] Paul Miller
Please, please fix.