Bug #95700 SQL Workbench is unable to import UTF8 csv file
Submitted: 9 Jun 2019 17:08
Modified: 10 Jun 2019 6:11
Reporter: Jirka Stejskal
Status: Verified
Category: MySQL Workbench: Administration
Severity: S2 (Serious)
Version: 8.0.16, 8.0.26
OS: MacOS (macOS 10.14.x Mojave x86_64)
Assigned to:
CPU Architecture: Any
Tags: WBBugReporter

[9 Jun 2019 17:08] Jirka Stejskal
Importing a UTF8 CSV file shows an unhandled exception:

UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 0: ordinal not in range(128)
18:53:10 [ERR][       pymforms]: Unhandled exception in Python code: 
Traceback (most recent call last):
  File "/Applications/MySQLWorkbench.app/Contents/Resources/libraries/workbench/wizard_page_widget.py", line 97, in go_next
  File "/Applications/MySQLWorkbench.app/Contents/Resources/libraries/workbench/wizard_form.py", line 76, in go_next_page
  File "/Applications/MySQLWorkbench.app/Contents/Resources/plugins/sqlide_power_import_wizard.py", line 186, in page_activated
  File "/Applications/MySQLWorkbench.app/Contents/Resources/plugins/sqlide_power_import_wizard.py", line 344, in call_create_preview_table
  File "/Applications/MySQLWorkbench.app/Contents/Resources/plugins/sqlide_power_import_wizard.py", line 353, in call_analyze
    if not self.active_module.analyze_file():
  File "/Applications/MySQLWorkbench.app/Contents/Resources/plugins/sqlide_power_import_export_be.py", line 537, in analyze_file
    self.has_header = csv.Sniffer().has_header(csvsample)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 399, in has_header
    header = rdr.next() # assume first row is header
Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?

From the text it is obvious that Workbench does not handle the UTF8 signature bytes at the beginning of the CSV file and crashes.

How to repeat:
Create an Excel sheet with any international characters (I can give you some, but I don't see a way to attach it here).
Save the sheet as a UTF8-encoded CSV file (latest Excel).
Try to import the content.
The procedure will show a dialog with the text: "Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?"
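
For anyone without Excel at hand, a test file of the same shape can probably be generated with a few lines of Python. This is only a sketch: the helper name `make_bom_csv_bytes`, the sample rows, and the output file name are made up for illustration, and it assumes that a UTF-8 signature (BOM) followed by CRLF-terminated rows is close enough to what Excel writes.

```python
import codecs
import csv
import io


def make_bom_csv_bytes(rows):
    """Serialize rows as CSV and encode them as UTF-8 with a leading BOM.

    Hypothetical helper: it approximates an Excel "CSV UTF-8" export,
    it is not taken from Workbench or Excel.
    """
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)  # default line terminator is "\r\n"
    # The "utf-8-sig" codec prepends the signature bytes EF BB BF when encoding.
    return buf.getvalue().encode("utf-8-sig")


# Illustrative data with non-ASCII characters (assumed, not the attached file).
sample = make_bom_csv_bytes([
    ["id", "name"],
    ["1", "Jiří"],
    ["2", "Žluťoučký kůň"],
])

# To reproduce by hand, write the bytes to disk and import that file:
# with open("bom_test.csv", "wb") as f:
#     f.write(sample)
```

The first three bytes of `sample` are the signature that decodes to u'\ufeff', the character named in the exception above.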

Suggested fix:
Simple: properly handle files with UTF8 signature bytes (u'\ufeff') at the beginning of the file.
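
A minimal sketch of what such handling could look like, written for Python 3 rather than the Python 2.7 shown in the traceback. The function name `read_csv_bytes` is hypothetical and this is not the Workbench code; it only illustrates that the standard `utf-8-sig` codec drops the signature if one is present, and that passing `newline=""` lets the `csv` module deal with CR, LF, and CRLF line endings itself.

```python
import csv
import io


def read_csv_bytes(raw):
    """Decode CSV bytes as UTF-8, skipping an optional BOM, and parse the rows.

    Hypothetical helper for illustration only.
    """
    # "utf-8-sig" removes a leading U+FEFF if present and otherwise
    # behaves like plain "utf-8".
    text = raw.decode("utf-8-sig")
    # newline="" leaves line endings untranslated so csv.reader can
    # recognise "\r", "\n" and "\r\n" on its own.
    return list(csv.reader(io.StringIO(text, newline="")))
```

Used on a file with a signature, the first header cell comes back as `id` instead of `\ufeffid`, so later steps such as header detection see clean text.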
[9 Jun 2019 17:12] Jirka Stejskal
file to recreate the problem

Attachment: test_csv.csv (text/csv), 610 bytes.

[10 Jun 2019 6:11] MySQL Verification Team
Hello Jirka Stejskal,

Thank you for the report.

[6 Oct 2021 13:11] MySQL Verification Team
Bug #105140 marked as duplicate of this one.
[10 Feb 13:50] MySQL Verification Team
Bug #106423 marked as duplicate of this one.
[10 Feb 17:29] Valerio Messina
here the UTF-8 share:
[10 Feb 18:07] Valerio Messina
confirmed on 8.0.28
[11 Apr 9:45] Valerio Messina
This bug is in status "Verified", while this duplicate:
is in status "No Feedback".
Not sure which one will be fixed.
[1 Jul 16:00] Valerio Messina
table definition as UTF-8

Attachment: MySQL_Workbench_8.0.28_tableDef.png (image/png, text), 157.43 KiB.

[1 Jul 16:01] Valerio Messina
here the table def, field "remarks" is a VARCHAR

Attachment: MySQL_Workbench_8.0.28_tableDef2.png (image/png, text), 94.85 KiB.

[1 Jul 16:01] Valerio Messina
a csv file saved as UTF-8 containing non ASCII chars

Attachment: DB_PSA_UTF-8_lite.csv.7z (application/octet-stream, text), 551 bytes.

[1 Jul 16:01] Valerio Messina
The Encoding import setting does not matter

Attachment: MySQL_Workbench_8.0.28_import.png (image/png, text), 16.03 KiB.

[1 Jul 16:02] Valerio Messina
Importing a file saved as UTF-8 into a DB set as UTF-8 garbles every non-ASCII character, independent of the Encoding import settings.

How to repeat:
Create a DB table with UTF-8 settings.
Save a CSV file with UTF-8 encoding, containing some non-ASCII chars.
Import the CSV into the table.
Look at the fields with non-ASCII chars to see whether they are right or garbage.
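
To judge "right or garbage" in the last step, it may help to know what one common kind of garbage looks like. The sketch below assumes, without confirmation from this report, that the UTF-8 bytes get decoded with a single-byte code page such as Latin-1 somewhere during import. Both function names are hypothetical.

```python
def simulate_wrong_decoding(text, wrong_encoding="latin-1"):
    """Show how UTF-8 text looks when its bytes are read as a single-byte code page.

    The choice of "latin-1" is an assumption made for illustration.
    """
    return text.encode("utf-8").decode(wrong_encoding)


def repair(garbled, wrong_encoding="latin-1"):
    """Reverse simulate_wrong_decoding, if the same wrong encoding was used."""
    return garbled.encode(wrong_encoding).decode("utf-8")
```

For example, `simulate_wrong_decoding("é")` yields `Ã©`, while pure ASCII text passes through unchanged. If the imported fields show this pattern, the file itself is probably fine and the decoding step is at fault.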
[1 Jul 16:05] Valerio Messina
It seems very important to me that the Workbench is able to import files in UTF-8 format correctly.
It is 2022 and UTF-8 is now the dominant format, not only among Unicode encodings but also relative to ASCII and the old ANSI, Windows-1252, and ISO-8859-1 code pages; it is practically the only one used for text files:

So please fix this issue with extended / multi-byte characters. It should require little effort, as Windows has supported UTF-8 encoding since Windows XP, and on Linux it has been the default from the beginning.

Tested with the Workbench
Version 8.0.29 build 1751076 CE (64 bits)