Bug #95700 SQL Workbench is unable to import UTF-8 CSV file
Submitted: 9 Jun 2019 17:08 Modified: 10 Jun 2019 6:11
Reporter: Jirka Stejskal
Status: Verified
Category: MySQL Workbench: Administration
Severity: S2 (Serious)
Version: 8.0.16, 8.0.26, 8.0.31, 8.0.32
OS: MacOS (macOS 10.14.x Mojave x86_64)
Assigned to:
CPU Architecture: Any
Tags: WBBugReporter

[9 Jun 2019 17:08] Jirka Stejskal
Description:
Importing a UTF-8 CSV file shows an unhandled exception:

UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 0: ordinal not in range(128)
18:53:10 [ERR][       pymforms]: Unhandled exception in Python code: 
Traceback (most recent call last):
  File "/Applications/MySQLWorkbench.app/Contents/Resources/libraries/workbench/wizard_page_widget.py", line 97, in go_next
    self.main.go_next_page()
  File "/Applications/MySQLWorkbench.app/Contents/Resources/libraries/workbench/wizard_form.py", line 76, in go_next_page
    self.pages[index].page_activated(True)
  File "/Applications/MySQLWorkbench.app/Contents/Resources/plugins/sqlide_power_import_wizard.py", line 186, in page_activated
    self.call_create_preview_table()
  File "/Applications/MySQLWorkbench.app/Contents/Resources/plugins/sqlide_power_import_wizard.py", line 344, in call_create_preview_table
    self.create_preview_table(self.call_analyze())
  File "/Applications/MySQLWorkbench.app/Contents/Resources/plugins/sqlide_power_import_wizard.py", line 353, in call_analyze
    if not self.active_module.analyze_file():
  File "/Applications/MySQLWorkbench.app/Contents/Resources/plugins/sqlide_power_import_export_be.py", line 537, in analyze_file
    self.has_header = csv.Sniffer().has_header(csvsample)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 399, in has_header
    header = rdr.next() # assume first row is header
Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?

From the output it is clear that Workbench does not handle the UTF-8 signature bytes (BOM) at the beginning of the CSV file, and crashes.

How to repeat:
Create an Excel sheet with any international characters (I can provide some, but I don't see a way to attach a file here).
Save it as a UTF-8 encoded CSV file (latest Excel).
Try to import the content.
The import shows a dialog with the text: "Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?"
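To reproduce without Excel, the following minimal Python 3 sketch writes a small CSV that starts with a UTF-8 signature (BOM), on the assumption that this is what Excel's UTF-8 CSV export produces. The sample rows and the file name `bom_test.csv` are hypothetical.

```python
import codecs
import csv
import os
import tempfile

# Hypothetical sample rows containing non-ASCII characters.
ROWS = [["name", "city"], ["Jiří", "Plzeň"], ["Zoë", "Köln"]]


def write_bom_csv(path):
    """Write ROWS as CSV, prefixed with the UTF-8 signature EF BB BF."""
    # The 'utf-8-sig' codec prepends the BOM on the first write.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(ROWS)


def starts_with_bom(path):
    """Return True if the file begins with the 3-byte UTF-8 BOM."""
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "bom_test.csv")
    write_bom_csv(path)
    print(path, starts_with_bom(path))
```

Importing the generated file should then exercise the same BOM handling as an Excel export.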

Suggested fix:
Simple: properly handle files that start with the UTF-8 signature bytes (u'\ufeff').
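A minimal sketch of the suggested fix, assuming Python 3 and the standard `csv` module (the traceback above comes from Python 2.7, so Workbench's actual plugin code may look different): decoding with the `utf-8-sig` codec drops a leading BOM, while plain `utf-8` leaves '\ufeff' attached to the first header field. `read_header` and the sample bytes are illustrative only.

```python
import codecs
import csv
import io

# Sample CSV bytes: UTF-8 BOM followed by a header row and one data row.
RAW = codecs.BOM_UTF8 + "name,city\r\nJiří,Plzeň\r\n".encode("utf-8")


def read_header(raw, encoding):
    """Decode raw CSV bytes with the given codec and return the first row."""
    # newline="" is how the csv module docs ask for files to be opened.
    text = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, newline="")
    return next(csv.reader(text))


print(read_header(RAW, "utf-8"))      # ['\ufeffname', 'city']
print(read_header(RAW, "utf-8-sig"))  # ['name', 'city']
```

Because `utf-8-sig` decodes BOM-less UTF-8 exactly like `utf-8`, it looks like a safe default whenever UTF-8 input is expected.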
[9 Jun 2019 17:12] Jirka Stejskal
file to recreate the problem

Attachment: test_csv.csv (text/csv), 610 bytes.

[10 Jun 2019 6:11] MySQL Verification Team
Hello Jirka Stejskal,

Thank you for the report.

regards,
Umesh
[6 Oct 2021 13:11] MySQL Verification Team
Bug #105140 marked as duplicate of this one.
[10 Feb 2022 13:50] MySQL Verification Team
Bug #106423 marked as duplicate of this one.
[10 Feb 2022 17:29] Valerio Messina
Here is the UTF-8 share:
https://en.wikipedia.org/wiki/File:UTF-8_takes_over.png
[10 Feb 2022 18:07] Valerio Messina
confirmed on 8.0.28
[11 Apr 2022 9:45] Valerio Messina
This bug is in status "Verified", while this duplicate:
https://bugs.mysql.com/bug.php?id=51233
is in status "No Feedback".
Not sure which one will be fixed.
[1 Jul 2022 16:00] Valerio Messina
table definition as UTF-8

Attachment: MySQL_Workbench_8.0.28_tableDef.png (image/png, text), 157.43 KiB.

[1 Jul 2022 16:01] Valerio Messina
Here is the table definition; the field "remarks" is a VARCHAR.

Attachment: MySQL_Workbench_8.0.28_tableDef2.png (image/png, text), 94.85 KiB.

[1 Jul 2022 16:01] Valerio Messina
a csv file saved as UTF-8 containing non ASCII chars

Attachment: DB_PSA_UTF-8_lite.csv.7z (application/octet-stream, text), 551 bytes.

[1 Jul 2022 16:01] Valerio Messina
The Encoding import setting does not matter.

Attachment: MySQL_Workbench_8.0.28_import.png (image/png, text), 16.03 KiB.

[1 Jul 2022 16:02] Valerio Messina
Importing a file saved as UTF-8 into a DB set to UTF-8 garbles every non-ASCII character, regardless of the Encoding import setting.

How to repeat:
Create a DB table with UTF-8 settings.
Save a CSV file with UTF-8 encoding, containing some non-ASCII characters.
Import the CSV into the table.
Check whether the fields with non-ASCII characters are correct or garbled.
[1 Jul 2022 16:05] Valerio Messina
It seems very important to me that Workbench can import UTF-8 files correctly.
It is 2022, and UTF-8 is now the dominant encoding, not only among Unicode encodings but also compared with ASCII and the old ANSI code pages (Windows-1252, ISO-8859-1); for text files it is practically the only one:
https://en.wikipedia.org/wiki/File:UTF-8_takes_over.png

So please fix this issue with extended/multi-byte characters. It should require little effort, as Windows has supported UTF-8 since Windows XP, and on Linux it has been the default from the beginning.

Tested with the Workbench
Version 8.0.29 build 1751076 CE (64 bits)
[18 Aug 2022 1:42] Zachary Read
The errors generated are Python ones. It seems like the application isn't reading the file with the proper encoding from the get-go.

-----------------------------------------------

How to reproduce with MySQL Workbench:

1. Create a CSV file named "test.csv" with the following two lines. Make it UTF-8 encoded (without BOM).

column1,column2
名,連

2. Run the "Table Data Import Wizard" and go through the steps.

3. RESULT: This should result in the following error:

Unhandled exception: 'charmap' codec can't decode byte 0x90 in position 17: character maps to <undefined>

-----------------------------------------------

How to reproduce with only Python:

1. Create a CSV file named "test.csv" with the following two lines. Make it UTF-8 encoded (without BOM).

column1,column2
名,連

2. Create and run a Python script like the following:

with open(r"test.csv") as f:
    for line in f:
        print(line)

3. RESULT: This should result in the following error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 17: character maps to <undefined>
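The 'charmap' message suggests the file is being decoded with a single-byte Windows code page rather than UTF-8. That `open()` without `encoding=` falls back to the locale encoding, typically cp1252 on US/Western-European Windows, is an assumption about the reporter's setup. The sketch below reproduces the same failure deterministically on any OS; position 17 only lines up if test.csv uses LF line endings (with CRLF the bad byte would sit at position 18), which is also an assumption.

```python
# The two lines of test.csv, encoded as UTF-8 without a BOM, with LF line endings.
DATA = "column1,column2\n名,連\n".encode("utf-8")


def first_decode_error(data, encoding):
    """Return (offending byte as hex, byte offset) of the first decode failure, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return hex(data[exc.start]), exc.start
    return None


print(first_decode_error(DATA, "cp1252"))  # ('0x90', 17)
print(first_decode_error(DATA, "utf-8"))   # None
```

名 is E5 90 8D in UTF-8; cp1252 maps 0xE5 to 'å' but leaves 0x90 undefined, so decoding stops on the second byte.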

-----------------------------------------------

How to resolve with Python:

4. Modify the previous Python script as follows. Notice the addition of the encoding parameter.

with open(r"test.csv", encoding='utf-8') as f:
    for line in f:
        print(line)

5. Run the script.

6. RESULT: The script will run correctly and print the lines without issue.

Comment: This should be a really easy fix. There's already an option asking users what encoding to use, so you could just take that value and use it when reading the file.
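Along the lines of that comment, a sketch of how a reader could honor the encoding chosen in the wizard. `read_csv_rows` and `selected_encoding` are hypothetical names, not Workbench's API; upgrading UTF-8 to `utf-8-sig` is an extra assumption that would also cover the BOM case from the original report.

```python
import codecs
import csv
import os
import tempfile


def read_csv_rows(path, selected_encoding):
    """Hypothetical helper: read all CSV rows using the user-selected encoding."""
    # Normalize aliases such as "UTF8" or "utf_8" to the canonical codec name.
    encoding = codecs.lookup(selected_encoding).name
    if encoding == "utf-8":
        # Same result for BOM-less UTF-8, but a leading BOM is dropped too.
        encoding = "utf-8-sig"
    with open(path, encoding=encoding, newline="") as f:
        return list(csv.reader(f))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "test.csv")
    with open(path, "wb") as f:
        f.write("column1,column2\n名,連\n".encode("utf-8"))
    print(read_csv_rows(path, "UTF-8"))  # [['column1', 'column2'], ['名', '連']]
```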

-----------------------------------------------

Workaround:

For anyone stuck on this, you can convert your CSV to a JSON file (there are some tools online). The importer will work fine with the JSON file.
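A sketch of that workaround using only the Python standard library instead of online tools. It assumes the first CSV row is the header and that the importer accepts a JSON array with one object per row; `csv_to_json` is a hypothetical helper, and the output shape has not been checked against every Workbench version.

```python
import csv
import json
import os
import tempfile


def csv_to_json(csv_path, json_path, encoding="utf-8-sig"):
    """Convert a CSV file to a JSON array of objects, one per data row."""
    # 'utf-8-sig' reads plain UTF-8 and also drops a leading BOM, if any.
    with open(csv_path, encoding=encoding, newline="") as src:
        rows = list(csv.DictReader(src))
    # ensure_ascii=False writes characters like 名 as-is instead of \u escapes.
    with open(json_path, "w", encoding="utf-8") as dst:
        json.dump(rows, dst, ensure_ascii=False, indent=2)
    return len(rows)


if __name__ == "__main__":
    work = tempfile.mkdtemp()
    src = os.path.join(work, "test.csv")
    dst = os.path.join(work, "test.json")
    with open(src, "w", encoding="utf-8", newline="") as f:
        f.write("column1,column2\n名,連\n")
    print(csv_to_json(src, dst))  # 1
```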
[3 Nov 2022 13:01] MySQL Verification Team
Bug #108866 marked as duplicate of this one.
[13 Jan 2023 12:12] MySQL Verification Team
Bug #109613 marked as duplicate of this one.
[10 Feb 2023 12:55] MySQL Verification Team
Bug #110023 marked as duplicate of this one.
[27 Mar 2023 16:54] Fuck You
3 years after this bug was reported, there is still no fix.
How embarrassing is this? Such a critical feature is still missing, and this is from Oracle.
Utter disgrace!
[27 Mar 2023 19:00] Valerio Messina
please set the OS to Windows too
[17 Apr 2023 16:05] MySQL Verification Team
Bug #110705 marked as duplicate of this one.
[12 Mar 12:45] MySQL Verification Team
Bug #114317 marked as duplicate of this one.
[12 Mar 15:49] Valerio Messina
Just tested importing a UTF-8 file using Workbench 8.0
Version 8.0.36 build 3737333 CE (64 bits) Community

The file contains a field like this:
" £ ç ° § € à è é ì ò ù "
The imported field becomes:
" £ ç ° § ⬠à è é ì ò ù "
So the problem affects not only non-European scripts but all non-UK/US characters: everything that is not 7-bit ASCII.
Note: being unable to import '£' is annoying for the UK too.
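One common way to get this kind of garbling is decoding UTF-8 bytes with a single-byte code page such as Windows-1252 somewhere between the file and the table. Whether that is what Workbench does here is an assumption, not a confirmed diagnosis; the sketch only shows the general mechanism on the characters from the report, and that the damage is reversible as long as no bytes were lost.

```python
# The characters from the field in the report above.
FIELD = "£ ç ° § € à è é ì ò ù"


def as_cp1252_mojibake(text):
    """Encode as UTF-8, then wrongly decode the bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled):
    """Undo the mistake: recover the UTF-8 bytes and decode them properly."""
    return garbled.encode("cp1252").decode("utf-8")


print(as_cp1252_mojibake("€"))  # â‚¬
print(as_cp1252_mojibake("é"))  # Ã©
print(repair(as_cp1252_mojibake(FIELD)) == FIELD)  # True
```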

Importing via the command-line 'mysql' client, on the other hand:

$ mysql --version
mysql  Ver 15.1 Distrib 10.11.6-MariaDB, for debian-linux-gnu (x86_64)

works as expected.