Bug #95700 | SQL Workbench is unable to import UTF8 csv file | ||
---|---|---|---|
Submitted: | 9 Jun 2019 17:08 | Modified: | 10 Jun 2019 6:11 |
Reporter: | Jirka Stejskal | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Workbench: Administration | Severity: | S2 (Serious) |
Version: | 8.0.16, 8.0.26, 8.0.31, 8.0.32 | OS: | MacOS (macOS 10.14.x Mojave x86_64) |
Assigned to: | | CPU Architecture: | Any |
Tags: | WBBugReporter |
[9 Jun 2019 17:08]
Jirka Stejskal
[9 Jun 2019 17:12]
Jirka Stejskal
file to recreate the problem
Attachment: test_csv.csv (text/csv), 610 bytes.
[10 Jun 2019 6:11]
MySQL Verification Team
Hello Jirka Stejskal, Thank you for the report. regards, Umesh
[6 Oct 2021 13:11]
MySQL Verification Team
Bug #105140 marked as duplicate of this one.
[10 Feb 2022 13:50]
MySQL Verification Team
Bug #106423 marked as duplicate of this one.
[10 Feb 2022 17:29]
Valerio Messina
Here is the UTF-8 share: https://en.wikipedia.org/wiki/File:UTF-8_takes_over.png
[10 Feb 2022 18:07]
Valerio Messina
confirmed on 8.0.28
[11 Apr 2022 9:45]
Valerio Messina
This bug is in status "Verified", while this duplicate: https://bugs.mysql.com/bug.php?id=51233 is in status "No Feedback". Not sure which one will be fixed.
[1 Jul 2022 16:00]
Valerio Messina
table definition as UTF-8
Attachment: MySQL_Workbench_8.0.28_tableDef.png (image/png, text), 157.43 KiB.
[1 Jul 2022 16:01]
Valerio Messina
Here is the table definition; the field "remarks" is a VARCHAR
Attachment: MySQL_Workbench_8.0.28_tableDef2.png (image/png, text), 94.85 KiB.
[1 Jul 2022 16:01]
Valerio Messina
A CSV file saved as UTF-8 containing non-ASCII characters
Attachment: DB_PSA_UTF-8_lite.csv.7z (application/octet-stream, text), 551 bytes.
[1 Jul 2022 16:01]
Valerio Messina
Encoding import settings do not matter
Attachment: MySQL_Workbench_8.0.28_import.png (image/png, text), 16.03 KiB.
[1 Jul 2022 16:02]
Valerio Messina
Importing a file saved as UTF-8 into a DB set to UTF-8 garbles every non-ASCII character, regardless of the Encoding setting chosen in the import dialog.

How to repeat:
1. Create a DB table with UTF-8 settings.
2. Save a CSV file with UTF-8 encoding, containing some non-ASCII characters.
3. Import the CSV into the table.
4. Check whether the fields with non-ASCII characters are correct or garbled.
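The garbling described above is consistent with UTF-8 bytes being decoded through a single-byte codepage. The following is a minimal Python sketch of that effect, not Workbench's actual code: the sample value is hypothetical, and cp1252 is only an assumed stand-in for whichever codec the importer picks on a given platform.

```python
# Hypothetical sample value; any non-ASCII text shows the same effect.
original = "café"

# Bytes as they would be stored in a CSV file saved as UTF-8.
utf8_bytes = original.encode("utf-8")

# Assumption: the importer decodes with a single-byte codepage (cp1252 here).
# Each multi-byte UTF-8 sequence turns into several unrelated characters.
garbled = utf8_bytes.decode("cp1252")

# Decoding with the encoding the file was actually saved in round-trips cleanly.
restored = utf8_bytes.decode("utf-8")

print(repr(garbled))
print(repr(restored))
```

If the imported fields show this pattern (two or three odd characters per original character), the data was most likely decoded with the wrong codec rather than corrupted in the file.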
[1 Jul 2022 16:05]
Valerio Messina
It seems very important to me that the Workbench is able to import files in UTF-8 format correctly. We are in 2022 and UTF-8 is now the dominant format, not only among Unicode encodings, but also with respect to ASCII and the old ANSI, Windows-1252, and ISO-8859-1 codepages; for text files it is practically the only one in use: https://en.wikipedia.org/wiki/File:UTF-8_takes_over.png
So please fix this issue with extended / multi-byte characters. It should require little effort, as Windows has supported UTF-8 encoding from Windows XP onwards, and on Linux it has been the default from the beginning.
Tested with the Workbench Version 8.0.29 build 1751076 CE (64 bits)
[18 Aug 2022 1:42]
Zachary Read
The errors generated are Python ones. It seems like the application isn't reading the file with the proper encoding from the get-go.
-----------------------------------------------
How to reproduce with MySQL Workbench:
1. Create a CSV file named "test.csv" with the following two lines. Make it UTF-8 encoded (without BOM).

    column1,column2
    名,連

2. Run the "Table Data Import Wizard" and go through the steps.
3. RESULT: This should result in the following error:

    Unhandled exception: 'charmap' codec can't decode byte 0x90 in position 17: character maps to <undefined>

-----------------------------------------------
How to reproduce with only Python:
1. Create a CSV file named "test.csv" with the following two lines. Make it UTF-8 encoded (without BOM).

    column1,column2
    名,連

2. Create and run a Python script like the following:

    with open(r"test.csv") as f:
        for line in f:
            print(line)

3. RESULT: This should result in the following error:

    UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 17: character maps to <undefined>

-----------------------------------------------
How to resolve with Python:
4. Modify the previous Python script as follows. Notice the addition of the encoding parameter.

    with open(r"test.csv", encoding='utf-8') as f:
        for line in f:
            print(line)

5. Run the script.
6. RESULT: The script will run correctly and print the lines without issue.

Comment: This should be a really easy fix. There's already an option asking users what encoding to use, so you could just take that value and use it when reading the file.
-----------------------------------------------
Workaround: For anyone stuck on this, you can convert your CSV to a JSON file (there are some tools online). The importer will work fine with the JSON file.
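The suggestion of passing the user's chosen encoding through to the file read could look roughly like the sketch below. This is not Workbench's importer code: `read_csv_rows` is a hypothetical helper, and cp1252 is assumed as the failing codec only because it is a common 'charmap' default. With LF line endings, offset 17 in this test file is the second byte (0x90) of 名, which matches the position in the reported error.

```python
import csv
import os
import tempfile


def read_csv_rows(path, encoding="utf-8"):
    """Hypothetical helper: read all rows of a CSV file with an explicit encoding."""
    # newline="" is what the csv module documentation recommends for file objects.
    with open(path, encoding=encoding, newline="") as f:
        return list(csv.reader(f))


with tempfile.TemporaryDirectory() as tmp:
    # Build the two-line test file from the steps above, UTF-8 without BOM.
    path = os.path.join(tmp, "test.csv")
    with open(path, "wb") as f:
        f.write("column1,column2\n名,連\n".encode("utf-8"))

    # Reading with the encoding the file was saved in works.
    rows = read_csv_rows(path, encoding="utf-8")

    # Reading the same bytes as cp1252 (assumed wrong codec) fails to decode.
    try:
        read_csv_rows(path, encoding="cp1252")
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True

print(rows)
print(cp1252_failed)
```

Passing the encoding explicitly keeps the result independent of the platform default, which differs between Windows, macOS, and Linux.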
[3 Nov 2022 13:01]
MySQL Verification Team
Bug #108866 marked as duplicate of this one.
[13 Jan 12:12]
MySQL Verification Team
Bug #109613 marked as duplicate of this one.
[10 Feb 12:55]
MySQL Verification Team
Bug #110023 marked as duplicate of this one.