| Bug #10195 | LOAD DATA INFILE and UTF-8 | ||
|---|---|---|---|
| Submitted: | 27 Apr 2005 10:18 | Modified: | 7 Nov 2008 23:32 |
| Reporter: | BEN TAARIT Moncef | ||
| Status: | No Feedback | ||
| Category: | Server | Severity: | S2 (Serious) |
| Version: | 4.1.8-nt | OS: | Microsoft Windows (windows) |
| Assigned to: | | Target Version: | |
| Tags: | affects_connectors | ||
[27 Apr 2005 10:18]
BEN TAARIT Moncef
[27 Apr 2005 15:46]
Sinisa Milivojevic
In 5.0, the LOAD DATA syntax will be extended with a clause that specifies the character set of the file being loaded. Right now, LOAD DATA uses the database character set.
[15 May 2005 19:35]
Adrian Klingel
This is not fixed in 5.0. The behavior is the same. This is a very serious problem. To duplicate:
1) Create a database with the utf8 character set.
2) Create a table with a single column.
3) Create a file that contains this value: Présentoir de comptoir en forme de bocal, grands porte-clés, 24 pièces.
4) Use "load data local infile" to load it.
5) Select from the table.
The value you see will be "Pr". It truncates on the accented character.
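One possible explanation for the truncation at "Pr", sketched in Python: if the file was actually saved as latin1 (an assumption, since the report does not state the file's encoding), the byte for é is not valid UTF-8, and anything reading the file as utf8 has to stop at that position. The helper name `valid_utf8_prefix` is hypothetical and only mimics that behavior; it is not MySQL code.

```python
def valid_utf8_prefix(raw: bytes) -> str:
    """Return the longest prefix of raw that decodes cleanly as UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that is not valid UTF-8.
        return raw[:exc.start].decode("utf-8")


text = "Présentoir de comptoir"
latin1_bytes = text.encode("latin-1")  # é is the single byte 0xE9
utf8_bytes = text.encode("utf-8")      # é is the two bytes 0xC3 0xA9

print(valid_utf8_prefix(latin1_bytes))  # Pr
print(valid_utf8_prefix(utf8_bytes))    # Présentoir de comptoir
```

If the file is saved as real UTF-8, the same check passes the whole string through, which may be why some reporters could not reproduce the problem.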
[23 Sep 2005 23:39]
Grant Echols
I see that this has been sitting idle for several months now. I'm curious to know if it's scheduled to be fixed. We are facing some pretty stiff performance problems without it. If a fix is forthcoming, we'll use it. If not, we'll pursue other options, but we'd like to know before making this decision. This affects all names with diacritics in our user database, and others where international characters are used.
[28 Sep 2005 22:54]
Mathieu Lutfy
I also ran into this bug: my tables were created with "CREATE TABLE ... (...) CHARACTER SET UTF8", but my database was using the system default charset. I set all of the required variables to utf8 encoding, but doing a "LOAD DATA INFILE" would insert the utf8 text as if I were reading it using a latin1 editor. The following procedure seems to fix the problem:
- Exported my database using a "SELECT * FROM .. INTO OUTFILE".
- Dropped my database.
- "CREATE DATABASE my_db CHARACTER SET UTF8"
- Did the "LOAD DATA INFILE", and everything was fine.
In order to double-check that everything was OK, I dumped the table again into a new file to be sure that it was valid utf8. This is based on what Sinisa Milivojevic mentioned earlier in this thread, only I'm not sure why it doesn't work for the others. Are all the variables set to utf8? (show variables like "character_set\_%")
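The "latin1 editor" symptom can be reproduced outside MySQL. A minimal Python sketch, assuming the file bytes are UTF-8 and the reader treats each byte as one latin1 character (the sample word is taken from the test string earlier in this thread):

```python
original = "porte-clés"
utf8_bytes = original.encode("utf-8")

# Interpreting each UTF-8 byte as one latin1 character yields mojibake.
misread = utf8_bytes.decode("latin-1")
print(misread)  # porte-clÃ©s

# Reading the same bytes with the correct charset restores the text.
print(utf8_bytes.decode("utf-8"))  # porte-clés
```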
[3 Nov 2005 11:15]
Martijn van Mourik
The following works for me on Windows MySQL 4.1.8-nt and is far less drastic than dropping and rebuilding your database. Before executing the LOAD DATA INFILE command, set the following variable (assuming you want to import a UTF-8 encoded file):
SET character_set_database=utf8;
After executing the LOAD DATA INFILE command, you can restore your previous setting by executing:
SET character_set_database=default;
Please note that for this solution to work, all of these commands must be executed in the same connection.
[14 Dec 2006 7:49]
J Tee
I have the same problem :( The following table:
CREATE TABLE `tbl_x_testchars` (
  `name` varchar(40) collate utf8_swedish_ci default NULL,
  `uname` varchar(40) collate utf8_swedish_ci default NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_swedish_ci;
and when I use the "Load Data" feature, Aasvoëlberg becomes Aasvo. Import also does not work, with different results. The database, table, and field charsets are all utf8.
[3 Mar 2007 6:35]
matthieu aubry
I would like to confirm that LOAD DATA INFILE doesn't work for a UTF-8 encoded file, even when loaded into columns that have UTF8 collations. I fixed it by using the advice above: altering the character set for the whole database. Be careful, it can have side effects on your other tables!
ALTER DATABASE `mydbname` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci
[15 Jun 2007 17:21]
Thibaut Barrère
> I fixed it by using the advice above: altering the character set for the whole database.
> Be careful, it can have side effects on your other tables!
> ALTER DATABASE `mydbname` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci

Thanks for the tip! Works perfectly in my case.
[21 Mar 2008 0:28]
charles sheinin
I'm not sure if this is the same problem. If anyone is especially curious, I suppose I could add files later.
1) We begin with a utf8 csv file.
2) We put it on our data server and run the following in mysql:
LOAD DATA INFILE 'filename.csv' INTO TABLE csv CHARACTER SET utf8 FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' (column1, column2);
3) Data is inserted OK up until it reaches an accented, non-western character. It fails at that point, at that character, but all characters before it in that field are loaded OK.
4) I empty the table and try again, this time with '\r\n'. Everything goes in perfectly, accents included. But then I go to display that data on my website, and the accented characters display as ANSI ?'s. This is in spite of every utf8 setting I could find in php, apache, mysql and in my html/xml document definitions. In fact, other utf8 data that I've inserted line by line with php reading from a file displays fine on the same page; only this LOAD DATA INFILE data is messed up.
Also, the csv file was created using a program that generally only writes '\n', not '\r\n', for its lines. The default collation for all tables in my database is utf8_unicode_ci, and my default character set is utf8.
My workaround therefore is to simply write php scripts that insert the data line by line into the database.
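Before blaming the server, it can help to rule out file-side causes such as a byte-order mark, invalid UTF-8, or unexpected line terminators. The following Python sketch is one way to check; the function name `inspect_bytes` and the sample data are hypothetical and not part of the original report.

```python
import codecs


def inspect_bytes(raw: bytes) -> dict:
    """Report encoding-related properties of a file's raw bytes."""
    try:
        raw.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    crlf = raw.count(b"\r\n")
    return {
        "has_utf8_bom": raw.startswith(codecs.BOM_UTF8),
        "valid_utf8": valid_utf8,
        "crlf_lines": crlf,
        # Every CRLF also contains an LF, so subtract those.
        "lf_only_lines": raw.count(b"\n") - crlf,
    }


# Hypothetical two-line CSV sample using '\n' terminators.
sample = '"Aasvoëlberg","x"\n"plain","y"\n'.encode("utf-8")
print(inspect_bytes(sample))
```

For a real file, the bytes would come from `open(path, "rb").read()`. The counts indicate which LINES TERMINATED BY value matches the file.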
[26 Jun 2008 23:38]
Gonzalo Lopez
I'm not sure how, but after a lot of trying I managed to load UTF8 encoded files using LOAD DATA INFILE. Just indicate character set latin1 in the query. Note that the database, tables and text files are all UTF-8 encoded.
LOAD DATA INFILE into table TABLE character set latin1 fields terminated by '|'
It's hackish, but works for me. No idea why it does, though.
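A plausible but unconfirmed explanation for why declaring latin1 appears to work: every UTF-8 byte is accepted as one latin1 character, so nothing is rejected, and a client that also talks latin1 gets the original bytes back. The sketch below models this with Python's latin-1 codec, which is only an approximation of MySQL's latin1. Under this assumption the stored data is double-encoded rather than correct utf8.

```python
original = "Aasvoëlberg"
file_bytes = original.encode("utf-8")

# Server is told the file is latin1: every byte becomes one character.
as_latin1_chars = file_bytes.decode("latin-1")
# Those characters are then converted to the utf8 column charset.
stored_bytes = as_latin1_chars.encode("utf-8")

# A latin1 client connection reverses the conversion byte for byte.
sent_to_client = stored_bytes.decode("utf-8").encode("latin-1")

print(sent_to_client == file_bytes)        # True
print(len(file_bytes), len(stored_bytes))  # 12 14
```

The extra stored bytes are the sign of double encoding: a client that reads the column over a utf8 connection would see mojibake instead of the original text.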
[29 Sep 2008 23:46]
Konstantin Osipov
Is it still broken? If it is not, it's important to see in which version this got fixed.
[8 Oct 2008 0:32]
Miguel Solorzano
I could not repeat this issue on 5.0.67 with my own test case. Could you please test the latest version and, if you still have the same issue, provide the complete test case that failed on your side? Thanks in advance.
[8 Nov 2008 1:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[9 Jan 16:03]
Erwin Claassen
What is the status of this bug? Is it going to be fixed soon?
