Bug #10195 LOAD DATA INFILE and UTF-8
Submitted: 27 Apr 2005 8:18 Modified: 17 Jan 2011 12:28
Reporter: Moncef BEN TAARIT Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:4.1.8-nt OS:Windows (windows)
Assigned to: CPU Architecture:Any
Tags: affects_connectors

[27 Apr 2005 8:18] Moncef BEN TAARIT
Description:
LOAD DATA  INFILE or  LOAD DATA LOCAL INFILE doesn't insert  correctly an utf-8 data files in an utf8 data base : It reads correctly but the insertion is done on ANSI charset.

How to repeat:
Use phpMyadmin
creation database with utf8 charset
create an utf8 csv data file 
insert the texte file
[27 Apr 2005 13:46] MySQL Verification Team
In 5.0 LOAD DATA syntax will be extended to include charset valid for the file to be loaded.

Right now, LOAD DATA utilises database charset.
[15 May 2005 17:35] Adrian Klingel
This is not fixed in 5.0.  The behavior is the same.  This is a very serious problem.

To duplicate:
1)  Create a database with utf8 character set

2)  Create a table with a single column

3)  Create a file that contains this value:  Présentoir de comptoir en forme de bocal, grands porte-clés, 24 pièces.

4)  Use "load data local infile" to load it.

5)  Select from the table

The value you see will be "Pr".  It truncates on the accented character.
[23 Sep 2005 21:39] Grant Echols
I see that this has been sitting idle for several months now. I'm curious to know if its scheduled to be fixed. We are facing some pretty stiff performance problems without it. If a fix is forthcoming we'll use it. If not we'll pursue other options, but we'd like to know before making this decision. This affects all diacritic names involved in our user database and others where international characters are used.
[28 Sep 2005 20:54] Mathieu Lutfy
I also ran into this bug: my tables were created with "CREATE TABLE ... (...) CHARACTER SET UTF8", but my database was using the system default charset. I set all of the required variables to utf8 encoding, but doing a "LOAD FILE" would insert the utf8 text as if I'm reading it using a latin1 editor.

The following procedure seems to fix the problem:

- exported my database using a "SELECT * FROM .. INTO FILE".
- dropped my database
- "CREATE DATABASE my_db CHARACTER SET UTF8"
- Did the "LOAD FILE", and everything was fine.

In order to double-check that everything was OK, I dumped the table again in a new file to be sure that it was valid utf8.

This is based on what Sinisa Milivojevic mentioned ealier in this thread, only that I'm not sure why it doesn't work for the others. Are all the variables set to utf8? (show variables like "character_set\_%")
[3 Nov 2005 10:15] Martijn van Mourik
The following works for me on Windows MySql 4.1.8-nt and is far less drastic than dropping and rebuilding your database. Before executing the LOAD DATA INFILE-command set the following variable (assuming you want to import an UTF-8 character set file):

SET character_set_database=utf8;

After executing the LOAD DATA INFILE-command you can restore your previous setting by executing:

SET character_set_database=default;

Please note that for this solution to work all these commands should be executed in the same connection.
[14 Dec 2006 6:49] J Tee
I have the same problem :(

The following table:

CREATE TABLE `tbl_x_testchars` (
`name` varchar(40) collate utf8_swedish_ci default NULL,
`uname` varchar(40) collate utf8_swedish_ci default NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_swedish_ci;

and when I use the "Load Data" feature Aasvoëlberg becomes Aasvo

Import also does not work with different results.

The Database, Table, and Field charset are all utf8.
[3 Mar 2007 5:35] matthieu aubry
I would like to confirm that the LOAD DATA INFILE doesn't work for a UTF-8 encoded file, even when loaded into columns that have UTF8 collations.

I fixed it by using the advice above : altering the character set for the whole database. Be careful, it can have some side effect on your other tables!

ALTER DATABASE `mydbname` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci
[15 Jun 2007 15:21] Thibaut Barrère
> I fixed it by using the advice above : altering the character set for the whole database.
> Be careful, it can have some side effect on your other tables!

> ALTER DATABASE `mydbname` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci

Thanks for the tip ! works perfectly in my case.
[20 Mar 2008 23:28] charles sheinin
I'm not sure if this is the same problem - if anyone is especially curious, I suppose I could add files later:

1) we begin with a utf8 csv file.
2) we put it on our data server and run the following in mysql:
LOAD DATA INFILE 'filename.csv' INTO TABLE csv CHARACTER SET utf8 FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' (column1, column2);
3) data is inserted ok up until it reaches an accented, non-western character.  It fails at that point, at that character, but all characters before it in that field are loaded ok.
4) I empty the table and try again, this time with '\r\n'.  Everything goes in perfect, accents are showing ok and everything.  But then I go to display that data on my website, and the accented characters display as ansi ?'s.  This is in spite of every utf8 system I could have found in php, apache, mysql and in my html/xml document definitions.  In fact, other utf8 data that I've inserted line by line with php reading from a file display fine on the same page, only this readfile data is messed up.  also, the csv file was created using a program that generally only does '\n', not '\r\n' for its lines.

also, default collation for all my tables in my database is utf8-unicode-ci, and my default character set is utf8.

my workaround therefore is to simply write php scripts that insert the data line-by-line into the database.
[26 Jun 2008 21:38] Gonzalo Lopez
I'm not sure how, but after trying a lot I managed to load UTF8 encoded files using LOAD DATA INFILE.

Just indicate character set latin1 on the query. Note that database, tables and the text files are all UTF-8 encoded.

LOAD DATA INFILE into table TABLE
character set latin1
fields terminated by '|'

It's hackish, but works for me. No idea why it does, though.
[29 Sep 2008 21:46] Konstantin Osipov
Is it still broken?
If it is not, it's important to see in which version this got fixed.
[7 Oct 2008 22:32] MySQL Verification Team
I could not repeat this issue on 5.0.67 with my own test case. Could you please test latest version and if you still have the same issue provide the complete test case which failed on your side. Thanks in advance.
[8 Nov 2008 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[9 Jan 2009 15:03] Erwin Claassen
What is the status about this bug. Is this going to be solved quickly?
[4 Jan 2010 4:46] Misha B
I test LOAD DATA INFILE on 2 databases:

1 have settings: 
    database charset - latin1_swedish_ci, 
    table charset - utf8_general_ci, 
    field charset - utf8_general_ci

2 have settings:
    database charset - utf8_unicode_ci, 
    table charset - utf8_general_ci, 
    field charset - utf8_general_ci

In result, in 1 DB load data work correctly, and all no latin symbols (for example Öltemperatursensor) will imported correctly.
in 2 DB data was imported no correctly, text was truncates on the accented character
[4 Jan 2010 5:00] Misha B
And in result I load data in utf8 database character set by this command

set session collation_database=latin1_swedish_ci;
set session character_set_database=latin1;

LOAD DATA INFILE ...

It work for me :)
[19 Jan 2010 14:09] Mustafa Sak
Want to tell my Solution :)

MYSQL version: 5.1.39-1-log
Database charset latin 1
table charset: utf-8 unicode
field charset: uft-8 unicode
file (csv) charset: utf-8

solution:

LOAD DATA  INFILE '/tmp/x.csv'
      INTO TABLE x
character set UTF8
    FIELDS
        TERMINATED BY '|^|';

regards Sak
[9 Feb 2010 15:20] Andreas E.
They call this "solution" already??

Well, if the database charset is required to be latin1 to be internally (re)converted to UTF8 so that things work all, I do consider this a bug!!

Mustafa, try this with _database_ charset = UTF8 (instead of latin1) and I'll make a sure bet it will fail and screw up your characters!
Incredibly enough this bug was opened back in 2005 but still our trusty development team insists on this being correct behavior.
[14 Jan 2011 1:15] Ngan Pham
This bug is still happening for me.  MySQL 5.0.91.  I don't think it was ever solved.  The only way I was able to get around this was by using Gonzalo's method (above) of specifying my charset as latin1 in the query:

LOAD DATA LOCAL INFILE 'somefile.infile' INTO TABLE some_table CHARACTER SET latin1

My database, is UTF-8, my table is UTF-8, and my file is UTF-8.  EVERYTHING IS UTF-8.

This is very strange.  Please fix this bug!
[14 Jan 2011 22:13] Ngan Pham
OK, i found out that the database's DEFAULT CHARACTER SET is the only thing that matters in regards to this bug.  If you have UTF-8 tables but have a DEFAULT CHARACTER SET utf8 for database, you will encounter the issues.  However, if you have UTF-8 table, but latin1 database default, then it will work.
[17 Jan 2011 12:28] Sveta Smirnova
Closed as "Can't repeat" since "CREATE TABLE ... (...) CHARACTER SET UTF8" is implemented.
[18 Jan 2011 13:24] Jeroen Baas
I don't understand why that is reason to close this bug. Tables created using CHARACTER SET utf8 still show this issue.
[4 Jun 2011 0:27] Ricardo M
This is still an issue on 5.1.50 on Linux. My table has declared DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci ENGINE=MyISAM and I had to use as workaround when doing LOAD DATA character set latin1 otherwise the load complains

Trying to load a string like: Windows® 98 I was getting the following error:
Level   Code    Message
Warning 1366    Incorrect string value: '\\xAE 98 S...'

After used character set latin1 in LOAD DATA no more error.
[12 Jun 2012 11:04] juice jin
I solved the problem by adding "use DATABASE" before "load data [local] infile ...".
I think the problem is that mysql doesn't know which database we are going to use if we don't specify one, so it uses the default charset which proved to be latin1.
[14 Jun 2012 9:04] Alexander Barkov
Please use CHARACTER SET clause in the LOAD FILE command.
See here for details.
http://dev.mysql.com/doc/refman/5.5/en/load-data.html

It works fine for me:

mysql -uroot
mysql> show variables like 'char%';
+--------------------------+--------------------------------+
| Variable_name            | Value                          |
+--------------------------+--------------------------------+
| character_set_client     | utf8                           |
| character_set_connection | utf8                           |
| character_set_database   | latin1                         | <--- notice
latin1
| character_set_filesystem | binary                         |
| character_set_results    | utf8                           |
| character_set_server     | latin1                         |
| character_set_system     | utf8                           |
| character_sets_dir       | /opt/mysql-5.6/share/charsets/ |
+--------------------------+--------------------------------+
8 rows in set (0.00 sec)

#
# Creating a file with some UTF8 data (Extended Latin and Cyrillic letters)
#

mysql> select _utf8'????????????????' as str into outfile
'/opt/mysql-5.6/data/test/test-utf8.txt';
Query OK, 1 row affected (0.00 sec)

#
# Verifying that the file contains good UTF8 codes for the above letters:
#
mysql> select hex(load_file('/opt/mysql-5.6/data/test/test-utf8.txt'));
+----------------------------------------------------------+
| hex(load_file('/opt/mysql-5.6/data/test/test-utf8.txt')) |
+----------------------------------------------------------+
| C380C381C382C383D091D092D093D0940A                       |
+----------------------------------------------------------+
1 row in set (0.00 sec)

#
# Creating a table in the database "test"
#
mysql> drop table if exists test.t1;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> create table test.t1 (a varchar(20) character set utf8);
Query OK, 0 rows affected (0.11 sec)

#
# Loading data using CHARACTER SET clause:
#
mysql> load data infile '/opt/mysql-5.6/data/test/test-utf8.txt' INTO TABLE
test.t1 CHARACTER SET utf8;
Query OK, 1 row affected (0.03 sec)
Records: 1  Deleted: 0  Skipped: 0  Warnings: 0

#
# Verifying that we have the same correct utf8 codes.
#
mysql> select hex(a) from test.t1;
+----------------------------------+
| hex(a)                           |
+----------------------------------+
| C380C381C382C383D091D092D093D094 |
+----------------------------------+
1 row in set (0.00 sec)

Everything worked fine.

However, if you run LOAD DATA without CHARACTER SET clause, then character
set
is detected from character_set_database by default. This is a documented
behaviour,
see the above manual page.

This scripts demonstrates a result of wrong LOAD DATA usage,
notice the broken codes in the end:

mysql> delete from test.t1;
Query OK, 1 row affected (0.03 sec)

mysql> load data infile '/opt/mysql-5.6/data/test/test-utf8.txt' INTO TABLE
test.t1;
Query OK, 1 row affected (0.02 sec)
Records: 1  Deleted: 0  Skipped: 0  Warnings: 0

mysql> select hex(a) from test.t1;
+-----------------------------------------------------------------------------
-+
| hex(a)
 |
+-----------------------------------------------------------------------------
-+
|
C383E282ACC383C281C383E2809AC383C692C390E28098C390E28099C390E2809CC390E2809D
|
+-----------------------------------------------------------------------------
-+
1 row in set (0.00 sec)