Bug #59523 | LOAD DATA (LOCAL) INFILE does not work for UTF8 | ||
---|---|---|---|
Submitted: | 15 Jan 2011 8:02 | Modified: | 1 Nov 2012 17:27 |
Reporter: | Ngan Pham | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S2 (Serious) |
Version: | 5.0, 5.1 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | import, LOAD DATA INFILE, utf8 |
[15 Jan 2011 8:02]
Ngan Pham
[15 Jan 2011 9:28]
Peter Laursen
see http://dev.mysql.com/doc/refman/5.0/en/load-data.html note 1) LOAD DATA ... [CHARACTER SET charset_name] 2) The character set indicated by the character_set_database system variable is used to interpret the information in the file. SET NAMES and the setting of character_set_client do not affect interpretation of input. If the contents of the input file use a character set that differs from the default, it is usually preferable to specify the character set of the file by using the CHARACTER SET clause, which is available as of MySQL 5.0.38. A character set of binary specifies “no conversion.” Do you specify the "CHARACTER SET" option in the statement? Peter (not a MySQL person)
[15 Jan 2011 10:02]
Valeriy Kravchuk
Looks like we need more details about OS and MySQL server version used, exact LOAD DATA statement and data you try to load and then get from the database. This is what I see: macbook-pro:5.0 openxs$ cat /tmp/utf8.txt русский english אָלֶף ألف مقصورة) macbook-pro:5.0 openxs$ file /tmp/utf8.txt /tmp/utf8.txt: UTF-8 Unicode text macbook-pro:5.0 openxs$ bin/mysql -uroot test Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 3 Server version: 5.0.93-debug Source distribution Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> show create table tu\G *************************** 1. row *************************** Table: tu Create Table: CREATE TABLE `tu` ( `c1` varchar(255) default NULL ) ENGINE=MyISAM DEFAULT CHARSET=utf8 1 row in set (0.00 sec) mysql> load data local infile '/tmp/utf8.txt' into table tu; Query OK, 4 rows affected (0.00 sec) Records: 4 Deleted: 0 Skipped: 0 Warnings: 0 mysql> select * from tu; +----------------------+ | c1 | +----------------------+ | русский | | english | | אָלֶף | | ألف مقصورة) | +----------------------+ 4 rows in set (0.00 sec) mysql> show session variables like 'char%'; +--------------------------+---------------------------------------------+ | Variable_name | Value | +--------------------------+---------------------------------------------+ | character_set_client | latin1 | | character_set_connection | latin1 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | latin1 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /Users/openxs/dbs/5.0/share/mysql/charsets/ | +--------------------------+---------------------------------------------+ 8 rows in set (0.00 sec)
[15 Jan 2011 16:25]
Ngan Pham
server1:~ npham$ cat /tmp/utf8.txt русский english אָלֶף ألف مقصورة) server1:~ npham$ mysql Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 3463 Server version: 5.0.91-log MySQL Community Server (GPL) Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> CREATE DATABASE utf8_test CHARACTER SET utf8; Query OK, 1 row affected (0.01 sec) mysql> show create database utf8_test; +-----------+--------------------------------------------------------------------+ | Database | Create Database | +-----------+--------------------------------------------------------------------+ | utf8_test | CREATE DATABASE `utf8_test` /*!40100 DEFAULT CHARACTER SET utf8 */ | +-----------+--------------------------------------------------------------------+ 1 row in set (0.00 sec) mysql> use utf8_test; Database changed mysql> CREATE TABLE `tu` (`c1` varchar(255) default NULL) ENGINE=MyISAM DEFAULT CHARSET=utf8; Query OK, 0 rows affected (0.04 sec) mysql> SHOW CREATE TABLE `tu`; +-------+-------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+-------------------------------------------------------------------------------------------+ | tu | CREATE TABLE `tu` ( `c1` varchar(255) default NULL ) ENGINE=MyISAM DEFAULT CHARSET=utf8 | +-------+-------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) mysql> load data local infile '/tmp/utf8.txt' into table tu; Query OK, 4 rows affected (0.00 sec) Records: 4 Deleted: 0 Skipped: 0 Warnings: 0 mysql> select * from tu; +-------------+ | c1 | +-------------+ | ??????? | | english | | ????? | | ??? ??????) | +-------------+ 4 rows in set (0.00 sec) mysql> show session variables like 'char%'; +--------------------------+----------------------------------------+ | Variable_name | Value | +--------------------------+----------------------------------------+ | character_set_client | latin1 | | character_set_connection | latin1 | | character_set_database | utf8 | | character_set_filesystem | binary | | character_set_results | latin1 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /usr/local/mysql/share/mysql/charsets/ | +--------------------------+----------------------------------------+ 8 rows in set (0.01 sec)
[15 Jan 2011 16:26]
Ngan Pham
The key to duplicating this is creating a database with a default character set of utf8 (not latin1).
[15 Jan 2011 16:35]
Ngan Pham
Specifying the CHARACTER SET (correctly) in the query does not make a difference. mysql> truncate tu; Query OK, 0 rows affected (0.00 sec) mysql> load data local infile '/tmp/utf8.txt' into table tu character set utf8; Query OK, 4 rows affected (0.00 sec) Records: 4 Deleted: 0 Skipped: 0 Warnings: 0 mysql> select * from tu; +-------------+ | c1 | +-------------+ | ??????? | | english | | ????? | | ??? ??????) | +-------------+ 4 rows in set (0.00 sec) HOWEVER, strangely enough, specifying the (incorrect) CHARACTER SET as latin1, things seem to work: mysql> truncate tu; Query OK, 0 rows affected (0.00 sec) mysql> load data local infile '/tmp/utf8.txt' into table tu character set latin1; Query OK, 4 rows affected (0.00 sec) Records: 4 Deleted: 0 Skipped: 0 Warnings: 0 mysql> select * from tu; +----------------------+ | c1 | +----------------------+ | русский | | english | | אָלֶף | | ألف مقصورة) | +----------------------+ 4 rows in set (0.00 sec)
[17 Jan 2011 12:13]
Sveta Smirnova
Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://dev.mysql.com/doc/ and the instructions on how to report a bug at http://bugs.mysql.com/how-to-report.php You have to specify proper client character set when retrieve records from the table. See http://dev.mysql.com/doc/refman/5.1/en/set-option.html and http://dev.mysql.com/doc/refman/5.1/en/charset-connection.html for details.
[17 Jan 2011 18:04]
Ngan Pham
Sveta, I'm not trying to retrieve records from the table. I'm trying to load data into the table. I don't see how this is not a bug? You can EASILY reproduce it from my comment above. Unless I'm doing something wrong--at which point it would be a lot more helpful if you can tell what it is I'm doing wrong instead of linking me to documentation and assuming that I haven't gone through it already.
[17 Jan 2011 18:54]
Sveta Smirnova
Thank you for the feedback. Problem is in comment "[15 Jan 17:25] Ngan Pham <nganpham@gmail.com>": mysql> select * from tu; +-------------+ | c1 | +-------------+ | ??????? | | english | | ????? | | ??? ??????) | +-------------+ 4 rows in set (0.00 sec) mysql> show session variables like 'char%'; +--------------------------+----------------------------------------+ | Variable_name | Value | +--------------------------+----------------------------------------+ | character_set_client | latin1 ... To understand how it affects this case, please, read manual links to which I already provided.
[21 Jan 2011 0:13]
Ngan Pham
Setting character_set_client and/or character_set_connection to utf8 still does not make it work.
[21 Jan 2011 17:31]
Sveta Smirnova
Thank you for the feedback. Please provide all steps with proper SET NAMES
[22 Feb 2011 0:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[16 Oct 2012 13:51]
Stefan Haubenthal
http://blog.datispars.com/loading-utf8-text-data-into-a-mysql-table/