Bug #10573 | BOM(of UTF-8) is not correctly handled by LOAD DATA | ||
---|---|---|---|
Submitted: | 12 May 2005 8:08 | Modified: | 28 Jan 2014 21:38 |
Reporter: | Hartmut Holzgraefe | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: DML | Severity: | S2 (Serious) |
Version: | 4.1.11/5.6 | OS: | Linux (Linux) |
Assigned to: | Assigned Account | CPU Architecture: | Any |
[12 May 2005 8:08]
Hartmut Holzgraefe
[16 May 2005 17:53]
Alexander Barkov
I think this is not a bug. This is a feature request "add Byte Order Mark support into LOAD DATA INFILE".
[19 May 2005 16:34]
Sergei Golubchik
see also http://bugs.mysql.com/bug.php?id=4960
[2 Jun 2005 18:07]
Alexander Barkov
This is a feature request. We created a TODO item for implement this feature. It's currently scheduled for MySQL-5.2.
[25 Nov 2010 9:02]
Alexander Barkov
The original report in Bug#10573 demonstrates that LOAD DATA INFILE imports BOM as '?' mark. In bug#29323 Stanislav demonstrates another example, when BOM causes data *loss*: mysql> LOAD DATA LOCAL INFILE 'c:/Work/AgroImpex/stocks.txt' -> REPLACE -> INTO TABLE chlng.stocks_orig -> CHARACTER SET 'utf8' -> FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"' -> LINES TERMINATED BY '\r\n' STARTING BY '' -> ; ERROR 1292 (22007): Incorrect date value: 'я╗┐14.10.2010' for column 'sdate' at row 1 So the value of the date column in the first row gets lost. This is a more serious flaw than extra '?' mark. I am reopening the report, so it can be fixed soon.
[25 Nov 2010 9:11]
Alexander Barkov
A possible workaround: - open the file in a text editor and insert a new line in the very beginning - then run LOAD DATA - then delete the first record from the table using a DELETE statement with appropriate WHERE condition.
[25 Nov 2010 9:18]
Alexander Barkov
Note, in strict mode the entire LOAD DATA INFILE interrupts on error, so no import happens at all.
[28 Jan 2014 21:38]
MySQL Verification Team
Still present and verified as Feature request. C:\dbs>net start mysqld56 The MySQLD56 service is starting. The MySQLD56 service was started successfully. C:\dbs>56 C:\dbs>c:\dbs\5.6\bin\mysql -uroot --port=3560 --prompt="mysql 5.6 > " Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 1 Server version: 5.6.16-log Source distribution Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql 5.6 > CREATE DATABASE bom; Query OK, 1 row affected (0.00 sec) mysql 5.6 > USE bom Database changed mysql 5.6 > CREATE TABLE utf8_tbl (col1 VARCHAR(50)) CHARACTER SET utf8; Query OK, 0 rows affected (0.17 sec) mysql 5.6 > CREATE TABLE sjis_tbl (col1 VARCHAR(50)) CHARACTER SET sjis; Query OK, 0 rows affected (0.14 sec) mysql 5.6 > SET character_set_database=utf8; Query OK, 0 rows affected (0.00 sec) mysql 5.6 > LOAD DATA LOCAL INFILE "c:/tmp/utf8n.txt" INTO TABLE utf8_tbl LINES TERMINATED BY '\n'; Query OK, 2 rows affected (0.17 sec) Records: 2 Deleted: 0 Skipped: 0 Warnings: 0 mysql 5.6 > SELECT col1, HEX(col1) FROM utf8_tbl; +------+-----------+ | col1 | HEX(col1) | +------+-----------+ | abc | 616263 | | def | 646566 | +------+-----------+ 2 rows in set (0.00 sec) mysql 5.6 > TRUNCATE TABLE utf8_tbl; Query OK, 0 rows affected (0.27 sec) mysql 5.6 > SET character_set_database=utf8; Query OK, 0 rows affected (0.00 sec) mysql 5.6 > LOAD DATA LOCAL INFILE "c:/tmp/utf8.txt" INTO TABLE utf8_tbl LINES TERMINATED BY '\n'; Query OK, 2 rows affected (0.05 sec) Records: 2 Deleted: 0 Skipped: 0 Warnings: 0 mysql 5.6 > SELECT col1, HEX(col1) FROM utf8_tbl; +------+--------------+ | col1 | HEX(col1) | +------+--------------+ | ?abc | EFBBBF616263 | | def | 646566 | +------+--------------+ 2 rows in set (0.00 sec) mysql 5.6 > SET character_set_database=utf8; Query OK, 0 rows affected (0.01 sec) mysql 5.6 > LOAD DATA LOCAL INFILE "c:/tmp/utf8n.txt" INTO TABLE sjis_tbl LINES TERMINATED BY '\n'; Query OK, 2 rows affected (0.11 sec) Records: 2 Deleted: 0 Skipped: 0 Warnings: 0 mysql 5.6 > SELECT col1, HEX(col1) FROM sjis_tbl; +------+-----------+ | col1 | HEX(col1) | +------+-----------+ | abc | 616263 | | def | 646566 | +------+-----------+ 2 rows in set (0.00 sec) mysql 5.6 > TRUNCATE TABLE sjis_tbl; Query OK, 0 rows affected (0.19 sec) mysql 5.6 > SET character_set_database=utf8; Query OK, 0 rows affected (0.00 sec) mysql 5.6 > LOAD DATA LOCAL INFILE "c:/tmp/utf8.txt" INTO TABLE sjis_tbl LINES TERMINATED BY '\n'; Query OK, 2 rows affected, 1 warning (0.05 sec) Records: 2 Deleted: 0 Skipped: 0 Warnings: 1 mysql 5.6 > SELECT col1, HEX(col1) FROM sjis_tbl; +------+-----------+ | col1 | HEX(col1) | +------+-----------+ | ?abc | 3F616263 | | def | 646566 | +------+-----------+ 2 rows in set (0.00 sec) mysql 5.6 >
[13 Feb 2019 12:26]
muthupandi k
Any progress in this feature or any alternate way to handle BOM