Bug #39247 | LOAD DATA INFILE skips too many rows | ||
---|---|---|---|
Submitted: | 4 Sep 2008 15:05 | Modified: | 15 Sep 2008 19:30 |
Reporter: | d di (Basic Quality Contributor) | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: DML | Severity: | S3 (Non-critical) |
Version: | 5.0.67, 5.0, 5.1, 6.0 bzr | OS: | Any (MS Windows 5.0.2195, Linux) |
Assigned to: | CPU Architecture: | Any | |
Tags: | csv, qc |
[4 Sep 2008 15:05]
d di
[4 Sep 2008 15:06]
d di
Microsoft CSV file
Attachment: default.htm (text/html), 1.25 KiB.
[4 Sep 2008 15:08]
d di
retry upload with FF3 instead of OP9.5
Attachment: test.csv (application/vnd.ms-excel, text), 41 bytes.
[4 Sep 2008 15:09]
Valeriy Kravchuk
Thank you for a problem report. Please, try to repeat with a newer version, 5.0.67, and inform about the results.
[4 Sep 2008 17:15]
MySQL Verification Team
Could you please try without the clause ESCAPED BY '"'. Thanks in advance.
[4 Sep 2008 19:53]
d di
Change the escape parameter as suggested, and the correct number of rows is imported. Unfortunately, it also means that you no longer will be able to import CSV files: Standard CSV files have field enclosers inside data escaped with a double repeat of the field encloser. (The particular attached file works, but it is a minimal test case to demonstrate the bug. A real CSV file would have text fields with a mix of spaces, double 'encloser chars' and 'separator chars' and would be badly corrupted on import when using the alteration of parameters suggested.)
[5 Sep 2008 6:15]
Sveta Smirnova
Thank you for the feedback. Verified as described. More likely feature request though.
[5 Sep 2008 8:10]
d di
Feature request? Sure seems like a bug to import 0 lines when the user asks for 2.
[15 Sep 2008 19:30]
Sveta Smirnova
Test case for this report: --exec echo '"Sprutte";"Fjabbe"' >$MYSQL_TEST_DIR/var/tmp/bug39247.csv --exec echo '"1";"2"' >>$MYSQL_TEST_DIR/var/tmp/bug39247.csv --exec echo '"3";"4"' >>$MYSQL_TEST_DIR/var/tmp/bug39247.csv CREATE TABLE blue (f1 VARCHAR(255), f2 VARCHAR(255)); --eval LOAD DATA INFILE '$MYSQL_TEST_DIR/var/tmp/bug39247.csv' INTO TABLE blue CHARACTER SET utf8 FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '"' LINES TERMINATED BY '\n' IGNORE 0 LINES select * from blue; truncate table blue; --eval LOAD DATA INFILE '$MYSQL_TEST_DIR/var/tmp/bug39247.csv' INTO TABLE blue CHARACTER SET utf8 FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '"' LINES TERMINATED BY '\n' IGNORE 1 LINES select * from blue; Result: CREATE TABLE blue (f1 VARCHAR(255), f2 VARCHAR(255)); LOAD DATA INFILE '/Users/apple/bzr/mysql-5.1/mysql-test/var/tmp/bug39247.csv' INTO TABLE blue CHARACTER SET utf8 FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '"' LINES TERMINATED BY '\n' IGNORE 0 LINES; select * from blue; f1 f2 Sprutte Fjabbe 1.2 2.5 3.4 4,3 truncate table blue; LOAD DATA INFILE '/Users/apple/bzr/mysql-5.1/mysql-test/var/tmp/bug39247.csv' INTO TABLE blue CHARACTER SET utf8 FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '"' LINES TERMINATED BY '\n' IGNORE 1 LINES; select * from blue; f1 f2
[23 Sep 2008 8:25]
Omer Barnir
triage: this problem has been there for a very long time, as a result, the impact of the problem is lowered. Setting to SR60RC
[2 May 2011 17:30]
Stephen Dewey
This should at least documented. It essentially means that the enclosing and escaping characters cannot be the same -- that should at least be mentioned in the MySQL manual, no? It means you cannot use LOAD DATA INFILE with CSV files where enclosing quotes are doubled to escape them.
[9 Dec 2013 22:43]
Jordan Mitchell
This is affecting me as well. We have lots of different kinds of CSV's being loaded on a regular basis, and not being able to ignore some lines when we need to specify an escape character is going to be painful. Regardless of how long it's been in here I'd argue that it's definitely a bug.
[2 Jun 2015 13:36]
Jānis Elmeris
Still not solved in 5.6. Interestingly, the import seems to be done successfully with `ESCAPED BY ''` even if the CSV field values actually contains quotes. It seems that a double `""` is correctly interpreted as a literal quote in the value. Do not simply remove `ESCAPED BY ''` though, as then the default value ("\\") is being used and would (unexpectedly/incorrectly) convert all "\\" occurrences in the CSV values to "\". -------------------------------- CREATE TABLE `LOAD_DATA` ( `ID` int(11) NOT NULL, `TE` text, `dat` timestamp NULL DEFAULT NULL, PRIMARY KEY (`ID`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 LOAD_DATA.csv: "ID","TE","DAT" "115979237","abc","2014-05-06" "115979253","d,ef","" "115979261","g""hi","" "115979278","j"",kl","2013-01-07" "115991551","oe\\pw","" 1) ESCAPED BY '"' IGNORE 1 LINES The lines get ignored up and including the first line whose last value is empty (""): mysql> TRUNCATE TABLE LOAD_DATA; LOAD DATA LOCAL INFILE "/home/janise/LOAD_DATA.csv" INTO TABLE LOAD_DATA COLUMNS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '"' LINES TERMINATED BY '\n' IGNORE 1 LINES (ID, TE, @DAT) SET DAT = IF(@DAT = '', NULL, @DAT); SHOW WARNINGS; SELECT * FROM LOAD_DATA; Query OK, 0 rows affected (0.38 sec) Query OK, 3 rows affected (0.00 sec) Records: 3 Deleted: 0 Skipped: 0 Warnings: 0 Empty set (0.00 sec) +-----------+--------+---------------------+ | ID | TE | dat | +-----------+--------+---------------------+ | 115979261 | g"hi | NULL | | 115979278 | j",kl | 2013-01-07 00:00:00 | | 115991551 | oe\\pw | NULL | +-----------+--------+---------------------+ 3 rows in set (0.00 sec) 2) ESCAPED BY '"' IGNORE 0 LINES All lines correctly imported, but also the header row is included. mysql> TRUNCATE TABLE LOAD_DATA; LOAD DATA LOCAL INFILE "/home/janise/LOAD_DATA.csv" INTO TABLE LOAD_DATA COLUMNS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '"' LINES TERMINATED BY '\n' IGNORE 0 LINES (ID, TE, @DAT) SET DAT = IF(@DAT = '', NULL, @DAT); SHOW WARNINGS; SELECT * FROM LOAD_DATA; Query OK, 0 rows affected (0.38 sec) Query OK, 6 rows affected, 2 warnings (0.01 sec) Records: 6 Deleted: 0 Skipped: 0 Warnings: 2 +---------+------+--------------------------------------------------------+ | Level | Code | Message | +---------+------+--------------------------------------------------------+ | Warning | 1366 | Incorrect integer value: 'ID' for column 'ID' at row 1 | | Warning | 1265 | Data truncated for column 'dat' at row 1 | +---------+------+--------------------------------------------------------+ 2 rows in set (0.00 sec) +-----------+--------+---------------------+ | ID | TE | dat | +-----------+--------+---------------------+ | 0 | TE | 0000-00-00 00:00:00 | | 115979237 | abc | 2014-05-06 00:00:00 | | 115979253 | d,ef | NULL | | 115979261 | g"hi | NULL | | 115979278 | j",kl | 2013-01-07 00:00:00 | | 115991551 | oe\\pw | NULL | +-----------+--------+---------------------+ 6 rows in set (0.00 sec) 3) IGNORE 1 LINES All lines imported, but "\\" incorrectly stored as "\". mysql> TRUNCATE TABLE LOAD_DATA; LOAD DATA LOCAL INFILE "/home/janise/LOAD_DATA.csv" INTO TABLE LOAD_DATA COLUMNS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 LINES (ID, TE, @DAT) SET DAT = IF(@DAT = '', NULL, @DAT); SHOW WARNINGS; SELECT * FROM LOAD_DATA; Query OK, 0 rows affected (0.39 sec) Query OK, 5 rows affected (0.00 sec) Records: 5 Deleted: 0 Skipped: 0 Warnings: 0 Empty set (0.01 sec) +-----------+-------+---------------------+ | ID | TE | dat | +-----------+-------+---------------------+ | 115979237 | abc | 2014-05-06 00:00:00 | | 115979253 | d,ef | NULL | | 115979261 | g"hi | NULL | | 115979278 | j",kl | 2013-01-07 00:00:00 | | 115991551 | oe\pw | NULL | +-----------+-------+---------------------+ 5 rows in set (0.00 sec) 4) ESCAPED BY '' IGNORE 1 LINES All lines correctly imported. However, this is still a workaround, as it doesn't match the documentation and who knows in what cases this approach actually fails. mysql> TRUNCATE TABLE LOAD_DATA; LOAD DATA LOCAL INFILE "/home/janise/LOAD_DATA.csv" INTO TABLE LOAD_DATA COLUMNS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '' LINES TERMINATED BY '\n' IGNORE 1 LINES (ID, TE, @DAT) SET DAT = IF(@DAT = '', NULL, @DAT); SHOW WARNINGS; SELECT * FROM LOAD_DATA; Query OK, 0 rows affected (0.42 sec) Query OK, 5 rows affected (0.00 sec) Records: 5 Deleted: 0 Skipped: 0 Warnings: 0 Empty set (0.00 sec) +-----------+--------+---------------------+ | ID | TE | dat | +-----------+--------+---------------------+ | 115979237 | abc | 2014-05-06 00:00:00 | | 115979253 | d,ef | NULL | | 115979261 | g"hi | NULL | | 115979278 | j",kl | 2013-01-07 00:00:00 | | 115991551 | oe\\pw | NULL | +-----------+--------+---------------------+ 5 rows in set (0.00 sec)