Bug #39247 LOAD DATA INFILE skips too many rows
Submitted: 4 Sep 2008 15:05 Modified: 15 Sep 2008 19:30
Reporter: d di (Basic Quality Contributor) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: DML Severity:S3 (Non-critical)
Version:5.0.67, 5.0, 5.1, 6.0 bzr OS:Any (MS Windows 5.0.2195, Linux)
Assigned to: CPU Architecture:Any
Tags: csv, qc

[4 Sep 2008 15:05] d di
Description:
To begin with, please see the attached CSV file.
It contains 3 rows.

Now, create a table with matching schema:
  CREATE TABLE blue (f1 VARCHAR(255), f2 VARCHAR(255));

Import the CSV file:
  LOAD DATA INFILE 'c:/test.csv' INTO TABLE blue
  CHARACTER SET utf8
  FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '"'
  LINES TERMINATED BY '\r\n'
  IGNORE 0 LINES

Please notice that 3 rows are imported.

Now, the first row is actually a header, so let's repeat the above, except change the last line to:
  IGNORE 1 LINES

Before you run the altered query, take one or two Valiums or <insert favorite horse sedative here> so the astonishment won't kill you.  Then run it and gaze at the awesome results..

ARGH! *Zero* rows imported!

OMG, etc.

Apparently MySQLd thinks that 3-1 equals 0.

How to repeat:
(see description)

Suggested fix:
Uhm.. probably something in the math library? :-)
[4 Sep 2008 15:06] d di
Microsoft CSV file

Attachment: default.htm (text/html), 1.25 KiB.

[4 Sep 2008 15:08] d di
retry upload with FF3 instead of OP9.5

Attachment: test.csv (application/vnd.ms-excel, text), 41 bytes.

[4 Sep 2008 15:09] Valeriy Kravchuk
Thank you for a problem report. Please, try to repeat with a newer version, 5.0.67, and inform about the results.
[4 Sep 2008 17:15] MySQL Verification Team
Could you please try without the clause ESCAPED BY '"'. Thanks in advance.
[4 Sep 2008 19:53] d di
Change the escape parameter as suggested, and the correct number of rows is imported.

Unfortunately, it also means that you no longer will be able to import CSV files:

Standard CSV files have field enclosers inside data escaped with a double repeat of the field encloser.

(The particular attached file works, but it is a minimal test case to demonstrate the bug.  A real CSV file would have text fields with a mix of spaces, double 'encloser chars' and 'separator chars' and would be badly corrupted on import when using the alteration of parameters suggested.)
[5 Sep 2008 6:15] Sveta Smirnova
Thank you for the feedback.

Verified as described. More likely feature request though.
[5 Sep 2008 8:10] d di
Feature request?
Sure seems like a bug to import 0 lines when the user asks for 2.
[15 Sep 2008 19:30] Sveta Smirnova
Test case for this report:

--exec echo '"Sprutte";"Fjabbe"' >$MYSQL_TEST_DIR/var/tmp/bug39247.csv
--exec echo '"1";"2"' >>$MYSQL_TEST_DIR/var/tmp/bug39247.csv
--exec echo '"3";"4"' >>$MYSQL_TEST_DIR/var/tmp/bug39247.csv

CREATE TABLE blue (f1 VARCHAR(255), f2 VARCHAR(255));

--eval  LOAD DATA INFILE '$MYSQL_TEST_DIR/var/tmp/bug39247.csv' INTO TABLE blue CHARACTER SET utf8 FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '"' LINES TERMINATED BY '\n' IGNORE 0 LINES

select * from blue;
truncate table blue;

--eval  LOAD DATA INFILE '$MYSQL_TEST_DIR/var/tmp/bug39247.csv' INTO TABLE blue CHARACTER SET utf8 FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '"' LINES TERMINATED BY '\n' IGNORE 1 LINES

select * from blue;

Result:
CREATE TABLE blue (f1 VARCHAR(255), f2 VARCHAR(255));
LOAD DATA INFILE '/Users/apple/bzr/mysql-5.1/mysql-test/var/tmp/bug39247.csv' INTO TABLE blue CHARACTER SET utf8 FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '"' LINES TERMINATED BY '\n' IGNORE 0 LINES;
select * from blue;
f1      f2
Sprutte Fjabbe
1.2     2.5
3.4     4,3
truncate table blue;
LOAD DATA INFILE '/Users/apple/bzr/mysql-5.1/mysql-test/var/tmp/bug39247.csv' INTO TABLE blue CHARACTER SET utf8 FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '"' LINES TERMINATED BY '\n' IGNORE 1 LINES;
select * from blue;
f1      f2
[23 Sep 2008 8:25] Omer Barnir
triage: this problem has been there for a very long time, as a result, the impact of the problem is lowered. Setting to SR60RC
[2 May 2011 17:30] Stephen Dewey
This should at least documented. It essentially means that the enclosing and escaping characters cannot be the same -- that should at least be mentioned in the MySQL manual, no?

It means you cannot use LOAD DATA INFILE with CSV files where enclosing quotes are doubled to escape them.
[9 Dec 2013 22:43] Jordan Mitchell
This is affecting me as well. We have lots of different kinds of CSV's being loaded on a regular basis, and not being able to ignore some lines when we need to specify an escape character is going to be painful. Regardless of how long it's been in here I'd argue that it's definitely a bug.
[2 Jun 2015 13:36] Jānis Elmeris
Still not solved in 5.6.

Interestingly, the import seems to be done successfully with `ESCAPED BY ''` even if the CSV field values actually contains quotes. It seems that a double `""` is correctly interpreted as a literal quote in the value.

Do not simply remove `ESCAPED BY ''` though, as then the default value ("\\") is being used and would (unexpectedly/incorrectly) convert all "\\" occurrences in the CSV values to "\".

--------------------------------

CREATE TABLE `LOAD_DATA` (
  `ID` int(11) NOT NULL,
  `TE` text,
  `dat` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

LOAD_DATA.csv:
"ID","TE","DAT"
"115979237","abc","2014-05-06"
"115979253","d,ef",""
"115979261","g""hi",""
"115979278","j"",kl","2013-01-07"
"115991551","oe\\pw",""

1) ESCAPED BY '"' IGNORE 1 LINES
The lines get ignored up and including the first line whose last value is empty (""):

mysql> TRUNCATE TABLE LOAD_DATA; LOAD DATA LOCAL INFILE "/home/janise/LOAD_DATA.csv"  INTO TABLE LOAD_DATA COLUMNS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '"' LINES TERMINATED BY '\n' IGNORE 1 LINES (ID, TE, @DAT) SET DAT = IF(@DAT = '', NULL, @DAT); SHOW WARNINGS; SELECT * FROM LOAD_DATA;
Query OK, 0 rows affected (0.38 sec)

Query OK, 3 rows affected (0.00 sec)
Records: 3  Deleted: 0  Skipped: 0  Warnings: 0

Empty set (0.00 sec)

+-----------+--------+---------------------+
| ID        | TE     | dat                 |
+-----------+--------+---------------------+
| 115979261 | g"hi   | NULL                |
| 115979278 | j",kl  | 2013-01-07 00:00:00 |
| 115991551 | oe\\pw | NULL                |
+-----------+--------+---------------------+
3 rows in set (0.00 sec)

2) ESCAPED BY '"' IGNORE 0 LINES
All lines correctly imported, but also the header row is included.

mysql> TRUNCATE TABLE LOAD_DATA; LOAD DATA LOCAL INFILE "/home/janise/LOAD_DATA.csv"  INTO TABLE LOAD_DATA COLUMNS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '"' LINES TERMINATED BY '\n' IGNORE 0 LINES (ID, TE, @DAT) SET DAT = IF(@DAT = '', NULL, @DAT); SHOW WARNINGS; SELECT * FROM LOAD_DATA;
Query OK, 0 rows affected (0.38 sec)

Query OK, 6 rows affected, 2 warnings (0.01 sec)
Records: 6  Deleted: 0  Skipped: 0  Warnings: 2

+---------+------+--------------------------------------------------------+
| Level   | Code | Message                                                |
+---------+------+--------------------------------------------------------+
| Warning | 1366 | Incorrect integer value: 'ID' for column 'ID' at row 1 |
| Warning | 1265 | Data truncated for column 'dat' at row 1               |
+---------+------+--------------------------------------------------------+
2 rows in set (0.00 sec)

+-----------+--------+---------------------+
| ID        | TE     | dat                 |
+-----------+--------+---------------------+
|         0 | TE     | 0000-00-00 00:00:00 |
| 115979237 | abc    | 2014-05-06 00:00:00 |
| 115979253 | d,ef   | NULL                |
| 115979261 | g"hi   | NULL                |
| 115979278 | j",kl  | 2013-01-07 00:00:00 |
| 115991551 | oe\\pw | NULL                |
+-----------+--------+---------------------+
6 rows in set (0.00 sec)

3) IGNORE 1 LINES
All lines imported, but "\\" incorrectly stored as "\".

mysql> TRUNCATE TABLE LOAD_DATA; LOAD DATA LOCAL INFILE "/home/janise/LOAD_DATA.csv"  INTO TABLE LOAD_DATA COLUMNS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 LINES (ID, TE, @DAT) SET DAT = IF(@DAT = '', NULL, @DAT); SHOW WARNINGS; SELECT * FROM LOAD_DATA;
Query OK, 0 rows affected (0.39 sec)

Query OK, 5 rows affected (0.00 sec)
Records: 5  Deleted: 0  Skipped: 0  Warnings: 0

Empty set (0.01 sec)

+-----------+-------+---------------------+
| ID        | TE    | dat                 |
+-----------+-------+---------------------+
| 115979237 | abc   | 2014-05-06 00:00:00 |
| 115979253 | d,ef  | NULL                |
| 115979261 | g"hi  | NULL                |
| 115979278 | j",kl | 2013-01-07 00:00:00 |
| 115991551 | oe\pw | NULL                |
+-----------+-------+---------------------+
5 rows in set (0.00 sec)

4) ESCAPED BY '' IGNORE 1 LINES
All lines correctly imported. However, this is still a workaround, as it doesn't match the documentation and who knows in what cases this approach actually fails.

mysql> TRUNCATE TABLE LOAD_DATA; LOAD DATA LOCAL INFILE "/home/janise/LOAD_DATA.csv"  INTO TABLE LOAD_DATA COLUMNS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '' LINES TERMINATED BY '\n' IGNORE 1 LINES (ID, TE, @DAT) SET DAT = IF(@DAT = '', NULL, @DAT); SHOW WARNINGS; SELECT * FROM LOAD_DATA;
Query OK, 0 rows affected (0.42 sec)

Query OK, 5 rows affected (0.00 sec)
Records: 5  Deleted: 0  Skipped: 0  Warnings: 0

Empty set (0.00 sec)

+-----------+--------+---------------------+
| ID        | TE     | dat                 |
+-----------+--------+---------------------+
| 115979237 | abc    | 2014-05-06 00:00:00 |
| 115979253 | d,ef   | NULL                |
| 115979261 | g"hi   | NULL                |
| 115979278 | j",kl  | 2013-01-07 00:00:00 |
| 115991551 | oe\\pw | NULL                |
+-----------+--------+---------------------+
5 rows in set (0.00 sec)