Bug #949 trailing white space at end of line defeats ENCLOSED BY during LOAD
Submitted: 30 Jul 2003 0:11 Modified: 16 Sep 2003 19:35
Reporter: [ name withheld ] Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version:4.0.14 OS:Linux (Linux Mandrake 8.2)
Assigned to: Paul DuBois CPU Architecture:Any

[30 Jul 2003 0:11] [ name withheld ]
Description:
During a LOAD LOCAL, white space after a closing quote in the last field of a line causes the quote to be picked up as part of the field (rather than terminating it) and further causes the line-ending to be missed as well, pulling the next line of the file in as if it were a continuation of the previous line.  Note this happens with trailing spaces, or \r's, and possibly with non-white space as well though I haven't tested it.

How to repeat:
Create a file "bug" containing the following three lines (between the cut marks).  Note there is a single space after "Field 2" before the end of line:
-----------------------------
Field A,"Field B"
Field 1,"Field 2" 
Field 3,"Field 4"
----------------------------

mysql> create table bug (a text, b text);
Query OK, 0 rows affected (0.00 sec)

mysql> load data local infile 'bug' into table bug
-> fields terminated by ','
-> enclosed by '"';
Query OK, 2 rows affected (0.00 sec)
Records: 2  Deleted: 0  Skipped: 0  Warnings: 0

mysql> select * from bug;
+---------+----------------------------+
| a       | b                          |   
+---------+----------------------------+
| Field A | Field B                    |   
| Field 1 | Field 2"  
Field 3,"Field 4 | 
+---------+----------------------------+
2 rows in set (0.00 sec)

Suggested fix:
(I'm not sure if this belongs int the server or client category since I don't know who's doing the parsing but I assume the server is to minimize redundancy?)
[20 Aug 2003 7:50] Sergei Golubchik
It is intentional feature.
ENCLOSED BY character is honored only if followed by TERMINATED BY character
(either line of field TERMINATED BY, that is)

It is done to import lines like

'Arnold's','pizzeria'

Manual, actually, contains a hint:
===
  Observe that the second field in the fourth line contains
  a comma following the quote, which (erroneously) appears
  to terminate the field:

       1,"a string",100.20
       2,"a string containing a , comma",102.20
       3,"a string containing a " quote",102.20
       4,"a string containing a ", quote and comma",102.20
===

But, indeed, it should be documented explicitly.
[20 Aug 2003 8:12] Michael Widenius
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.mysql.com/documentation/ and the instructions on
how to report a bug at http://bugs.mysql.com/how-to-report.php

To not loose data and allow one to read in a bit malformed (but
correct data) we have defined in MySQL that a field separator is
the field-separator, if defined, followed by the field-terminator or
line terminator.

This makes it more reliable to read in lines like:

'Arnold's','pizzeria'
Arnold's,pizzeria

As it's not uncommon to get data like the above I think it's better to
keep the current behaviour for LOAD DATA.

After all, to accidently put a space after a " is something the a
human is likely to do while auto-generated files are much less likely
to have this kind of problems.  I think it's better to optimize 
OAD DATA to be able to handle auto-generated files good...

Note that it's allowed to have line terminators (normally newlines) in
a quoted field, to make it possible to read in things like the
following:

'Field with
newline'

Which explain the behavior you described in your bug report

Regards,
Monty
[20 Aug 2003 8:26] Sergei Golubchik
It is intentional feature.
ENCLOSED BY character is honored only if followed by TERMINATED BY character
(either line of field TERMINATED BY, that is)

It is done to import lines like

'Arnold's','pizzeria'

Manual, actually, contains a hint:
===
  Observe that the second field in the fourth line contains
  a comma following the quote, which (erroneously) appears
  to terminate the field:

       1,"a string",100.20
       2,"a string containing a , comma",102.20
       3,"a string containing a " quote",102.20
       4,"a string containing a ", quote and comma",102.20
===

But, indeed, it should be documented explicitly.
[16 Sep 2003 19:35] Paul DuBois
Thank you for your bug report. This issue has been addressed in the
documentation. The updated documentation will appear on our website
shortly, and will be included in the next release of the relevant
product(s).

I've updated the manual to point out more explicitly
that an ENCLOSED BY character is recognized as
terminating a field only if followed by a field or
line TERMINATED BY sequence.