Bug #71091 CSV engine does not properly process "", in quotes
Submitted: 5 Dec 2013 12:52 Modified: 6 Dec 2013 11:11
Reporter: Valeriy Kravchuk Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: CSV Severity:S3 (Non-critical)
Version:5.5.33, any, 5.6.15, 5.5.35 OS:Any
Assigned to: CPU Architecture:Any

[5 Dec 2013 12:52] Valeriy Kravchuk
Description:
According to http://en.wikipedia.org/wiki/Comma-separated_values#Basic_rules_and_examples comma (,) in the value in CSV format must be quoted with " and quote (") in quoted value must be doubled:

"
    Fields with embedded commas must be quoted.

1997,Ford,E350,"Super, luxurious truck"

    Each of the embedded double-quote characters must be represented by a pair of double-quote characters.

1997,Ford,E350,"Super, ""luxurious"" truck"
" 

So, value like this:

"a"",b" in CSV file should be interpreted as a",b after loading. Check also http://en.wikipedia.org/wiki/Comma-separated_values#Example later on that page that shows this in one of examples.

MySQL's CSV engine does NOT interpret "", inside quotes properly. It requires sequence like \", to be there instead, without clear reason (I see nothing like this explained here, http://dev.mysql.com/doc/refman/5.6/en/csv-storage-engine.html).

How to repeat:
On any MySQL server version execute the following:

mysql> CREATE TABLE `emp` (
    -> `col1` text NOT NULL,
    -> `col2` text NOT NULL
    -> ) ENGINE=CSV DEFAULT CHARSET=utf8
    -> ;
Query OK, 0 rows affected (0.11 sec)

mysql> insert into emp values ("alan", "newyork"), ("jim", "CA\",boston");
Query OK, 2 rows affected (0.06 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select * from emp;
+------+------------+
| col1 | col2       |
+------+------------+
| alan | newyork    |
| jim  | CA",boston |
+------+------------+
2 rows in set (0.00 sec)

mysql> flush tables;
Query OK, 0 rows affected (0.05 sec)

Now, check what is in emp.CSV file in the database directory. You'll see:

"alan","newyork"
"jim","CA\",boston"

I do not see this documented on Wiki as acceptable way of quoting. Now, change file to conform to "standard":

"alan","newyork"
"jim","CA"",boston"

Then check how it is interpreted/loaded:

mysql> flush tables;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from emp;
+------+---------+
| col1 | col2    |
+------+---------+
| alan | newyork |
| jim  | CA"     |
+------+---------+
2 rows in set (0.02 sec)

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.5.33    |
+-----------+
1 row in set (0.01 sec)

Have fun with big CSV files created by non-MySQL software.

Suggested fix:
MySQL's CSV engine should follow CSV format standard (http://tools.ietf.org/html/rfc4180) in this case.
[5 Dec 2013 15:00] Peter Laursen
Further more a file reading according to "standard" loads with LOAD DATA INFILE
when specifying ENCLOSED BY '"' (but the non-standard" version also does).

In other words LOAD DATA accepts both

"alan","newyork"
"jim","CA\",boston"

and

"alan","newyork"
"jim","CA"",boston"

But SELECT INTO OUTFILE generates the "non-standard" thing

SELECT * FROM `csv` INTO OUTFILE 'file' FIELDS ENCLOSED BY '"' LINES TERMINATED BY '\r\n';

-- and file reads:

"alan"	"newyork"
"jim"	"CA\",boston"

Excel for instance will not read this file correctly.
[6 Dec 2013 11:11] Umesh Shastry
Hello Valeriy,

Thank you for the bug report and test case.
Verified as described.

Thanks,
Umesh