Bug #66537 BOM problem
Submitted: 25 Aug 2012 14:33 Modified: 25 Aug 2012 15:13
Reporter: Peter Laursen (Basic Quality Contributor) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: DML Severity:S2 (Serious)
Version:5.0.96. 5.1.63, 5.5.23, 5.6.5 OS:Any
Assigned to: CPU Architecture:Any
Tags: qc
Triage: Needs Triage: D2 (Serious)

[25 Aug 2012 14:33] Peter Laursen
Description:
Either this bugfix http://bugs.mysql.com/bug.php?id=29323 was not complete or a reggression sneaked in later.

I have two CSV-files.  Both contain only ASCII.  One has BOMs (Byte Order Marks) identifying the file as a utf8-encoded file - the other has not.  Except for that they are identical. Data for the first row truncate with the file with BOMs - not the one without.

The files simply read like this:

"1","a"
"2","b"
"3","c"

How to repeat:
CREATE DATABASE peter;
USE peter;

CREATE TABLE `u8` (
  `id` INT(11) DEFAULT NULL,
  `txt` VARCHAR(10)
) ENGINE=INNODB DEFAULT CHARSET=utf8;

-- file with no BOMs
LOAD DATA LOCAL INFILE 'C:\\Users\\Peter\\Desktop\\file1.csv' INTO TABLE `peter`.`u8` CHARACTER SET 'utf8' FIELDS ESCAPED BY '\\' TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\r\n' (`id`, `txt`); 

SHOW WARNINGS; -- empty set

-- file with BOMs
LOAD DATA LOCAL INFILE 'C:\\Users\\Peter\\Desktop\\file2.csv' INTO TABLE `peter`.`u8` CHARACTER SET 'utf8' FIELDS ESCAPED BY '\\' TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\r\n' (`id`, `txt`); 

SHOW WARNINGS;
/*
Level      Code  Message                                                     
-------  ------  ------------------------------------------------------------
Warning    1366  Incorrect integer value: '"1"' for column 'id' at row 1  
*/

SELECT * FROM peter.u8;
/*
    id  txt     
------  --------
     1  a       
     2  b       
     3  c       
     0  a       
     2  b       
     3  c      
*/     

(files will be uploaded)

Suggested fix:
Not sure what the fix is!
[25 Aug 2012 14:33] Peter Laursen
one file

Attachment: file1.csv (text/comma-separated-values), 27 bytes.

[25 Aug 2012 14:33] Peter Laursen
and the other

Attachment: file2.csv (text/comma-separated-values), 30 bytes.

[25 Aug 2012 15:13] Valeriy Kravchuk
Thank you for the problem report. Verified with 5.5.28 on Mac OS X:

macbook-pro:5.5 openxs$ cat ~/Downloads/file2.csv 
"1","a"
"2","b"
"3","c"
macbook-pro:5.5 openxs$ hexdump ~/Downloads/file2.csv 
0000000 ef bb bf 22 31 22 2c 22 61 22 0d 0a 22 32 22 2c
0000010 22 62 22 0d 0a 22 33 22 2c 22 63 22 0d 0a      
000001e
macbook-pro:5.5 openxs$ hexdump ~/Downloads/file1.csv 
0000000 22 31 22 2c 22 61 22 0d 0a 22 32 22 2c 22 62 22
0000010 0d 0a 22 33 22 2c 22 63 22 0d 0a               
000001b
macbook-pro:5.5 openxs$ cat ~/Downloads/file1.csv 
"1","a"
"2","b"
"3","c"

macbook-pro:5.5 openxs$ bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.5.28-debug Source distribution

Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE TABLE `u8` (
    ->   `id` INT(11) DEFAULT NULL,
    ->   `txt` VARCHAR(10)
    -> ) ENGINE=INNODB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.07 sec)

mysql> LOAD DATA INFILE '~/Downloads/file2.csv' INTO TABLE `u8` CHARACTER SET 'utf8' FIELDS ESCAPED BY '\\' TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\r\n' (`id`, `txt`); 
Query OK, 3 rows affected, 1 warning (0.02 sec)
Records: 3  Deleted: 0  Skipped: 0  Warnings: 1

mysql> show warnings\G
*************************** 1. row ***************************
  Level: Warning
   Code: 1366
Message: Incorrect integer value: '"1"' for column 'id' at row 1
1 row in set (0.00 sec)

mysql> select * from u8;
+------+------+
| id   | txt  |
+------+------+
|    0 | a    |
|    2 | b    |
|    3 | c    |
+------+------+
3 rows in set (0.00 sec)