Bug #43288 problems with LOAD DATA, CHARSET (an maybe 'optionally enclosed')
Submitted: 1 Mar 2009 12:37 Modified: 14 Aug 2009 11:14
Reporter: Peter Laursen (Basic Quality Contributor) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Charsets Severity:S2 (Serious)
Version:5.0.77, 5.1.31, 6.0,9, 5.1 bzr OS:Any
Assigned to: Assigned Account CPU Architecture:Any
Tags: qc

[1 Mar 2009 12:37] Peter Laursen
Description:
Simple LOAD DATA fails with unicode CHARSET parameter.

How to repeat:

Drop table if exists `chsettest`;

CREATE TABLE `chsettest` (                               
             `id` int(11) NOT NULL AUTO_INCREMENT,                  
             `txt` varchar(20) DEFAULT NULL,                        
             PRIMARY KEY (`id`)                                     
           ) ENGINE=MyISAM DEFAULT CHARSET=utf8;
           
           
load data local infile 'C:\\Users\\PL\\Desktop\\csvtestansi.txt' into table `test`.`chsettest` character set 'latin1' fields escaped by '\\' terminated by ',' optionally enclosed by '"' lines terminated by '\r\n' (`id`, `txt`); -- no problems

truncate table chsettest;

load data local infile 'C:\\Users\\PL\\Desktop\\csvtestutf8.txt' into table `test`.`chsettest` character set 'utf8' fields escaped by '\\' terminated by ',' optionally enclosed by '"' lines terminated by '\r\n' (`id`, `txt`);

show warnings; 
-- Warning 1366  Incorrect integer value: '1' for column 'id' at row 1
/* but data were imported */

truncate table chsettest;

load data local infile 'C:\\Users\\PL\\Desktop\\csvtestucs2.txt' into table `test`.`chsettest` character set 'ucs2' fields escaped by '\\' terminated by ',' optionally enclosed by '"' lines terminated by '\r\n' (`id`, `txt`);

show warnings; 
-- Warning    1366  Incorrect integer value: '��1' for column 'id' at row 1
-- Warning    1261  Row 1 doesn't contain data for all columns  

Suggested fix:
Not sure.  I never observed before.  So this is either a bug introduced in all recent MySQL server or the 'optionally enclosed' specification is not handled properly and maybe never was.

I will attach 3 small files created by saving with Notepad on Windows (specifying ANSI/unicode/utf8 encoding when saving)
[1 Mar 2009 12:38] Peter Laursen
three sample files as a zipped archive

Attachment: csvtestfiles.zip (application/zip, text), 393 bytes.

[1 Mar 2009 12:39] Peter Laursen
Or maybe it is again BOMs causing problems?  This was fixed once before!
[1 Mar 2009 13:25] Peter Laursen
What I referred to as 'fixed before' is this:
http://bugs.mysql.com/bug.php?id=29323

(so this one was with SOURCE command in the command-line. I took for granted that all similar issues with boths servers and clients and files created on Windows having BOMs would be fixed then 'in one shot'.  But I am not sure of course if issue is the same this time!)
[23 Mar 2009 13:35] Susanne Ebrecht
Many thanks for reporting a bug report.

For load data infile of course your client encoding has to be set of the encoding that the file has.

I depends on how you stored the data.

This is not a bug.

We are already on the way to implement that you will have the possibility to tell which charset should be used for SELECT INTO OUTFILE. But it is not implemented yet and so always charset of filesystem is used.
[23 Mar 2009 13:45] Susanne Ebrecht
This is related to bug #30946
[23 Mar 2009 13:48] Peter Laursen
Susanne .. I doubt you understand!

the file had only the ANSI subset so it would work with both latin and utf8 Default Charset. I belive the problem is that BOM's are not handled (ignored).
[14 Aug 2009 11:14] Sveta Smirnova
Thank you for the reasonable feature request.