Bug #66721 SELECT INTO OUTFILE inconsistent with BOMs
Submitted: 6 Sep 2012 9:15 Modified: 12 Sep 2012 19:32
Reporter: Peter Laursen (Basic Quality Contributor) Email Updates:
Status: Closed Impact on me:
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.6.6-m9-log OS:Microsoft Windows
Assigned to: CPU Architecture:Any

[6 Sep 2012 9:15] Peter Laursen
I worte the Blogs:

.. but I was mistaken in one point.  SELECT INTO OUTFILE actually has a CHARACTER SET option.  It was added sometime during 5.1.

But Sometimes BOMs are added and sometimes not. 

SELECT * FROM t INTO OUTFILE 'c:\\utf16le.csv' CHARACTER SET utf16le; -- does not write BOMs
SELECT * FROM t INTO OUTFILE 'c:\\utf8.csv' CHARACTER SET utf8; -- writes BOMs

The server is running with UTF16LE as character_set_server if it matters.

(HEX editor screenshots will be uploaded)

How to repeat:
See above

Suggested fix:
As I wrote *optionality of BOMs* would be nice.  But as long as not the case all Unicode encoding should be consistent in this respect.
[6 Sep 2012 9:16] Peter Laursen
UTF8-file has BOMs

Attachment: utf8.jpg (image/jpeg, text), 15.37 KiB.

[6 Sep 2012 9:17] Peter Laursen
utf16le-file does not have BOMs

Attachment: utf16le.jpg (image/jpeg, text), 11.21 KiB.

[6 Sep 2012 9:31] Peter Laursen
Also see http://bugs.mysql.com/bug.php?id=66722
[6 Sep 2012 11:03] Peter Laursen
I missed the links to my blogs I referred:

(but they are not required for discussion here - anyway adding for completeness)
[6 Sep 2012 18:49] Sveta Smirnova
Thank you for the report.

I can not repeat described behavior: no BOM in both cases. Please provide dump of table t or, if it is MyISAM, binary table files, so we can try to repeat it on our side.
[6 Sep 2012 19:07] Peter Laursen
OK .. point accepted.  It could be the HEX-editor that added the BOMs (but weird then that it did only with teh utf8 file).

Did you try on Windows?

Anyway I will generate the files again (tomorrow probably).
[6 Sep 2012 19:10] Sveta Smirnova
Thank you for the feedback.

Not yet tested on Windows: I want to be sure I am doing exactly same actions like you before reboot.
[6 Sep 2012 19:13] Peter Laursen
I will post files tomorrow or in the weekend.
[8 Sep 2012 10:47] Peter Laursen
archive with files (explanation will follow)

Attachment: table_t.zip (application/zip, text), 731 bytes.

[8 Sep 2012 10:50] Peter Laursen
Archive contains 

1) SQL-dump of table (utf8-encoded - no BOMs)

2) OUTFILES generated with statements:
[8 Sep 2012 10:57] Peter Laursen
And for completeness: 

Variable_name             VALUE                                                    
------------------------  ---------------------------------------------------------
character_set_client      utf8                                                     
character_set_connection  utf8                                                     
character_set_database    latin1                                                   
character_set_filesystem  BINARY                                                   
character_set_results     utf8                                                     
character_set_server      utf16le                                                  
character_set_system      utf8                                                     
character_sets_dir        C:\Program Files\MySQL\MySQL SERVER 5.6\SHARE\charsets\
[12 Sep 2012 19:15] Sveta Smirnova
Thank you for the files.

Problem is hexdump does not display BOM in them:

[sveta@delly bug66721]$ hexdump -C uft8.csv 
00000000  31 09 c3 a6 0a                                    |1....|
[sveta@delly bug66721]$ hexdump -C uft16le.csv 
00000000  31 00 09 e6 00 0a                                 |1.....|

I also tried in Vim and JEdit Hexedit plugin. Please try different hex view utility on your OS too and infrom us if you are sure uft8.csv has BOM.
[12 Sep 2012 19:32] Peter Laursen
OK .. conclusion is that the HEX editor I used adds them.  Sorry for the noise.