Bug #70076 CSV engine does not honour CHARSET specification
Submitted: 18 Aug 2013 18:59 Modified: 4 Jul 2014 16:53
Reporter: Peter Laursen (Basic Quality Contributor) Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.6+ OS:Windows
Assigned to: CPU Architecture:Any

[18 Aug 2013 18:59] Peter Laursen
Description:
CSV engine does not honour CHARSET specification.

How to repeat:
CREATE TABLE `bla`.`win`( `txt` VARCHAR(10) NOT NULL ) ENGINE=CSV CHARSET=utf16le; 
INSERT INTO `bla`.`win` (`txt`) VALUES ('abc'); 
INSERT INTO `bla`.`win` (`txt`) VALUES ('xyz'); 
INSERT INTO `bla`.`win` (`txt`) VALUES ('æøå'); 

Next verify the encoding of the win.CSV file (by inspecting byte-length of the 3 rows or in a HEX editor).  The file is ANSI encoded (My LOCALE is Western (Danish)). Since all strings inserted can be represented in ANSI/Western I think that is how Windows want to create this file and the encoding is not passed to Windows and it file system no matter what CHARSET I specify.

Further I see both a Win.CSV and win.CSM file. It is not clear what the latter does.

Opened in Notepad the file simply reads 
"a b c "
"x y z "
"æ ø å "
(and with Unix-style linebreaks). Microsoft Excel does not accept this file as a valid CSV file.

Suggested fix:
Let CSV engine create files that are usable for spreadsheets and similar programson the actual platform. And let the files be encoded as specified in the CREATE TABLE statement (what is simple to do on Windows - excpet for utf32 charset probably). This is also relevant for log files. And I think it did in MySQL 5.1 when the CSV engine was introduced. I am almost sure that CSV tables/files would open in Excel on MySQL 5.1.

So this I also consider a regression.
[18 Aug 2013 19:43] Peter Laursen
Also see http://bugs.mysql.com/bug.php?id=70077
[19 Aug 2013 8:09] Peter Laursen
Actually I think the best category here is CSV engine and not charset. You can change as you want. And probably reproducible since 5.1.

Consider this: CREATE TABLE .. ENGINE CSV CHARSET big5;
..  I really doubt that any system capable of running MySQL can create a file with such encoding. 

And: CREATE TABLE .. ENGINE CSV CHARSET latin2; -- on a Western system makes no sense

So CHARSET probably makes little sense for CSV tables at all.  CHARSET (encoding) is handled by the underlying OS and filesystem. But still on Windows (at least) it should in principle be posible to specify 3 different unicode charsets (utf8, utf16 and utf16le) as Windows handles all 3.

But if I am right that the storage format for CSV tables was changed between 5.1 and 5.6, it raises the questions:
* when?
* why?
* have the usability of the CSV files outside a MySQL context decreased?
* where is it documented?
* how does it affect upgrading users having user data in CSV tables?
[4 Jun 2014 16:52] MySQL Verification Team
CSV

Attachment: 2_csv.png (image/png, text), 88.99 KiB.

[4 Jun 2014 16:53] MySQL Verification Team
Thank you for the bug report. I couldn't repeat with 5.6.19 running on Windows 7
Ultimate 64-bits English with locale Brazilian Portuguese. Please check with 5.6.19. Thanks.
[5 Jul 2014 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".