Bug #37610 character set choice for output file
Submitted: 24 Jun 2008 15:06 Modified: 26 Jun 2008 12:11
Reporter: Susanne Ebrecht Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.0, 5.1, 6.0 OS:Any
Assigned to: CPU Architecture:Any
Triage: D2 (Serious)

[24 Jun 2008 15:06] Susanne Ebrecht
Description:
SELECT column FROM table INTO OUTFILE 'outfile';

$ file outfile

File encoding result:
DB: latin1, column: utf8 -> output file has utf8
DB: latin1, column: latin1 -> output file has ISO-8859 (on Unix)
DB: utf8, column: utf8 -> output file has utf8
DB: utf8, column: latin1 -> output file has latin1

Doesn't matter which encoding your filesystem is using.

It would be nice to get a choice here for output file encoding.

Something like:

SELECT column FROM table INTO OUTFILE 'outfile' character set latin1;

Of course I know, the character set has to match. It's only possible to convert utf8 into latin1 if there are no signs stored in the column that are defined in utf8 and not in latin1.

But a warning or in strict mode an error would be help here.

How to repeat:
It's a feature request.

Suggested fix:
...
[24 Jun 2008 15:12] Susanne Ebrecht
That the file encoding depends on the column character set also is confusing.

A user expect that the files will be encoded to the encoding of the file system.

Means, when I configured my operating system that it should use UTF8 then I always expect UTF8 files as outfile. 

If I configured my operating system as ISO system then I expect ISO files.

Also and last but not least as user I want to have the choice.
[25 Jun 2008 10:28] Susanne Ebrecht
We discussed this problem again.

In our eyes this is not only a feature request.

This is a "wrong result" bug.

Usually, character_set_results should work here.

Example for CLI:
Use MySQL CLI, set your terminal to utf8:

SET NAMES UTF8;
CREATE DATABASE bug37610 CHARACTER SET UTF8;
USE bug37610
CREATE TABLE bug (id serial, t varchar(100));
INSERT INTO bug(t) VALUES ('ä');
SELECT LENGTH(t) FROM bug;

Result: 2

SELECT * FROM bug;
Result: ä

SET NAMES LATIN1;

SELECT LENGTH(t) FROM bug;

Result: 2

SELECT * FROM bug;

Result: ?

Switch terminal to ISO-8859-15:

SELECT * FROM bug;

Result: ä

As you can see, it works in CLI.

SET NAMES will change values of character_set_result and with this information the result will be converted into the character set that is wished for the result.

SELECT * FROM bug INTO OUTFILE 'bug37610test';

$ file var/bug37610/bug37610test 
var/bug37610/bug37610test: UTF-8 Unicode text

Terminal is still ISO:
$ less var/bug37610/bug37610test 
1       À

Switch terminal back to UTF8:
$ less var/bug37610/bug37610test 
1       ä

So the command file told the truth it is encoded in UTF8.

The set of character_set_result was ignored here.

Another test:
$ ./bin/mysql -e "set names latin1; select * from bug;" bug37610 > bug37610test2
$ file bug37610test2
bug37620test2: ISO-8859 text

It should doesn't matter if you will get the result on I/O or in CLI or into file. Also it should doesn't matter how you will get the result into the file the character_set_result should work in any case.

I mean it should not make a difference here if you say: SELECT ... INTO OUTFILE or make a shell redirection.
[26 Jun 2008 12:11] Sergei Golubchik
duplicate of Bug#30946