Bug #66722 SELECT INTO OUTFILE .. encoding
Submitted: 6 Sep 2012 9:30 Modified: 2 Oct 2012 17:20
Reporter: Peter Laursen (Basic Quality Contributor) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version:5.1+ OS:Any
Assigned to: Paul Dubois CPU Architecture:Any

[6 Sep 2012 9:30] Peter Laursen
Description:
SELECT INTO OUTFILE had added a CHARACTER SET option sometime in 5.1.  It is listed in the syntax description on top of http://dev.mysql.com/doc/refman/5.6/en/select.html but not elaborated later in the document.

How to repeat:
n/a

Suggested fix:
What I in particular find missing is a description of how the OUTFILE will be encoded if CHARACTER SET is not specifed.  Is it

character__set_server
or character set used for storage 
.. that is used?

I tried briefly with a 5.6 server having UTF16LE as character_set_server and a table with data stored as UTF8 ("CREATE TABLE `t` (`id` int(11) DEFAULT NULL,
`txt` varchar(10) CHARACTER SET utf8 DEFAULT NULL) ENGINE=InnoDB DEFAULT CHARSET=utf16le" - and the database is defined with "CREATE DATABASE `peter` /*!40100 DEFAULT CHARACTER SET latin1 */" - BTW :-))

The OUTFILE from my experiment was utf8-encoded. But there may be several details I am not aware of.

If it is 'character set used for storage' that is used as I think, what then if a SELECT INTO OUTFILE uses a JOIN on tables with different character sets?
[6 Sep 2012 9:30] Peter Laursen
Also see http://bugs.mysql.com/bug.php?id=66721
[6 Sep 2012 17:07] Sveta Smirnova
Thank you for the rpeort.

This feature described at http://dev.mysql.com/doc/refman/5.6/en/select-into.html:

Column values are written converted to the character set specified in the CHARACTER SET clause. If no such clause is present, values are dumped using the binary character set. In effect, there is no character set conversion. If a table contains columns in several character sets, the output data file will as well and you may not be able to reload the file correctly. 

Is there something wrong here?
[6 Sep 2012 18:00] Peter Laursen
OK .. accepted.  I do think however there should be a more direct reference from CHARACTER SET in the page I listed.  But this is a minor detail.

However this is incorrect: "If a table contains columns in several character sets,
 the output data file .. ".  It does not matter what columns a *table* contains.  What matters is what columns the *result set* contains (point: the result set could contain data from one table or more tables - or from user variables or any kind of expression not referencing stored data at all). This statement should be reframed to "If the result set  contains columns in several character sets,  the output data file .. "
[6 Sep 2012 18:03] Sveta Smirnova
Thank you for the feedback.

Makes sense, especially if stri9ng or cast functions were used.

Verified: " This statement should be reframed to "If the
result set  contains columns in several character sets,  the output
data file .. "
[2 Oct 2012 17:20] Paul Dubois
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant products.