Bug #58216 SELECT INTO OUTFILE doesn't respect the names
Submitted: 15 Nov 2010 23:57 Modified: 16 Nov 2010 23:39
Reporter: Emiliano Perez Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.0.84-log OS:Linux (Gentoo)
Assigned to: CPU Architecture:Any
Tags: INTO, Latin1, names, outfile, SELECT, utf8

[15 Nov 2010 23:57] Emiliano Perez
Description:
I have a table with UTF-8 charset, but the content's are enconded with latin1 (I know, that sucks, but let's keep going). This is the table to dump, remember that it have latin1 contents despite what the charset might say:

CREATE TABLE `users` (
  `id` mediumint(8) unsigned NOT NULL auto_increment,
  `nick` varchar(35) NOT NULL,
  `rango` tinyint(3) unsigned default '0',
  `password` varchar(32) NOT NULL default '',
  `email` varchar(35) NOT NULL default '',
  `avatar` varchar(255) default NULL,
  `fecha_creado` datetime NOT NULL,
  `referidor` varchar(20) default NULL,
  `sitio` varchar(60) default NULL,
  `skin` varchar(14) NOT NULL default 'styles',
  `im` varchar(64) default NULL,
  `ts` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  `sexo` enum('m','f') NOT NULL,
  `nacionalidad` tinyint(3) unsigned NOT NULL,
  `provincia` tinyint(3) unsigned NOT NULL,
  `ciudad_id` int(10) unsigned NOT NULL,
  `ciudad` varchar(32) NOT NULL,
  `award_flag` tinyint(3) unsigned NOT NULL,
  `mensaje_personal` varchar(64) NOT NULL,
  `status` tinyint(3) unsigned NOT NULL,
  `ultimo_acceso` datetime NOT NULL,
  `fecha_nacimiento` date NOT NULL,
  `enviar_publicidad` tinyint(1) NOT NULL,
  `nombre` varchar(32) NOT NULL,
  `mostrar_estado` tinyint(3) unsigned NOT NULL default '2',
  PRIMARY KEY  (`id`),
  UNIQUE KEY `nick` (`nick`),
  KEY `rango` (`rango`),
  KEY `email` (`email`(5))
) ENGINE=MyISAM AUTO_INCREMENT=5872108 DEFAULT CHARSET=utf8

How to repeat:
// at this point I'm using utf8 (default), selects show garbage on the special characters like ñ, á, and such.

SET NAMES latin1

// Let's get latin !

SELECT * FROM users;

// Oh yeah, all the special characters are fine. Let's save that in a file !

SELECT * FROM users INTO OUTFILE '/tmp/dump.csv' FIELDS TERMINATED BY ';' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY '\n';

// Ok, not that select made the .csv file (as it should) but filed it with all the utf8 garbage data that I didn't wanted, despite of my latin1 change.

SET NAMES utf8

// Back to normal, no latin1 csv generated :(

Suggested fix:
SELECT INTO OUTFILE should use the results using the collation set on the names (in this case latin1).
[16 Nov 2010 9:42] Peter Laursen
SELECT INTO OUTFILE does not involve the client at all.  SET NAMES has no effect - and cannot have.  Once MySQL add a LOCAL option (what I hope they do one day) to SELECT INTO OUTFILE it could be otherwise.

This double encoding stuff is a mess. You can repair it in 2 steps
1) changing all encoded string columns to binary string columns (varbinary, BLOB)
2) Changing back to encoded string columns matching what is actually stored (ie. columns should be latin1 in your case)

Peter
(not a MySQL person)
[16 Nov 2010 23:39] Sveta Smirnova
Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://dev.mysql.com/doc/ and the instructions on
how to report a bug at http://bugs.mysql.com/how-to-report.php

You use character sets in wrong way thus unexpected behavior is expected. Please read at http://dev.mysql.com/doc/refman/5.0/en/charset-conversion.html how to properly convert your data.