Description:
Hi,
I'm experiencing a charset problem on accentuated filenames when using
'LOAD DATA INFILE' or 'SELECT INTO OUTFILE' on Mysql 4.1.12.
Here's my configuration:
> SELECT VERSION();
+---------------------+
| version() |
+---------------------+
| 4.1.11-Debian_4-log |
+---------------------+
1 row in set (0.00 sec)
> SHOW VARIABLES LIKE 'character_%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
7 rows in set (0.00 sec)
When using accents in the filename, i get an error like:
> LOAD DATA LOCAL INFILE 'Période.csv' ...
ERROR 2 (HY000): File 'Période.csv' not found (Errcode: 2)
If I rename the original file from 'Période.csv' to 'Période.csv', everything's fine! It seems that the filename is converted into utf8 by mysqld and not converted back into something like "charset_set_connection" when sent back to client. Same problem happens with mysqld 5.0.6b on W2K (but worked fined in 4.0.x).
The same thing happens when using 'SELECT INTO OUTFILE'. For example:
> SELECT * INTO OUTFILE 'Période.csv' FROM ...
% ls
Période.csv
Now, when i change character set to utf8, mysql behaves has expected (really don't understand why ?!?):
> SET character_set_client=utf8;
> SELECT * INTO OUTFILE 'Période.csv' FROM ...
% ls
Période.csv
In any above case, the content of the file is correctly interpreted:
% cat Période.csv
1;Janvier
2;Février
...
> SELECT * FROM periode;
+------------+--------------------------+
| id_periode | libelle_periode |
+------------+--------------------------+
| 1 | Janvier |
| 2 | Février |
| ... | |
+------------+--------------------------+
Didn't find anything about that with a google search :-( Is it a bug ?
TIA,
Regards,
JAY
How to repeat:
Just create a test table, a test CSV file with an accent in the filename, fire up a client and try a 'LOAD DATA LOCAL INFILE "accentuated_filename_here" ....'.
# Table definition
CREATE TABLE `periode` (
`code_periode` varchar(4) default NULL,
`libelle_periode` varchar(50) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
# Sample file (Période.csv)
11;Janvier
12;Février
13;Mars
21;Avril
# Sample query
LOAD DATA LOCAL INFILE 'Période.csv'
REPLACE INTO TABLE periode
FIELDS TERMINATED BY ';'
ESCAPED BY '\\\\'
LINES TERMINATED BY '\\r\\n'
Suggested fix:
In sql_load.cpp around line 171:
#ifndef EMBEDDED_LIBRARY
if (read_file_from_client)
{
(void)net_request_file(&thd->net,ex->file_name);
file = -1;
}
else
#endif
Maybe ex->file_name should be converted back using thd->charset() before being send to client ?
Didn't have time to investigate for 'SELECT INTO OUTFILE ...' ;-(.