Bug #11137 LOAD DATA INFILE and accentuated filenames
Submitted: 7 Jun 2005 14:17 Modified: 8 Jun 2005 13:32
Reporter: Jay Bertrand Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:4.1.11-debian & 5.0.6b-w2k OS:Windows (w2k & debian)
Assigned to: CPU Architecture:Any

[7 Jun 2005 14:17] Jay Bertrand
Description:
Hi,

I'm experiencing a charset problem on accentuated filenames when using
'LOAD DATA INFILE' or 'SELECT INTO OUTFILE' on Mysql 4.1.12.

Here's my configuration:

> SELECT VERSION();
+---------------------+
| version() |
+---------------------+
| 4.1.11-Debian_4-log |
+---------------------+
1 row in set (0.00 sec)

> SHOW VARIABLES LIKE 'character_%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
7 rows in set (0.00 sec)

When using accents in the filename, i get an error like:

> LOAD DATA LOCAL INFILE 'Période.csv' ...
ERROR 2 (HY000): File 'Période.csv' not found (Errcode: 2)

If I rename the original file from 'Période.csv' to 'Période.csv', everything's fine! It seems that the filename is converted into utf8 by mysqld and not converted back into something like "charset_set_connection" when sent back to client. Same problem happens with mysqld 5.0.6b on W2K (but worked fined in 4.0.x).

The same thing happens when using 'SELECT INTO OUTFILE'. For example:

> SELECT * INTO OUTFILE 'Période.csv' FROM ...

% ls
Période.csv

Now, when i change character set to utf8, mysql behaves has expected (really don't understand why ?!?):

> SET character_set_client=utf8;
> SELECT * INTO OUTFILE 'Période.csv' FROM ...

% ls
Période.csv

In any above case, the content of the file is correctly interpreted:

% cat Période.csv
1;Janvier
2;Février
...

> SELECT * FROM periode;
+------------+--------------------------+
| id_periode | libelle_periode |
+------------+--------------------------+
| 1 | Janvier |
| 2 | Février |
| ... | |
+------------+--------------------------+

Didn't find anything about that with a google search :-( Is it a bug ?

TIA,

Regards,
JAY

How to repeat:
Just create a test table, a test CSV file with an accent in the filename, fire up a client and try a 'LOAD DATA LOCAL INFILE "accentuated_filename_here" ....'.

# Table definition
CREATE TABLE `periode` (
  `code_periode` varchar(4) default NULL,
  `libelle_periode` varchar(50) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

# Sample file (Période.csv)
11;Janvier
12;Février
13;Mars
21;Avril

# Sample query
LOAD DATA LOCAL INFILE 'Période.csv'
REPLACE INTO TABLE periode 
FIELDS TERMINATED BY ';'  
ESCAPED BY '\\\\'  
LINES TERMINATED BY '\\r\\n'

Suggested fix:
In sql_load.cpp around line 171:

#ifndef EMBEDDED_LIBRARY
  if (read_file_from_client)
  {
    (void)net_request_file(&thd->net,ex->file_name);
    file = -1;
  }
  else
#endif

Maybe ex->file_name should be converted back using thd->charset() before being send to client ?

Didn't have time to investigate for 'SELECT INTO OUTFILE ...' ;-(.
[8 Jun 2005 0:51] MySQL Verification Team
Verified on windows:

ERROR 2 (HY000): File 'c:\PÔÇÜriode.csv' not found (Errcode: 2)
[8 Jun 2005 13:32] Sergei Golubchik
A duplicate of http://bugs.mysql.com/bug.php?id=3906