Bug #31688 Cannot use LOAD DATA (LOCAL) INFILE on files with cyrillic names
Submitted: 18 Oct 2007 9:47
Modified: 16 Jan 2009 20:49
Reporter: Sergey Kudriavtsev
Status: Not a Bug
Category: MySQL Server: Charsets
Severity: S2 (Serious)
Version: 5.0.37-community-nt
OS: Microsoft Windows (XP Prof SP2)
Assigned to:
CPU Architecture: Any
Tags: charset, LOAD DATA
Triage: Triaged: D3 (Medium)

[18 Oct 2007 9:47] Sergey Kudriavtsev
Description:
When I try to use the LOAD DATA INFILE command (either LOCAL or not) on a file with Cyrillic characters in its path, the system returns a "File not found" exception.

The bug is present on NTFS and FAT32 filesystems.

The bug is critical for an application I'm developing and the only workaround I found is to rename files before LOADing them.

How to repeat:
Try to LOAD DATA from a file with Cyrillic characters in its path.

Suggested fix:
None
[18 Oct 2007 19:27] Sveta Smirnova
Thank you for the report.

Verified as described using 5.1.22 binaries.
[18 Oct 2007 20:07] Sergey Kudriavtsev
Is there any info available on when the fix will come?
BTW, if someone tells me which source file(s) handle this command, I could try to fix it myself.
[18 Oct 2007 21:15] Sergei Golubchik
If you want to look into it, here:

In sql_yacc.yy, the file name (in LOAD DATA) is parsed in this rule:

TEXT_STRING_filesystem:
          TEXT_STRING
          {
            THD *thd= YYTHD;

            if (thd->charset_is_character_set_filesystem)
              $$= $1;
            else
              thd->convert_string(&$$, thd->variables.character_set_filesystem,
                                  $1.str, $1.length, thd->charset());
          }
        ;

As you can see, it's converted from character_set_client into character_set_filesystem in the parser.
Then (in the load_data rule) it's saved in lex->exchange.

In sql_load.cc, in the mysql_load() function, it's used as

      (void) fn_format(name, ex->file_name, mysql_real_data_home, "",
		       MY_RELATIVE_PATH | MY_UNPACK_FILENAME);

which copies ex->file_name (ex being lex->exchange) to name, and

    if ((file=my_open(name,O_RDONLY,MYF(MY_WME))) < 0)
      DBUG_RETURN(TRUE);

pretty much straightforward.
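
As a rough illustration of the conversion step in the parser rule above, here is a minimal Python sketch. It is not MySQL code: the function name `to_filesystem_charset` is hypothetical, Python codecs stand in for MySQL's character set conversion, the sample charsets cp866 and cp1251 are chosen only for the example, and a `binary` filesystem charset is assumed to mean "pass the bytes through unchanged".

```python
def to_filesystem_charset(name: bytes, client_cs: str, fs_cs: str) -> bytes:
    """Hypothetical stand-in for the TEXT_STRING_filesystem rule."""
    # Assumption: no conversion is needed when both charsets are the same
    # or when the filesystem charset is "binary".
    if fs_cs == "binary" or fs_cs == client_cs:
        return name
    # Otherwise re-encode the name from the client charset
    # into the filesystem charset.
    return name.decode(client_cs).encode(fs_cs)


# File name as a cp866 client would send it (illustrative value).
client_bytes = "Вася.txt".encode("cp866")

converted = to_filesystem_charset(client_bytes, "cp866", "cp1251")
passed_through = to_filesystem_charset(client_bytes, "cp866", "binary")

print(converted.hex())
print(passed_through.hex())
```

In the pass-through case the bytes that reach fn_format() and my_open() are still in the client charset, so whether the file is found depends on how the operating system interprets them.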
[19 Oct 2007 11:33] Sergey Kudriavtsev
Thanks for the info, I shall look into it this evening.
[4 Feb 2008 20:23] Omer Barnir
Workaround: Rename the file so that its name contains only latin1 characters before running the command.
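
For an application that cannot rename the user's files in place, one variant of this workaround is to copy the file to a temporary ASCII-only name first. The following Python sketch is only an assumption about how that could look; `copy_to_ascii_name` is a hypothetical helper, and the working directory is assumed to have an ASCII-only path that the server can read.

```python
import os
import shutil
import tempfile
import uuid


def copy_to_ascii_name(src_path: str, work_dir: str) -> str:
    """Copy src_path into work_dir under a generated ASCII-only name."""
    ext = os.path.splitext(src_path)[1]
    if not ext.isascii():
        ext = ""  # drop a non-ASCII extension rather than guess a replacement
    dst_path = os.path.join(work_dir, uuid.uuid4().hex + ext)
    shutil.copyfile(src_path, dst_path)
    return dst_path


# Small demonstration with a throwaway directory and a Cyrillic file name.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "Вася.txt")
    with open(src, "w", encoding="ascii") as f:
        f.write("1\n2\n3\n")
    safe = copy_to_ascii_name(src, tmp)
    safe_name = os.path.basename(safe)
    with open(safe, encoding="ascii") as f:
        copied = f.read()
```

The path returned by the helper would then be used in the LOAD DATA statement, and the copy removed afterwards.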
[4 Feb 2008 21:35] Sergey Kudriavtsev
To Omer BarNir:

I have already found this workaround, as I mentioned in the bug description :) And I'm using it now. The problem is that it's rather inconvenient for the users of my application...
[4 Dec 2008 11:41] Alexander Barkov
This is not a bug.

You need to set character_set_filesystem to the correct value,
which, I guess, is cp1251 on your system.
[4 Dec 2008 11:49] Sergey Kudriavtsev
Alexander Barkov:

Please, take a look at my comments on [18 Oct 2007 11:47] and [18 Oct 2007 11:50]. This is not the case.
[9 Dec 2008 9:47] Alexander Barkov
Please send HEX dumps of all LOAD DATA INFILE queries,
or the mysql query log.

Thanks!
[9 Dec 2008 9:51] Sergey Kudriavtsev
It's pretty difficult at the moment. The server is long gone and 5.0.37 is no longer the latest production version. I'll try to reproduce this one more time, but please don't expect immediate results.
[10 Jan 2009 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[16 Jan 2009 15:09] Alexander Barkov
Tested with MySQL-5.0.67-community-nt.

1. Installed MySQL from mysql-5.0.67-win32.zip.
   Selected "cp1251" as default character set.

2. Edited "C:\Program Files\MySQL\MySQL Server 5.0\my.ini" as follows:

[mysql]
default-character-set=cp866      # new line
...
[mysqld]
default-character-set=cp1251     # this one should have been added by Setup.exe
character-set-filesystem=cp1251  # new line

3. Restarted MySQL Server:
   Start -> Settings -> Administration -> Services

4. Run either "mysql.exe -uroot" (if the root password was left empty)
or "mysql.exe -uroot --password=123123" (if the password was changed).

Oops. Error:

mysql: Character set 'cp866' is not a compiled character set and is not specified in the 'C:\mysql\\share\charsets\Index.xml' file

This is a bug: mysql.exe is looking for character set files in the wrong directory:

C:\mysql\\share\charsets\

  instead of

C:\Program Files\MySQL\MySQL Server 5.0\share\charsets\

Ok. Copy 

C:\Program Files\MySQL\MySQL Server 5.0\share\charsets\*
to
C:\mysql\\share\charsets\

and start mysql.exe again.

5. Make sure that the desired variables were set properly:

mysql> show variables like 'character_set\_%';
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | cp866  | <-- Ok
| character_set_connection | cp866  | <-- Ok
| character_set_database   | cp1251 |
| character_set_filesystem | cp1251 | <-- Ok
| character_set_results    | cp866  | <-- Ok
| character_set_server     | cp1251 |
| character_set_system     | utf8   |
+--------------------------+--------+
7 rows in set (0.00 sec)

6. Run query:

CREATE TABLE t1 (a int);

7. Create a file with Cyrillic letters in its name, using Windows Explorer:
C:\Program Files\MySQL 5.0\data\Вася.txt
and put some numbers into it:

Вася.txt:
1
2
3
EOF

8. Load data from the file:

mysql> load data infile 'Вася' into table t1;
Query OK, 3 rows affected (0.01 sec)
Records: 3  Deleted: 0  Skipped: 0  Warnings: 0

9. Make sure it worked fine:

mysql> select * from t1;
+------+
| a    |
+------+
|    1 |
|    2 |
|    3 |
+------+
3 rows in set (0.01 sec)

Greetings.
[16 Jan 2009 15:12] Alexander Barkov
The above test demonstrates that:
- MySQL treats file names correctly if character-set-filesystem
  and default-character-set are set properly.
- The MySQL client looks for the character set files in the wrong directory,
  which is a bug.
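
To illustrate the first point, here is a small Python sketch of what happens to the file name bytes when the two settings do not match. It assumes, as in the test above, a cp866 client and file names stored in cp1251, and that Python's `cp866` and `cp1251` codecs match those code pages; the variable names are illustrative only.

```python
name = "Вася"

# Bytes the client sends when the console works in cp866.
typed = name.encode("cp866")
# Bytes for the same name in cp1251.
on_disk = name.encode("cp1251")

print(typed.hex(), on_disk.hex())

# If the cp866 bytes are handed over unconverted and then read as cp1251,
# they spell a different name, so the lookup cannot find the file.
misread = typed.decode("cp1251")
print(misread == name)
```

The byte sequences differ for every letter of the name, which is why the server has to know both the client charset and the filesystem charset to convert between them.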
[16 Jan 2009 20:49] Sveta Smirnova
Alexander, thank you for the detailed description.

So this problem is not the result of a bug.

The problem with the character set directory has already been reported: bug #17270