Bug #43273 one more unicode/file system problem
Submitted: 27 Feb 2009 23:02 Modified: 12 Dec 2012 12:41
Reporter: Peter Laursen (Basic Quality Contributor)
Status: Duplicate
Category: MySQL Server Severity: S3 (Non-critical)
Version: 5.1.31 (any) OS: Windows (vista 32 bit)
Assigned to: CPU Architecture: Any

[27 Feb 2009 23:02] Peter Laursen
Description:
This report is a 'spinoff' of this one:
http://bugs.mysql.com/bug.php?id=43184

I am adding this one only to illustrate that MySQL software does not understand the Windows Unicode implementation.  It always seems to expect Windows to return file paths in ANSI (the code page used by the LOCALE setting).

But that is not true!  Windows will dynamically change between ANSI and Unicode. Windows will use the LOCALE-defined ANSI code page *as long* as that is possible, but also *only as long*.  If that is *not possible*, it will return the path as Unicode (which on Windows means 'little endian' UTF16).

Reasons why Windows may be unable to use ANSI include:
* characters 'outside' the code page for the current LOCALE are used (like using Cyrillic on a Western system)
* the characters belong to a language for which no ANSI code page exists (true for all of the more than 200 Indian languages).
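
The first case can be illustrated away from Windows with a minimal sketch. It assumes that Python's `cp1252` and `cp1251` codecs are a reasonable stand-in for the Western and Cyrillic ANSI code pages; the helper `fits_code_page` and the directory names are hypothetical and only serve the illustration. It does not reproduce what the Windows APIs or the MySQL server actually do.

```python
def fits_code_page(text: str, code_page: str) -> bool:
    """Return True if every character in text can be encoded in code_page."""
    try:
        text.encode(code_page)
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical directory names, chosen only for illustration.
cyrillic = "C:/Москва"
hindi = "C:/हिन्दी"

print(fits_code_page(cyrillic, "cp1252"))  # Western code page: False
print(fits_code_page(cyrillic, "cp1251"))  # Cyrillic code page: True
print(fits_code_page(hindi, "cp1252"))     # Western code page: False
print(fits_code_page(hindi, "utf-16-le"))  # little-endian UTF-16: True
```

A name that fails this kind of check for the active code page has no faithful ANSI form, so only the UTF16 form can represent it.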

I have seen incorrect replies about this here at least 5 times by MySQL developers!  I posted corrections a few times, but they were ignored!

How to repeat:
Start server with the option 

datadir="C:/维基百科关于中文维基百科"

... in my.ini. Of course, ensure that a valid MySQL database is there (I copied an existing and functional data folder). The Windows service manager returns 'could not start service'. The error log has no information, of course, as the file cannot be accessed!

(This is of course not as serious as the first report - it is only an illustration!)

Suggested fix:
[27 Feb 2009 23:03] Peter Laursen
Server cannot access datadir and fails to start

Attachment: unicode.jpg (image/jpeg, text), 91.51 KiB.

[28 Feb 2009 17:52] Sveta Smirnova
Thank you for the report.

What is the encoding of your configuration file?
[28 Feb 2009 19:11] Peter Laursen
The original configuration file was ANSI/western encoded.  After I replaced (in Notepad) the default datadir file path (c:/program data etc) with 'datadir="C:/维基百科关于中文维基百科"', Notepad required saving as unicode (as ANSI obviously cannot be used after the edit).  I selected the Windows default, 'unicode' (whether that should be understood as UCS2 or 'little endian' UTF8 is not important here!).

But I do not understand the relevance of the question!  The configuration file is not in that folder/file path we are discussing!

The service cannot be started!  The file path cannot be resolved by the MySQL server! Would it make a difference to select 'UTF8' or 'big endian unicode' as the configuration file encoding?
[28 Feb 2009 19:47] Peter Laursen
I meant of course UTF16 and not UTF8!

(whether that should be understood as UCS2 or 'little endian' UTF16 is not important here!).
[2 Mar 2009 13:59] Vladislav Vaintroub
As with the parent problem, the issue is that MySQL cannot handle file names outside of the current ANSI code page (due to its use of ANSI Windows or CRT APIs).

This is actually more complicated, as my.ini is also assumed to be encoded as ANSI.  UTF16, with or without a BOM, cannot be handled correctly (nor can UTF8, unless the content is pure ASCII).
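
To check which of these cases a given my.ini falls into, something like the following sketch could be used. The function `sniff_config_encoding` and its return labels are hypothetical, not part of MySQL; the sketch assumes that Notepad's 'unicode' option writes little-endian UTF16 with a BOM, and it only inspects the leading bytes and the presence of NUL bytes, so it is a heuristic rather than a reliable detector.

```python
import codecs


def sniff_config_encoding(raw: bytes) -> str:
    """Guess how a configuration file is encoded from its raw bytes."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    if raw.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le with BOM"
    if raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be with BOM"
    if b"\x00" in raw:
        # ASCII characters stored as UTF16 carry a NUL byte each.
        return "probably utf-16 without BOM"
    try:
        raw.decode("ascii")
    except UnicodeDecodeError:
        return "non-ascii without BOM"
    return "ascii"


line = 'datadir="C:/维基百科关于中文维基百科"\n'

# Assumed equivalent of saving as 'unicode' in Notepad.
print(sniff_config_encoding(codecs.BOM_UTF16_LE + line.encode("utf-16-le")))
# UTF8 without a BOM: not pure ASCII, so it would be misread as ANSI.
print(sniff_config_encoding(line.encode("utf-8")))
# A pure-ASCII file is byte-for-byte the same in ANSI and UTF8.
print(sniff_config_encoding(b'datadir="C:/mysql/data"\n'))
```

Only the last case, a pure-ASCII file, matches what the server assumes about my.ini according to the comment above.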
[2 Mar 2009 15:33] Peter Laursen
I thought so.

Now this is not important either.  It was posted as an illustration only. On the server the Server Admin is in control and he will not do such crazy things.

Similar issues with clients are much more important. The Server Admin does not necessarily control client machines and software.
[11 Mar 2009 7:38] Sveta Smirnova
Thank you for the feedback.

Verified as described.
[25 Mar 2009 21:24] Sveta Smirnova
Please see also bug #39449 starting with comment "[25 Mar 21:44] Albert Rosenfield"
[2 Apr 2009 0:29] Paul DuBois
I've added a note to http://dev.mysql.com/doc/refman/5.1/en/windows-vs-unix.html:

Directory and file names

On Windows, MySQL Server supports only directory and file names that are compatible with the current ANSI code pages. For example, the following Chinese directory name will not work in the Western locale (code page 1252):

datadir="C:/维基百科关于中文维基百科"

The same limitation applies to directory and file names referred to in SQL statements, such as the data file path name in LOAD DATA INFILE.

See also Bug#39449 and Bug#43184.
[8 Apr 2009 7:11] Alexander Barkov
This is a duplicate of Bug#43184.
[12 Dec 2012 12:41] Peter Laursen
@Barkov!

They are NOT duplicates, I think.  This report is about the server. The other one is about clients (LOAD DATA *LOCAL*).