Bug #24299 Identifiers in foreign keys cannot contain U+0160, U+0360, ..., U+FF60
Submitted: 14 Nov 2006 15:33 Modified: 22 Feb 2007 20:49
Reporter: David Balažic
Status: Closed
Category:Server: InnoDB Severity:S3 (Non-critical)
Version:4.1, 5.0.27 OS:Microsoft Windows (Windows XP pro SP2)
Assigned to: Marko Mäkelä Target Version:

[14 Nov 2006 15:33] David Balažic
Description:
(copied from forum entry http://forums.mysql.com/read.php?22,125323 )

I noticed this problem :

alter table table1 add constraint FKB48269E285EFD5D4 foreign key (f_id) references BAR
(id);

If the name BAR contains non-ascii uppercase characters, then the command fails with errno
150.

Example :

create table one (mid bigint not null, primary key (mid));
create table twoš (id bigint not null auto_increment, stevilka varchar(255), primary key
(id));

alter table one add index FKB48269E285EFD5D4 (mid),
add constraint FKB48269E285EFD5D4 foreign key (mid) references twoŠ (id);

This fails, while "... references TWoš (id);" would work.

So it converts ASCII chars to lowercase correctly, while non-ascii chars are not
converted.

Note that in other commands both ascii and non-ascii are correctly converterd like :
- create table xŠy ... creates table with name "xšy" (all 3 lower case)
- alter table abcŠ ... alters the table "abcš" with no error

I observed this problem with :
- 4.1.21-community-nt
- 5.0.27 (community, win32)

5.1.12-beta win32 does NOT have this problem any more.

My settings (install procedure):
- install MySQL-5.1/mysql-essential-5.1.12-beta-win32.msi
- Typical install
- detailed configuration
- developer machine
- transactional Database Only (e.g. InnoDB is default)
- DSS/OLAP
- Enable TCP/IP ; Enable Strict Mode
- Best support for Multilingualism 

NOTE: Try the examples with an Unicode aware UI, otherwise it might choke on the non-ascii
chars. I used MySQL Query Browser 5.0 on Windows XP.

How to repeat:
See above
[14 Nov 2006 16:05] Heikki Tuuri
Marko,

please look at this. I wonder whether we will fix this in 5.0. In 5.1 this works.

Regards,

Heikki
[15 Nov 2006 11:53] Marko Mäkelä
I cannot repeat with the following settings on GNU/Linux:

mysqld --lower_case_table_names=1

set character_set_results=utf8;
set character_set_connection=utf8;
set character_set_client=utf8;

I can see proper UTF-8 names in the data directory, twoš.ibd and twoš.frm.

The name is passed to dict_scan_id() as {0x74, 0x77, 0x6f, 0xc5, 0xa0} (twoŠ, with
capital Š).

If lower_case_table_names=1 or the platform is Windows, the name will be converted to
lower case by innobase_casedn_str(), which invokes my_casedn_str(). 

Without lower_case_table_names, the ALTER TABLE statement would fail, no matter if the
table name is passed as twoŠ or Twoš.

Could it be that you have the wrong character set settings on the client? What does

show variables like '%char%';

display?
[15 Nov 2006 12:01] Marko Mäkelä
Sorry, I should have mentioned that I failed to repeat on 5.0-bk. But the code looks
similar in 4.1.
[15 Nov 2006 12:15] Marko Mäkelä
I am not a Windows expert, but I believe that NTFS internally uses UTF-16 encoding for the
file names. As far as I know, there are at least three different file name APIs:

* a 8-bit one for MS-DOS compatibility, with IBM Code Page 437 or similar
* a 8-bit one for Windows compatibility, Windows-1252 ("latin1" in MySQL)
* a Unicode API

FAT file systems use the MS-DOS character set, but I do not know about VFAT.

I do not know which one MySQL (or InnoDB with innodb_file_per_table=1) is using. If the
problem is with the file name API, I fear that we cannot fix this. Similar problems are
possible with Mac OS X, which transforms Unicode file names to a canonical form, i.e., ä
(U+00E4 LATIN SMALL LETTER A WITH DIAERESIS) becomes something like ‍̈a (U+0308
COMBINING DIAERESIS followed by the character a).

Unix-like systems traditionally allow any binary data in a file name, except the bytes
0x2f ('/') and 0x00 (NUL). MySQL 5.1 uses a special file name encoding that is a subset of
ASCII. That is a probable reason why the problem does not exist there.
[15 Nov 2006 12:20] David Balažic
mysql> show variables like '%char%';
+--------------------------+---------------------------------------------------------+
| Variable_name            | Value                                                   |
+--------------------------+---------------------------------------------------------+
| character_set_client     | utf8                                                    |
| character_set_connection | utf8                                                    |
| character_set_database   | utf8                                                    |
| character_set_filesystem | binary                                                  |
| character_set_results    | utf8                                                    |
| character_set_server     | utf8                                                    |
| character_set_system     | utf8                                                    |
| character_sets_dir       | C:\Program Files\MySQL\MySQL Server 5.1\share\charsets\ |
+--------------------------+---------------------------------------------------------+
8 rows in set (0.00 sec)

mysql>

I use NTFS. VFAT supports Unicode. (just like Joliet)
[15 Nov 2006 12:38] Marko Mäkelä
I tried the following:

set character_set_system=utf8;
set character_set_database=utf8;

but it made no difference: innobase_casedn_str() will convert the Š to lower case.

Have you tried with the following my.cnf settings:

[mysqld]
lower_case_table_names=1
innodb_file_per_table=0
[15 Nov 2006 13:45] David Balažic
You mean my.ini ? ;-)

I tried and got the error :

Can't create table '.\foobar\#sql-8d4_8.frm' (errno: 150)  ErrorNr. 1005
[15 Nov 2006 13:52] David Balažic
With the original settings the error is :

Can't create table '.\ba2\#sql-714_d.frm' (errno: 150)

The file name in both cases is twoš.frm , which I guess what the "binary" settings
causes. IMHO it should use native Unicode filenames if the OS supports it.
[15 Nov 2006 14:06] Marko Mäkelä
What does
SHOW ENGINE INNODB STATUS\G
report as the LATEST FOREIGN KEY ERROR immediately after the failing ALTER TABLE?

I'm reassigning this to Sunny Bains, since he has access to a Windows development
environment.
[15 Nov 2006 14:46] Heikki Tuuri
Marko,

David said that this works in MySQL-5.1. Do you see any difference in relevant code
between 5.0 and 5.1?

Regards,

Heikki
[15 Nov 2006 14:46] David Balažic
------------------------
LATEST FOREIGN KEY ERROR
------------------------
061115 14:40:43 Error in foreign key constraint of table foobar/#sql-f38_4:
 foreign key (mid) references twoŠ (id):
Cannot resolve table name close to:
á (id)

note the console might have corrupted the non-ascii chars (stupid MS console can't handle
unicode)
[15 Nov 2006 15:29] Marko Mäkelä
Could the client be passing the string in wrong encoding? Can you try the following:

mysql> select hex('twoŠ'),hex('twoš')\G
*************************** 1. row ***************************
hex('twoŠ'): 74776FC5A0
hex('twoš'): 74776FC5A1

The C5 A0 and C5 A1 at the end of the hex strings are the UTF-8 encoding of U+0160 (Š)
and U+0161 (š), respectively.

I checked with GNU Recode that neither character is available in Windows-1252, which MySQL
calls latin1.
[15 Nov 2006 15:36] David Balažic
I did it in the Query browser:

select hex('twoŠ'),hex('twoš');

'74776FC5A0', '74776FC5A1'
[15 Nov 2006 21:40] Marko Mäkelä
David,
does the MySQL Query Browser show the table name "twoš" correctly?
Can you run the CREATE TABLE and ALTER TABLE statements in the MySQL Query Browser? It
might be the case that the character set encoding is misconfigured in the other client(s)
you were using. Most character sets are supersets of ASCII; that might be the reason why
names consisting of ASCII characters work.
[16 Nov 2006 10:04] David Balažic
> does the MySQL Query Browser show the table name "twoš" correctly?
yes

>Can you run the CREATE TABLE and ALTER TABLE statements in the MySQL Query
Browser?

Yes, I run them in QB all the time.

>It might be the case that the character set encoding is misconfigured
>in the other client(s) you were using.

There are no other clients. (OK, there are, I used hibernate hdm2dll from hibernate-tools,
it has the same problem, that is how I discovered the problem)
[16 Nov 2006 22:00] Marko Mäkelä
David,

Sorry, this is a long note, but I believe I may have found the bug. First, some background
and reasoning.

You have mentioned the following wrong encodings. (As a side note, I have seen this report
both in HTML and in email in proper utf-8 encoding.)

'Š' in the SHOW ENGINE INNODB STATUS output
'Å¡' in the file system

Let's have GNU Recode translate them and the terminal emulator interpret the result in
UTF-8:

echo 'Š'|recode utf8..cp437
Š
echo 'Å¡'|recode utf8..cp1252
š

The 'Å¡' in the file system is definitely a MySQL bug, but I don't know if it can be
changed without breaking compatibility. It has been already fixed in 5.1 with the
filename-safe encoding.

However, it just occurred to me that the SHOW ENGINE INNODB STATUS output you posted
actually split the Š in twoŠ to two bytes, C5 A0. In IBM Code Page 437 of the MS-DOS
console, these bytes look like 'Š'. The error message said:

"Cannot resolve table name close to:
á (id)"

My guess is that isspace(0xa0) == TRUE on Win32 (because 0xa0 is the encoding for U+00a0
NON-BREAKABLE SPACE in Windows-1252).  In MySQL 4.1 and 5.0, the InnoDB foreign key parser
translates the SQL statement to UTF8 and uses the isspace() from the C library.  In MySQL
5.1, the string is not translated, but instead InnoDB uses my_isspace() and friends from
the MySQL character set library.

As a side note, there was a bug in MySQL 4.1 and 5.0 that when the connection character
set is latin1 and the SQL string contains 0xa0, InnoDB won't detect it as a space.  The
my_isspace() of the default latin1 collation returns TRUE for 0xa0.

Hmm, could we try with some other character whose UTF-8 encoding ends in 0xa0?
Š is U+0160, and I guess anything 0x0160+n*0x0200 would do:

perl -e 'for($i=0x160;$i<0xffff;$i+=0x200){print pack("n",$i)}'|recode ucs2..utf8

I couldn't see other Latin letters than Š in the output. Since I suspect that the bug
occurs before the conversion to lower case, you could pick one of the non-Latin scripts as
well. The three successive European characters in the output could be easiest to type if
copying from this bug report fails:

ᵠ MODIFIER LETTER SMALL GREEK PHI (U+1D60)
ὠ GREEK SMALL LETTER OMEGA WITH PSILI (U+1F60)
Ⅰ ROMAN NUMERAL ONE (U+2160)

Of course, the test case would now be different, e.g.,

set character_set_server=utf8;
set character_set_results=utf8;
set character_set_client=utf8;
set character_set_connection=utf8;

CREATE TABLE oneⅠ(a int primary key)engine=innodb;
CREATE TABLE two(a int primary key,constraint foreign key (a) references oneⅠ(a))
engine=innodb;

Note that the Ⅰ in oneⅠ is the ROMAN NUMERAL ONE (U+2160).
You could also try replacing oneⅠ with TWOŠ (capital Š) in both statements:

CREATE TABLE TWOŠ(a int primary key)engine=innodb;
CREATE TABLE three(a int primary key,constraint c foreign key (a) references TWOŠ(a))
engine=innodb;

All these happen to work on my system, but that is because isspace(0xa0)==0 on my system,
both with LC_CTYPE=fi_FI.utf8 and with LC_CTYPE=fi_FI.iso-8859-1. You could try this test
program:

#include <stdio.h>
#include <ctype.h>
int main (void) { printf("isspace(0xa0)=%d\n", isspace(0xa0)); return 0; }

Please report the results of the SQL statements and also of running the C program, if
possible. If my theory is correct, the fix would be to backport some code from 5.1 to 4.1
and 5.0.
[17 Nov 2006 10:48] David Balažic
CREATE TABLE two(a int primary key,constraint foreign key (a) references
oneⅠ(a)) engine=innodb;

Fails with ErrorNr 1005 / Can't create table '.\xen\two.frm' (errno: 150)

CREATE TABLE three(a int primary key,constraint c foreign key (a) references
TWOŠ(a)) engine=innodb;

Fails with ErrorNr 1005 / Can't create table '.\xen\three.frm' (errno: 150)

The C program prints :
isspace(0xa0)=0

(compiled with Microsoft Visual Studio .NET 2003 / Microsoft (R) 32-bit C/C++ Optimizing
Compiler Version 13.10.3077 for 80x86)
[17 Nov 2006 10:52] David Balažic
Note that the C program might not be Unicode aware and behaves differently because of
that.
[17 Nov 2006 11:32] Marko Mäkelä
David,
thank you! Can you please post the LATEST FOREIGN KEY ERROR from SHOW ENGINE INNODB STATUS
after the failed CREATE TABLE statements?

When it comes to the C program, isspace() and other primitives defined in <ctype.h> cannot
possibly return TRUE for any UTF-8 bytes whose high bit is set, if you think of it.  For
multibyte characters, there is iswspace() in <wctype.h>.  It could still be that
isspace(0xa0) returns TRUE in InnoDB.  You may have used a different compiler, runtime
library, or compilation options.
[17 Nov 2006 12:41] David Balažic
I also tried this :

#define _UNICODE
#include <tchar.h>
#include <stdio.h>
#include <ctype.h>
int wmain (void) {
    _tprintf(L"isspace(0xa0)=%d\n",isspace(0xa0));
    return 0;
}
It also prints :  isspace(0xa0)=0

After the "CREATE TABLE two(a int primary key,constraint foreign key (a) references
oneⅠ(a)) engine=innodb;"

the error is :

------------------------
LATEST FOREIGN KEY ERROR
------------------------
061117 10:44:02 Error in foreign key constraint of table xen/two:
foreign key (a) references
oneΓàá(a)) engine=innodb:
Cannot resolve table name close to:
á(a)) engine=innodb
[17 Nov 2006 13:46] Marko Mäkelä
I can repeat by replacing all occurrences of isspace() in dict0dict.c with a wrapper that
returns TRUE also for 0xa0.

Simple fix: replace isspace(c) with a wrapper that does strchr(" \v\f\t\r\n", c). I will
shortly attach a patch.
[17 Nov 2006 13:48] Marko Mäkelä
Patch for 4.1 and 5.0

Attachment: bug24299-5.0.patch (text/x-patch), 2.63 KiB.

[16 Jan 2007 21:32] Marko Mäkelä
The fix of this bug caused Bug #25596.
[24 Jan 2007 1:39] Paul DuBois
Noted in 5.0.32, 5.1.15 changelogs.

Foreign key identifiers for InnoDB tables could not contain certain
characters.

Resetting bug report to Patch Queued pending fix for 4.1 tree.
[25 Jan 2007 8:42] Marko Mäkelä
Jörg, Paul,

this bug does not exist in MySQL 5.1, but only in 4.1 and 5.0.
[14 Feb 2007 15:59] Chad MILLER
Available in 4.1.23, 5.0.36, and 5.1.16-beta.
[22 Feb 2007 20:49] Paul DuBois
Removed changelog entry for 5.1.15.
Added changelog entry for 4.1.23.