Bug #24299 Identifiers in foreign keys cannot contain U+0160, U+0360, ..., U+FF60
Submitted: 14 Nov 2006 14:33 Modified: 22 Feb 2007 19:49
Reporter: David Balažic Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:4.1, 5.0.27 OS:Windows (Windows XP pro SP2)
Assigned to: Marko Mäkelä CPU Architecture:Any

[14 Nov 2006 14:33] David Balažic
Description:
(copied from forum entry http://forums.mysql.com/read.php?22,125323 )

I noticed this problem :

alter table table1 add constraint FKB48269E285EFD5D4 foreign key (f_id) references BAR (id);

If the name BAR contains non-ascii uppercase characters, then the command fails with errno 150.

Example :

create table one (mid bigint not null, primary key (mid));
create table twoš (id bigint not null auto_increment, stevilka varchar(255), primary key (id));

alter table one add index FKB48269E285EFD5D4 (mid),
add constraint FKB48269E285EFD5D4 foreign key (mid) references twoŠ (id);

This fails, while "... references TWoš (id);" would work.

So it converts ASCII chars to lowercase correctly, while non-ascii chars are not converted.

Note that in other commands both ascii and non-ascii are correctly converterd like :
- create table xŠy ... creates table with name "xšy" (all 3 lower case)
- alter table abcŠ ... alters the table "abcš" with no error

I observed this problem with :
- 4.1.21-community-nt
- 5.0.27 (community, win32)

5.1.12-beta win32 does NOT have this problem any more.

My settings (install procedure):
- install MySQL-5.1/mysql-essential-5.1.12-beta-win32.msi
- Typical install
- detailed configuration
- developer machine
- transactional Database Only (e.g. InnoDB is default)
- DSS/OLAP
- Enable TCP/IP ; Enable Strict Mode
- Best support for Multilingualism 

NOTE: Try the examples with an Unicode aware UI, otherwise it might choke on the non-ascii chars. I used MySQL Query Browser 5.0 on Windows XP.

How to repeat:
See above
[14 Nov 2006 15:05] Heikki Tuuri
Marko,

please look at this. I wonder whether we will fix this in 5.0. In 5.1 this works.

Regards,

Heikki
[15 Nov 2006 10:53] Marko Mäkelä
I cannot repeat with the following settings on GNU/Linux:

mysqld --lower_case_table_names=1

set character_set_results=utf8;
set character_set_connection=utf8;
set character_set_client=utf8;

I can see proper UTF-8 names in the data directory, twoš.ibd and twoš.frm.

The name is passed to dict_scan_id() as {0x74, 0x77, 0x6f, 0xc5, 0xa0} (twoŠ, with capital Š).

If lower_case_table_names=1 or the platform is Windows, the name will be converted to lower case by innobase_casedn_str(), which invokes my_casedn_str(). 

Without lower_case_table_names, the ALTER TABLE statement would fail, no matter if the table name is passed as twoŠ or Twoš.

Could it be that you have the wrong character set settings on the client? What does

show variables like '%char%';

display?
[15 Nov 2006 11:01] Marko Mäkelä
Sorry, I should have mentioned that I failed to repeat on 5.0-bk. But the code looks similar in 4.1.
[15 Nov 2006 11:15] Marko Mäkelä
I am not a Windows expert, but I believe that NTFS internally uses UTF-16 encoding for the file names. As far as I know, there are at least three different file name APIs:

* a 8-bit one for MS-DOS compatibility, with IBM Code Page 437 or similar
* a 8-bit one for Windows compatibility, Windows-1252 ("latin1" in MySQL)
* a Unicode API

FAT file systems use the MS-DOS character set, but I do not know about VFAT.

I do not know which one MySQL (or InnoDB with innodb_file_per_table=1) is using. If the problem is with the file name API, I fear that we cannot fix this. Similar problems are possible with Mac OS X, which transforms Unicode file names to a canonical form, i.e., ä (U+00E4 LATIN SMALL LETTER A WITH DIAERESIS) becomes something like ‍̈a (U+0308 COMBINING DIAERESIS followed by the character a).

Unix-like systems traditionally allow any binary data in a file name, except the bytes 0x2f ('/') and 0x00 (NUL). MySQL 5.1 uses a special file name encoding that is a subset of ASCII. That is a probable reason why the problem does not exist there.
[15 Nov 2006 11:20] David Balažic
mysql> show variables like '%char%';
+--------------------------+---------------------------------------------------------+
| Variable_name            | Value                                                   |
+--------------------------+---------------------------------------------------------+
| character_set_client     | utf8                                                    |
| character_set_connection | utf8                                                    |
| character_set_database   | utf8                                                    |
| character_set_filesystem | binary                                                  |
| character_set_results    | utf8                                                    |
| character_set_server     | utf8                                                    |
| character_set_system     | utf8                                                    |
| character_sets_dir       | C:\Program Files\MySQL\MySQL Server 5.1\share\charsets\ |
+--------------------------+---------------------------------------------------------+
8 rows in set (0.00 sec)

mysql>

I use NTFS. VFAT supports Unicode. (just like Joliet)
[15 Nov 2006 11:38] Marko Mäkelä
I tried the following:

set character_set_system=utf8;
set character_set_database=utf8;

but it made no difference: innobase_casedn_str() will convert the Š to lower case.

Have you tried with the following my.cnf settings:

[mysqld]
lower_case_table_names=1
innodb_file_per_table=0
[15 Nov 2006 12:45] David Balažic
You mean my.ini ? ;-)

I tried and got the error :

Can't create table '.\foobar\#sql-8d4_8.frm' (errno: 150)  ErrorNr. 1005
[15 Nov 2006 12:52] David Balažic
With the original settings the error is :

Can't create table '.\ba2\#sql-714_d.frm' (errno: 150)

The file name in both cases is twoš.frm , which I guess what the "binary" settings causes. IMHO it should use native Unicode filenames if the OS supports it.
[15 Nov 2006 13:06] Marko Mäkelä
What does
SHOW ENGINE INNODB STATUS\G
report as the LATEST FOREIGN KEY ERROR immediately after the failing ALTER TABLE?

I'm reassigning this to Sunny Bains, since he has access to a Windows development environment.
[15 Nov 2006 13:46] Heikki Tuuri
Marko,

David said that this works in MySQL-5.1. Do you see any difference in relevant code between 5.0 and 5.1?

Regards,

Heikki
[15 Nov 2006 13:46] David Balažic
------------------------
LATEST FOREIGN KEY ERROR
------------------------
061115 14:40:43 Error in foreign key constraint of table foobar/#sql-f38_4:
 foreign key (mid) references twoŠ (id):
Cannot resolve table name close to:
á (id)

note the console might have corrupted the non-ascii chars (stupid MS console can't handle unicode)
[15 Nov 2006 14:29] Marko Mäkelä
Could the client be passing the string in wrong encoding? Can you try the following:

mysql> select hex('twoŠ'),hex('twoš')\G
*************************** 1. row ***************************
hex('twoŠ'): 74776FC5A0
hex('twoš'): 74776FC5A1

The C5 A0 and C5 A1 at the end of the hex strings are the UTF-8 encoding of U+0160 (Š) and U+0161 (š), respectively.

I checked with GNU Recode that neither character is available in Windows-1252, which MySQL calls latin1.
[15 Nov 2006 14:36] David Balažic
I did it in the Query browser:

select hex('twoŠ'),hex('twoš');

'74776FC5A0', '74776FC5A1'
[15 Nov 2006 20:40] Marko Mäkelä
David,
does the MySQL Query Browser show the table name "twoš" correctly?
Can you run the CREATE TABLE and ALTER TABLE statements in the MySQL Query Browser? It might be the case that the character set encoding is misconfigured in the other client(s) you were using. Most character sets are supersets of ASCII; that might be the reason why names consisting of ASCII characters work.
[16 Nov 2006 9:04] David Balažic
> does the MySQL Query Browser show the table name "twoš" correctly?
yes

>Can you run the CREATE TABLE and ALTER TABLE statements in the MySQL Query
Browser?

Yes, I run them in QB all the time.

>It might be the case that the character set encoding is misconfigured
>in the other client(s) you were using.

There are no other clients. (OK, there are, I used hibernate hdm2dll from hibernate-tools, it has the same problem, that is how I discovered the problem)
[16 Nov 2006 21:00] Marko Mäkelä
David,

Sorry, this is a long note, but I believe I may have found the bug. First, some background and reasoning.

You have mentioned the following wrong encodings. (As a side note, I have seen this report both in HTML and in email in proper utf-8 encoding.)

'Š' in the SHOW ENGINE INNODB STATUS output
'Å¡' in the file system

Let's have GNU Recode translate them and the terminal emulator interpret the result in UTF-8:

echo 'Š'|recode utf8..cp437
Š
echo 'Å¡'|recode utf8..cp1252
š

The 'Å¡' in the file system is definitely a MySQL bug, but I don't know if it can be changed without breaking compatibility. It has been already fixed in 5.1 with the filename-safe encoding.

However, it just occurred to me that the SHOW ENGINE INNODB STATUS output you posted actually split the Š in twoŠ to two bytes, C5 A0. In IBM Code Page 437 of the MS-DOS console, these bytes look like '┼á'. The error message said:

"Cannot resolve table name close to:
á (id)"

My guess is that isspace(0xa0) == TRUE on Win32 (because 0xa0 is the encoding for U+00a0 NON-BREAKABLE SPACE in Windows-1252).  In MySQL 4.1 and 5.0, the InnoDB foreign key parser translates the SQL statement to UTF8 and uses the isspace() from the C library.  In MySQL 5.1, the string is not translated, but instead InnoDB uses my_isspace() and friends from the MySQL character set library.

As a side note, there was a bug in MySQL 4.1 and 5.0 that when the connection character set is latin1 and the SQL string contains 0xa0, InnoDB won't detect it as a space.  The my_isspace() of the default latin1 collation returns TRUE for 0xa0.

Hmm, could we try with some other character whose UTF-8 encoding ends in 0xa0?
Š is U+0160, and I guess anything 0x0160+n*0x0200 would do:

perl -e 'for($i=0x160;$i<0xffff;$i+=0x200){print pack("n",$i)}'|recode ucs2..utf8

I couldn't see other Latin letters than Š in the output. Since I suspect that the bug occurs before the conversion to lower case, you could pick one of the non-Latin scripts as well. The three successive European characters in the output could be easiest to type if copying from this bug report fails:

ᵠ MODIFIER LETTER SMALL GREEK PHI (U+1D60)
ὠ GREEK SMALL LETTER OMEGA WITH PSILI (U+1F60)
Ⅰ ROMAN NUMERAL ONE (U+2160)

Of course, the test case would now be different, e.g.,

set character_set_server=utf8;
set character_set_results=utf8;
set character_set_client=utf8;
set character_set_connection=utf8;

CREATE TABLE oneⅠ(a int primary key)engine=innodb;
CREATE TABLE two(a int primary key,constraint foreign key (a) references oneⅠ(a)) engine=innodb;

Note that the Ⅰ in oneⅠ is the ROMAN NUMERAL ONE (U+2160).
You could also try replacing oneⅠ with TWOŠ (capital Š) in both statements:

CREATE TABLE TWOŠ(a int primary key)engine=innodb;
CREATE TABLE three(a int primary key,constraint c foreign key (a) references TWOŠ(a)) engine=innodb;

All these happen to work on my system, but that is because isspace(0xa0)==0 on my system, both with LC_CTYPE=fi_FI.utf8 and with LC_CTYPE=fi_FI.iso-8859-1. You could try this test program:

#include <stdio.h>
#include <ctype.h>
int main (void) { printf("isspace(0xa0)=%d\n", isspace(0xa0)); return 0; }

Please report the results of the SQL statements and also of running the C program, if possible. If my theory is correct, the fix would be to backport some code from 5.1 to 4.1 and 5.0.
[17 Nov 2006 9:48] David Balažic
CREATE TABLE two(a int primary key,constraint foreign key (a) references
oneⅠ(a)) engine=innodb;

Fails with ErrorNr 1005 / Can't create table '.\xen\two.frm' (errno: 150)

CREATE TABLE three(a int primary key,constraint c foreign key (a) references
TWOŠ(a)) engine=innodb;

Fails with ErrorNr 1005 / Can't create table '.\xen\three.frm' (errno: 150)

The C program prints :
isspace(0xa0)=0

(compiled with Microsoft Visual Studio .NET 2003 / Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 13.10.3077 for 80x86)
[17 Nov 2006 9:52] David Balažic
Note that the C program might not be Unicode aware and behaves differently because of that.
[17 Nov 2006 10:32] Marko Mäkelä
David,
thank you! Can you please post the LATEST FOREIGN KEY ERROR from SHOW ENGINE INNODB STATUS after the failed CREATE TABLE statements?

When it comes to the C program, isspace() and other primitives defined in <ctype.h> cannot possibly return TRUE for any UTF-8 bytes whose high bit is set, if you think of it.  For multibyte characters, there is iswspace() in <wctype.h>.  It could still be that isspace(0xa0) returns TRUE in InnoDB.  You may have used a different compiler, runtime library, or compilation options.
[17 Nov 2006 11:41] David Balažic
I also tried this :

#define _UNICODE
#include <tchar.h>
#include <stdio.h>
#include <ctype.h>
int wmain (void) {
    _tprintf(L"isspace(0xa0)=%d\n",isspace(0xa0));
    return 0;
}
It also prints :  isspace(0xa0)=0

After the "CREATE TABLE two(a int primary key,constraint foreign key (a) references oneⅠ(a)) engine=innodb;"

the error is :

------------------------
LATEST FOREIGN KEY ERROR
------------------------
061117 10:44:02 Error in foreign key constraint of table xen/two:
foreign key (a) references
oneΓàá(a)) engine=innodb:
Cannot resolve table name close to:
á(a)) engine=innodb
[17 Nov 2006 12:46] Marko Mäkelä
I can repeat by replacing all occurrences of isspace() in dict0dict.c with a wrapper that returns TRUE also for 0xa0.

Simple fix: replace isspace(c) with a wrapper that does strchr(" \v\f\t\r\n", c). I will shortly attach a patch.
[17 Nov 2006 12:48] Marko Mäkelä
Patch for 4.1 and 5.0

Attachment: bug24299-5.0.patch (text/x-patch), 2.63 KiB.

[16 Jan 2007 20:32] Marko Mäkelä
The fix of this bug caused Bug #25596.
[24 Jan 2007 0:39] Paul DuBois
Noted in 5.0.32, 5.1.15 changelogs.

Foreign key identifiers for InnoDB tables could not contain certain
characters.

Resetting bug report to Patch Queued pending fix for 4.1 tree.
[25 Jan 2007 7:42] Marko Mäkelä
Jörg, Paul,

this bug does not exist in MySQL 5.1, but only in 4.1 and 5.0.
[14 Feb 2007 14:59] Chad MILLER
Available in 4.1.23, 5.0.36, and 5.1.16-beta.
[22 Feb 2007 19:49] Paul DuBois
Removed changelog entry for 5.1.15.
Added changelog entry for 4.1.23.